1 Introduction

One of the most important relationships between holographic gravity and entanglement is the Ryu–Takayanagi (RT) formula, which states that the entanglement entropy of a region in the boundary conformal field theory (CFT) is dual to a geometric extremization problem in the bulk [35, 36]. Specifically, the formula states that the entropy of a spatial region A in the boundary CFT is given by

$$\begin{aligned} S(A) = \frac{1}{4G_{\mathrm{N}}}{{\,\mathrm{area}\,}}(m(A)), \end{aligned}$$
(1.1)

where m(A) is a minimal hypersurface in the bulk homologous to A. This elegant formula is essentially an anti-de Sitter (AdS) cousin of the black hole entropy formula, but more importantly, it is expected to yield new insights toward how entanglement and quantum gravity are connected [29, 42].

Despite the fact that the RT formula has been a subject of intense research for over a decade, there are still many facets of it that are only now being discovered. Indeed, only recently was it demonstrated that the geometric extremization problem underlying the RT formula can alternatively be interpreted as a flow extremization problem [16, 22]. By utilizing the Riemannian version of the max flow-min cut theorem, it was shown that the maximum flux out of a boundary region A, optimized over all divergenceless bounded vector fields in the bulk, is precisely the area of m(A). Because this interpretation of the RT formula suggests that the vector field captures the maximum information flow out of region A, the flow lines in the vector field became known as “bit threads.” These bit threads are a tangible geometric manifestation of the entanglement between A and its complement.

Although bit threads paint an attractive picture that appears to capture more intuitively the information-theoretic meaning behind holographic entanglement entropy, there is still much not understood about them. They were used to provide alternative proofs of subadditivity and strong subadditivity in [16], but a proof of the monogamy of mutual information (MMI) remained elusive. MMI is an inequality which, unlike subadditivity and strong subadditivity, does not hold for general quantum states, but is obeyed by holographic systems in the semiclassical or large-N limit. It is given by

$$\begin{aligned} \begin{aligned} -\,I_3(A:B:C)&:= S(AB) + S(AC) + S(BC) \\&\quad -\,S(A) -S(B) - S(C) - S(ABC) \ge 0. \end{aligned} \end{aligned}$$
(1.2)

The quantity \(-\,I_3\) is known as the (negative) tripartite information, and property (1.2) was proven in [18, 20] using minimal surfaces.Footnote 1 While MMI is a general fact about holographic states, the reason for this from a more fundamental viewpoint is not clear. Presumably, such states take a special form which guarantees MMI (cf. [10, 31]). What is this form? It was suggested in [16] that understanding MMI from the viewpoint of bit threads may shed some light on this question.

In this paper we will take up these challenges. First, we will provide a proof of MMI based on bit threads. Specifically, we show that, given a decomposition of the boundary into regions, there exists a thread configuration that simultaneously maximizes the number of threads connecting each region to its complement. MMI follows almost immediately from this statement.Footnote 2 This theorem is the continuum analogue of a well-known result in the theory of multicommodity flows on networks. However, the standard network proof is discrete and combinatorial in nature and is not straightforwardly adapted to the continuum. Therefore, we develop a new method of proof based on strong duality of convex programs. Convex optimization proofs have the advantage that they work in essentially the same way on graphs and Riemannian manifolds, whereas the graph proofs standard in the literature often rely on integer edge capacities, combinatorics, and other discrete features, and do not readily translate over to the continuous case.Footnote 3 The convex optimization methods offer a unified point of view for both the graph and Riemannian geometry settings, and constitute a stand-alone mathematical result. As far as we know, these are the first results on multicommodity flows on Riemannian manifolds.

Second, we use the thread-based proof of MMI to motivate a particular entanglement structure for holographic states, which involves pairwise-entangled states together with a four-party state with perfect-tensor entanglement (cf. [31]). MMI is manifest in this ansatz, so if it is correct then it explains why holographic states obey MMI.

Holographic entanglement entropies have also been proven to obey additional inequalities involving more than four boundary regions [5]. For example, MMI is part of a family of holographic entropic inequalities with dihedral symmetry in the boundary regions. These dihedral inequalities exist for any odd number of boundary regions, and for five regions other holographic inequalities are also known. However, the general structure of holographic inequalities for more than five boundary regions is currently not known. It would be interesting to try to understand these inequalities from the viewpoint of bit threads. In this paper, we make a tentative suggestion for the general structure of holographic states in terms of the extremal rays of the so-called holographic entropy cone.

We organize the paper in the following manner. In Sect. 2, we give the necessary background on holographic entanglement entropy, flows, bit threads, MMI, and related notions. In Sect. 3, we state the main theorem in this paper concerning the existence of a maximizing thread configuration on multiple regions and show that MMI follows from it. In Sect. 4, we use bit threads and the proof of MMI to motivate the conjecture mentioned above concerning the structure of holographic states. In Sect. 5, we prove our main theorem as well as a useful generalization of it. Section 6 revisits our continuum results in the graph theoretic setting, demonstrating how analogous arguments can be developed there. In Sect. 7 we discuss open issues.

2 Background

2.1 Ryu–Takayanagi formula and bit threads

We begin with some basic concepts and definitions concerning holographic entanglement entropies. In this paper, we work in the regime of validity of the Ryu–Takayanagi formula, namely a conformal field theory dual to Einstein gravity in a state represented by a classical spacetime with a time-reflection symmetry. The Cauchy slice invariant under the time reflection is a Riemannian manifold that we will call \(\mathcal {M}\). We assume that a cutoff has been introduced “near” the conformal boundary so that \(\mathcal {M}\) is a compact manifold with boundary. Its boundary \(\partial \mathcal {M}\) is the space where the field theory lives.

It is sometimes convenient to let the bulk be bounded also on black hole horizons, thereby representing a thermal mixed state of the field theory. However, for definiteness in this paper we will consider only pure states of the field theory, and correspondingly for us \(\partial \mathcal {M}\) will not include any horizons.Footnote 4 This assumption is without loss of generality, since it is always possible to purify a thermal state by passing to the thermofield double, which is represented holographically by a two-sided black hole.

Let A be a region of \(\partial \mathcal {M}\). The Ryu–Takayanagi formula [35, 36] then gives its entropy S(A) as \(1/4G_{\mathrm{N}}\) times the area of the minimal surface in \(\mathcal {M}\) homologous to A (relative to \(\partial A\)):

$$\begin{aligned} S(A) = \frac{1}{4G_{\mathrm{N}}}\min _{m\sim A}{{\,\mathrm{area}\,}}(m). \end{aligned}$$
(2.1)

(We could choose to work in units where \(4G_{\mathrm{N}}=1\), and this would simplify certain formulas, but it will be useful to maintain a clear distinction between the microscopic Planck scale \(G_\mathrm{N}^{1/(d-1)}\) and the macroscopic scale of \(\mathcal {M}\), defined for example by its curvatures.) We will denote the minimal surface by m(A)Footnote 5 and the corresponding homology region, whose boundary is \(A\cup m(A)\), by r(A). The homology region is sometimes called the “entanglement wedge”, although strictly speaking the entanglement wedge is the causal domain of the homology region.

2.1.1 Flows

The notion of bit threads was first explored in [16]. To explain them, we first define a flow, which is a vector field v on \(\mathcal {M}\) that is divergenceless and has norm bounded everywhere by \(1/4G_{\mathrm{N}}\):Footnote 6

$$\begin{aligned} \nabla \cdot v = 0, \quad |v| \le \frac{1}{4G_{\mathrm{N}}}. \end{aligned}$$
(2.2)

For simplicity we denote the flux of a flow v through a boundary region A by \(\int _A v\):

$$\begin{aligned} \int _A v:=\int _A\sqrt{h}\,\hat{n}\cdot v, \end{aligned}$$
(2.3)

where h is the determinant of the induced metric on A and \(\hat{n}\) is the (inward-pointing) unit normal vector. The flow v is called a max flow on A if the flux of v through A is maximal among all flows. We can then write the entropy of A as the flux through A of a max flow:

$$\begin{aligned} S(A) = \max _{v\text { flow}}\int _A v. \end{aligned}$$
(2.4)

The equivalence of (2.4) to the RT formula (2.1) is guaranteed by the Riemannian version of the max flow-min cut theorem [13, 22, 33, 38, 39]:

$$\begin{aligned} \max _{v\text { flow}}\int _A v = \frac{1}{4G_{\mathrm{N}}}\min _{m\sim A}{{\,\mathrm{area}\,}}(m). \end{aligned}$$
(2.5)

The theorem can be understood heuristically as follows: by its divergencelessness, v has the same flux through every surface homologous to A, and by the norm bound this flux is bounded above by its area divided by \(4G_{\mathrm{N}}\). The strongest bound is given by the minimal surface, which thus acts as the bottleneck limiting the flow. The fact that this bound is tight is proven by writing the left- and right-hand sides of (2.5) in terms of convex programs and invoking strong duality to equate their solutions. (See [22] for an exposition of the proof.) While the minimal surface m(A) is typically unique, the maximizing flow is typically highly non-unique; on the minimal surface it equals \(1/4G_{\mathrm{N}}\) times the unit normal vector, but away from the minimal surface it is underdetermined.
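The max flow-min cut principle underlying (2.5) is easiest to see in the discrete setting of Sect. 6. As a self-contained illustration (the toy graph and its capacities are our own invention, not from the text), the following sketch computes a max flow with the Edmonds–Karp algorithm and checks that it equals the min cut found by brute force:

```python
from collections import deque
from itertools import combinations

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly augment along shortest residual paths."""
    res, nodes = {}, set()
    for (u, v), c in cap.items():
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)
        nodes |= {u, v}
    flow = 0
    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:          # BFS in the residual graph
            u = q.popleft()
            for v in nodes:
                if v not in parent and res.get((u, v), 0) > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:          # walk back to the source
            path.append((parent[v], v))
            v = parent[v]
        aug = min(res[e] for e in path)       # bottleneck capacity
        for (u, v) in path:
            res[(u, v)] -= aug
            res[(v, u)] += aug
        flow += aug

def min_cut(cap, s, t):
    """Brute force over all vertex sets containing s but not t."""
    nodes = sorted({x for e in cap for x in e})
    others = [n for n in nodes if n not in (s, t)]
    best = float('inf')
    for r in range(len(others) + 1):
        for side in combinations(others, r):
            cut_side = set(side) | {s}
            cost = sum(c for (u, v), c in cap.items()
                       if u in cut_side and v not in cut_side)
            best = min(best, cost)
    return best

# Toy network: s plays the role of the boundary region, t its complement.
cap = {('s','x'): 3, ('s','y'): 2, ('x','t'): 2, ('y','t'): 2, ('x','y'): 1}
print(max_flow(cap, 's', 't'), min_cut(cap, 's', 't'))   # 4 4
```

On this graph both sides equal 4: the cut \(\{s,x,y\}\) is the bottleneck, and the slack on the edges out of s mirrors the non-uniqueness of the max flow away from the minimal surface.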

2.1.2 Bit threads

We can further rewrite (2.4) by thinking about the integral curves of a flow v, in the same way that it is often useful to think about electric and magnetic field lines rather than the vector fields themselves. We can choose a set of integral curves whose transverse density equals |v| everywhere. In [16] these curves were called bit threads.

The integral curves of a given vector field are oriented and locally parallel. It will be useful to generalize the notion of bit threads by dropping these two conditions. Thus, in this paper, the threads will be unoriented curves, and we will allow them to pass through a given neighborhood at different angles and even to intersect. Since the threads are not locally parallel, we replace the notion of transverse density with simply density, defined at a given point as the total length of the threads in a ball of radius R centered on that point divided by the volume of the ball, where R is chosen to be much larger than the Planck scale \(G_\mathrm{N}^{1/(d-1)}\) and much smaller than the curvature scale of \(\mathcal {M}\).Footnote 7 A thread configuration is thus defined as a set of unoriented curves on \(\mathcal {M}\) obeying the following rules:

  1. Threads end only on \(\partial \mathcal {M}\).

  2. The thread density is nowhere larger than \(1/4G_{\mathrm{N}}\).

A thread can be thought of as the continuum analogue of a “path” in a network, and a thread configuration is the analogue of a set of edge-disjoint paths, a central concept in the analysis of network flows.

Given a flow v, we can, as noted above, choose a set of integral curves with density |v|; dropping their orientations yields a thread configuration. In the classical or large-N limit \(G_\mathrm{N}\rightarrow 0\), the density of threads is large on the scale of \(\mathcal {M}\) and we can neglect any discretization error arising from replacing the continuous flow v by a discrete set of threads. Thus a flow maps essentially uniquely (up to the unimportant Planck-scale choice of integral curves) to a thread configuration. However, this map is not invertible: a given thread configuration may not come from any flow, since the threads may not be locally parallel, and even if such a flow exists it is not unique since one must make a choice of orientation for each thread. The extra flexibility afforded by the threads is useful since, as we will see in the next section, a single thread configuration can simultaneously represent several different flows. On the other hand, the flows are easier to work with technically, and in particular we will use them as an intermediate device for proving theorems about threads; an example is (2.6) below.

We denote the number of threads connecting a region A to its complement \(\bar{A}:=\partial \mathcal {M}{\setminus } A\) in a given configuration by \(N_{A\bar{A}}\). We will now show that the maximum value of \(N_{A\bar{A}}\) over allowed configurations is S(A):

$$\begin{aligned} S(A) = \max N_{A\bar{A}}. \end{aligned}$$
(2.6)

First, we will show that \(N_{A\bar{A}}\) is bounded above by the area of any surface \(m\sim A\) divided by \(4G_{\mathrm{N}}\). Consider a slab of thickness R around m (where again R is much larger than the Planck length and much smaller than the curvature radius of \(\mathcal {M}\)); this has volume \(R{{\,\mathrm{area}\,}}(m)\), so the total length of all the threads within the slab is bounded above by \(R{{\,\mathrm{area}\,}}(m)/4G_{\mathrm{N}}\). On the other hand, any thread connecting A to \(\bar{A}\) must pass through m, and therefore must have length within the slab at least R. So the total length within the slab of all threads connecting A to \(\bar{A}\) is at least \(RN_{A\bar{A}}\). Combining these two bounds gives

$$\begin{aligned} N_{A\bar{A}}\le \frac{1}{4G_{\mathrm{N}}}{{\,\mathrm{area}\,}}(m). \end{aligned}$$
(2.7)

In particular, for the minimal surface m(A),

$$\begin{aligned} N_{A\bar{A}}\le \frac{1}{4G_{\mathrm{N}}}{{\,\mathrm{area}\,}}(m(A))=S(A). \end{aligned}$$
(2.8)

Again, (2.8) applies to any thread configuration. On the other hand, as described above, given any flow v we can construct a thread configuration by choosing a set of integral curves whose density equals |v| everywhere. The number of threads connecting A to \(\bar{A}\) is at least as large as the flux of v on A:

$$\begin{aligned} N_{A\bar{A}}\ge \int _A v. \end{aligned}$$
(2.9)

The reason we don’t necessarily have equality is that some of the integral curves may go from \(\bar{A}\) to A, thereby contributing negatively to the flux but positively to \(N_{A\bar{A}}\). Given (2.8), however, for a max flow v(A) this bound must be saturated:

$$\begin{aligned} N_{A\bar{A}}=\int _A v(A)=S(A). \end{aligned}$$
(2.10)

The bit threads connecting A to \(\bar{A}\) are vivid manifestations of the entanglement between A and \(\bar{A}\), as quantified by the entropy S(A). This viewpoint gives an alternate interpretation to the RT formula that may in many situations be more intuitive. For example, given a spatial region A on the boundary CFT, the minimal hypersurface homologous to A does not necessarily vary continuously as A varies: an infinitesimal perturbation of A can result in the minimal hypersurface changing drastically, depending on the geometry of the bulk. Bit threads, on the other hand, vary continuously as a function of A, even when the bottleneck surface jumps.

Heuristically, it is useful to visualize each bit thread as defining a “channel” that allows for one bit of (quantum) information to be communicated between different regions on the spatial boundary. The amount of information that can be communicated between two spatially separated boundary regions is then determined by the number of channels that the bulk geometry allows between the two regions. Importantly, whereas the maximizing bit thread configuration may change depending on the boundary region we choose, the set of all allowable configurations is completely determined by the geometry. The “channel” should be viewed as a metaphor, however, similar to how a Bell pair can be viewed as enabling a channel in the context of teleportation. While it is known that Bell pairs can always be distilled at an optimal rate S(A), we conjecture a more direct connection between bit threads and the entanglement structure of the underlying holographic states, elaborated in Sect. 4.

2.1.3 Properties and derived quantities

Many interesting properties of entropies and quantities derived from them can be written naturally in terms of flows or threads. For example, let A, B be disjoint boundary regions, and let v be a max flow for their union, so \(\int _{AB}v=S(AB)\). Then, by (2.4), we have

$$\begin{aligned} S(A) \ge \int _A v,\quad S(B) \ge \int _B v, \end{aligned}$$
(2.11)

hence

$$\begin{aligned} S(A)+S(B)\ge \int _A v+\int _B v=\int _{AB}v=S(AB), \end{aligned}$$
(2.12)

which is the subadditivity property.

A useful property of flows is that there always exists a flow that simultaneously maximizes the flux through A and AB (or B and AB, but not in general A and B). We call this the nesting property, and it is proven in [22]. Let \(v_1\) be such a flow. We then obtain the following formula for the conditional entropy:

$$\begin{aligned} H(B|A):=S(AB)-S(A)=\int _{AB}v_1-\int _A v_1=\int _B v_1. \end{aligned}$$
(2.13)

We can also write this quantity in terms of threads. Let C be the complement of AB, and let \(N^1_{AB}\), \(N^1_{AC}\), \(N^1_{BC}\) be the number of threads connecting the different pairs of regions in the flow \(v_1\).Footnote 8 Using (2.10), we then have

$$\begin{aligned} S(AB) = N^1_{AC}+N^1_{BC},\quad S(A) = N^1_{AC}+N^1_{AB}. \end{aligned}$$
(2.14)

(Note that we don’t have a formula for S(B) in terms of these threads, since the configuration does not maximize the number connecting B to its complement.) Hence

$$\begin{aligned} H(B|A) = N^1_{BC} - N^1_{AB}. \end{aligned}$$
(2.15)

For the mutual information, we have

$$\begin{aligned} I(A:B):=S(A)+S(B)-S(AB) = \int _A\left( v_1-v_2\right) = N^2_{AB}+N^2_{BC}+N^1_{AB}-N^1_{BC}, \end{aligned}$$
(2.16)

where \(v_2\) is a flow with maximum flux through B and AB, and \(N^2_{ij}\) are the corresponding numbers of threads. In the next section, using the concept of a multiflow, we will write down a more concise formula for the mutual information in terms of threads [see (3.22)].
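The thread-count identities (2.14)–(2.16) amount to simple bookkeeping, which can be sanity-checked numerically. In the sketch below the thread counts \(N^1_{ij}\) and \(N^2_{ij}\) are invented for illustration, chosen only so that both configurations assign the same value to S(AB):

```python
# Invented thread counts: config 1 maximizes the number of threads out
# of A and AB; config 2 maximizes the number out of B and AB.
N1 = {'AB': 1, 'AC': 3, 'BC': 2}
N2 = {'AB': 2, 'AC': 2, 'BC': 3}

# Entropies implied by (2.14) and its analogue for config 2
S_A  = N1['AC'] + N1['AB']           # = 4
S_AB = N1['AC'] + N1['BC']           # = 5
S_B  = N2['AB'] + N2['BC']           # = 5
assert S_AB == N2['AC'] + N2['BC']   # both configs maximize AB

# Conditional entropy (2.15) and mutual information (2.16)
H_B_given_A = S_AB - S_A
assert H_B_given_A == N1['BC'] - N1['AB']

I_AB = S_A + S_B - S_AB
assert I_AB == N2['AB'] + N2['BC'] + N1['AB'] - N1['BC']
print(H_B_given_A, I_AB)   # 1 4
```

The assertions hold identically for any counts consistent with the two maximizations, since (2.15) and (2.16) are linear consequences of (2.14).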

The nesting property also allows us to prove the strong subadditivity property, \(S(AB)+S(BC)\ge S(B)+S(ABC)\), where A, B, C are disjoint regions. (Unlike in the previous paragraph, here C is not necessarily the complement of AB, i.e. ABC does not necessarily cover \(\partial \mathcal {M}\).) Let v be a flow that maximizes the flux through both B and ABC. Then

$$\begin{aligned} S(AB)\ge \int _{AB}v,\quad S(BC)\ge \int _{BC}v, \end{aligned}$$
(2.17)

hence

$$\begin{aligned} S(AB)+S(BC) \ge \int _{AB}v+\int _{BC}v = \int _B v+\int _{ABC}v = S(B)+S(ABC). \end{aligned}$$
(2.18)

2.2 MMI, perfect tensors, and entropy cones

2.2.1 MMI

Given three subsystems ABC of a quantum system, the (negative) tripartite information is defined as the following linear combination of the subsystem entropies:Footnote 9

$$\begin{aligned} -\,I_3(A:B:C)&:= S(AB) + S(BC) + S(AC) - S(A) - S(B) - S(C) - S(ABC) \nonumber \\&= I(A:BC) - I(A:B) - I(A:C). \end{aligned}$$
(2.19)

The quantity \(-\,I_3\) is manifestly symmetric under permuting ABC. In fact it is even more symmetric than that; defining \(D:=\overline{ABC}\), it is symmetric under the full permutation group on ABCD. (Note that, by purity, \(S(AD)=S(BC)\), \(S(BD)=S(AC)\), and \(S(CD)=S(AB)\).) Since in this paper we will mostly be working with a fixed set of 4 parties, we will usually simply write \(-\,I_3\), without arguments.

We note that \(-\,I_3\) is sensitive only to fully four-party entanglement, in the following sense. If the state is unentangled between any party and the others, or between any two parties and the others, then \(-\,I_3\) vanishes. In a general quantum system, it can be either positive or negative. For example, in the four-party GHZ state,

$$\begin{aligned} |{\psi }\rangle _{ABCD} = \frac{1}{\sqrt{2}}\left( |{0000}\rangle +|{1111}\rangle \right) , \end{aligned}$$
(2.20)

it is negative: \(-\,I_3=-\ln 2\). On the other hand, for the following state (with D being a 4-state system),

$$\begin{aligned} |{\psi }\rangle _{ABCD} = \frac{1}{2}\left( |{0000}\rangle +|{0111}\rangle +|{1012}\rangle +|{1103}\rangle \right) \end{aligned}$$
(2.21)

it is positive: \(-\,I_3=\ln 2\).
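Both signs are easy to verify directly. The following sketch (a small numpy computation of our own; the partial-trace helper is not from the text) builds the states (2.20) and (2.21) and evaluates \(-I_3\) from the definition (2.19):

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in nats, dropping zero eigenvalues."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log(w)).sum())

def reduced(psi, dims, keep):
    """Reduced density matrix on the parties listed in `keep`."""
    psi = np.asarray(psi).reshape(dims)
    perm = list(keep) + [i for i in range(len(dims)) if i not in keep]
    psi = np.transpose(psi, perm)
    dk = int(np.prod([dims[i] for i in keep]))
    m = psi.reshape(dk, -1)
    return m @ m.conj().T

def neg_I3(psi, dims):
    """-I3(A:B:C) for a four-party pure state; parties are indices 0..3."""
    S = lambda keep: entropy(reduced(psi, dims, keep))
    return (S((0, 1)) + S((0, 2)) + S((1, 2))
            - S((0,)) - S((1,)) - S((2,)) - S((0, 1, 2)))

# GHZ state (2.20): -I3 = -ln 2
ghz = np.zeros(16)
ghz[0] = ghz[15] = 2 ** -0.5
print(neg_I3(ghz, (2, 2, 2, 2)) / np.log(2))   # ≈ -1

# State (2.21), with D a 4-state system: -I3 = +ln 2
psi = np.zeros(32)
for a, b, c, d in [(0,0,0,0), (0,1,1,1), (1,0,1,2), (1,1,0,3)]:
    psi[((a*2 + b)*2 + c)*4 + d] = 0.5
print(neg_I3(psi, (2, 2, 2, 4)) / np.log(2))   # ≈ +1
```

The same three-line `neg_I3` evaluates \(-I_3\) for any four-party pure state given as a vector, so it also confirms (2.23) on explicit perfect tensors.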

\(-\,I_3\) is also positive for four-party so-called perfect-tensor states, which will play an important role in our considerations. A perfect-tensor state is a pure state on 2n parties such that the reduced density matrix on any n parties is maximally mixed. For four parties, this implies that all the one-party entropies are equal, and all the two-party entropies have twice that value [31]:

$$\begin{aligned} S(A)=S(B)=S(C)=S(D)=S_0,\quad S(AB)=S(BC)=S(AC)=2S_0, \end{aligned}$$
(2.22)

where \(S_0>0\). Hence

$$\begin{aligned} -\,I_3 = 2S_0>0. \end{aligned}$$
(2.23)

In this paper, we will use the term perfect tensor (PT) somewhat loosely to denote a four-party pure state whose entropies take the form (2.22) for some \(S_0>0\), even if they are not maximal for the respective Hilbert spaces.

In a general field theory, with the subsystems ABC being spatial regions, \(-\,I_3\) can take either sign [7]. However, it was proven in [18, 20] that the entropies derived from the RT formula always obey the inequality

$$\begin{aligned} -\,I_3(A:B:C) \ge 0, \end{aligned}$$
(2.24)

which is known as monogamy of mutual information (MMI). The proof involved cutting and pasting minimal surfaces. In this paper we will provide a proof of MMI based on flows or bit threads. Since a general state of a four-party system does not obey MMI, classical states of holographic systems (i.e. those represented by classical spacetimes) must have a particular entanglement structure in order to always obey MMI. It is not known what that entanglement structure is, and another purpose of this paper is to address this question.

2.2.2 Entropy cones

A general four-party pure state has 7 independent entropies, namely the 4 one-party entropies S(A), S(B), S(C), S(D), together with 3 independent two-party entropies, e.g. S(AB), S(AC), and S(BC). This set of numbers defines an entropy vector in \(\mathbb {R}^7\). There is an additive structure here because entropies add under the operation of combining states by the tensor product. In other words, if

$$\begin{aligned} |{\psi }\rangle _{ABCD} = |{\psi _1}\rangle _{A_1B_1C_1D_1}\otimes |{\psi _2}\rangle _{A_2B_2C_2D_2}, \end{aligned}$$
(2.25)

with \(\mathcal {H}_A=\mathcal {H}_{A_1}\otimes \mathcal {H}_{A_2}\) etc., then the entropy vector of \(|{\psi }\rangle \) is the sum of those of \(|{\psi _1}\rangle \) and \(|{\psi _2}\rangle \). The inequalities that the entropies satisfy—non-negativity, subadditivity, and strong subadditivity—carve out a set of possible entropy vectors which (after taking the closure) is a convex polyhedral cone in \(\mathbb {R}^7\), called the four-party quantum entropy cone. Holographic states satisfy MMI in addition to those inequalities, carving out a smaller cone, called the four-party holographic entropy cone [5]. It is a simple exercise in linear algebra to show that the six pairwise mutual informations together with \(-\,I_3\) also form a coordinate system (or dual basis) for \(\mathbb {R}^7\):

$$\begin{aligned} I(A:B),\; \; I(A:C),\; \; I(A:D),\; \; I(B:C),\; \; I(B:D),\; \; I(C:D),\; \; -\,I_3. \end{aligned}$$
(2.26)

For any point in the holographic entropy cone, these 7 quantities are non-negative—the mutual informations by subadditivity, and \(-\,I_3\) by MMI. In fact, the converse also holds. Since MMI and subadditivity imply strong subadditivity, any point in \(\mathbb {R}^7\) representing a set of putative entropies such that all 7 linear combinations (2.26) are non-negative also obeys all the other inequalities required of an entropy, and is therefore in the holographic entropy cone. In other words, using the 7 quantities (2.26) as coordinates, the holographic entropy cone consists precisely of the non-negative orthant in \(\mathbb {R}^7\).
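A membership test for the holographic entropy cone is therefore just a change of coordinates followed by a sign check. The sketch below (our own illustration; the dictionary encoding of the entropy vector is an assumption of convenience) implements it, using purity to supply the complementary entropies:

```python
import math

def cone_coordinates(S):
    """Map a 4-party pure-state entropy vector to the coordinates (2.26).

    `S` maps subsystem labels to entropies; purity fixes S(ABC) = S(D),
    S(AD) = S(BC), S(BD) = S(AC), S(CD) = S(AB).
    """
    I = lambda x, y, xy: S[x] + S[y] - S[xy]
    return {
        'I(A:B)': I('A', 'B', 'AB'),
        'I(A:C)': I('A', 'C', 'AC'),
        'I(A:D)': S['A'] + S['D'] - S['BC'],   # S(AD) = S(BC)
        'I(B:C)': I('B', 'C', 'BC'),
        'I(B:D)': S['B'] + S['D'] - S['AC'],   # S(BD) = S(AC)
        'I(C:D)': S['C'] + S['D'] - S['AB'],   # S(CD) = S(AB)
        '-I3':    (S['AB'] + S['AC'] + S['BC']
                   - S['A'] - S['B'] - S['C'] - S['D']),  # S(ABC) = S(D)
    }

def in_holographic_cone(S, tol=1e-9):
    """Non-negativity of all 7 coordinates (2.26)."""
    return all(v >= -tol for v in cone_coordinates(S).values())

# Perfect tensor (2.22) with S0 = 1: inside the cone
pt = dict(A=1, B=1, C=1, D=1, AB=2, AC=2, BC=2)
print(in_holographic_cone(pt))    # True

# Four-party GHZ entropy vector: -I3 = -ln 2 < 0, outside the cone
ln2 = math.log(2)
ghz = dict(A=ln2, B=ln2, C=ln2, D=ln2, AB=ln2, AC=ln2, BC=ln2)
print(in_holographic_cone(ghz))   # False
```

The GHZ example shows that the test is strictly stronger than subadditivity: all six mutual informations are non-negative there, and only the \(-I_3\) coordinate fails.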

Any entropy vector such that exactly one of the 7 coordinates (2.26) is positive, with the rest vanishing, is an extremal vector of the holographic entropy cone; it lies on a 1-dimensional edge of that cone. Since the cone is 7-dimensional, any point in the cone can be written uniquely as a sum of 7 (or fewer) extremal vectors, one for each edge. States whose entropy vectors are extremal are readily constructed: a state with \(I(A:B)>0\) and all other quantities in (2.26) vanishing is necessarily of the form \(|{\psi _1}\rangle _{AB}\otimes |{\psi _2}\rangle _C\otimes |{\psi _3}\rangle _D\), and similarly for the other pairs; while a state with \(-\,I_3>0\) and all pairwise mutual informations vanishing is necessarily a PT. It is also possible to realize such states, and indeed arbitrary points in the holographic entropy cone, by holographic states [2, 5].

Fig. 1 Left: Skeleton graph (with two connected components) representing an arbitrary entropy vector in the four-party holographic entropy cone. Right: Skeleton graph for a three-party entropy vector

The extremal rays can also be represented as simple graphs with external vertices ABCD. For example, a graph with just a single edge connecting A and B with capacity c gives an entropy vector with \(I(A:B)=2c\) and all other coordinates in (2.26) vanishing. Similarly, a star graph with one internal vertex connected to all four external vertices and capacity c on each edge gives an entropy vector with \(-\,I_3=2c\) and all other coordinates in (2.26) vanishing. Since any vector in the holographic cone can be uniquely decomposed into extremal rays, it is reproduced by a (unique) “skeleton graph” consisting of the complete graph on \(\{A,B,C,D\}\) with capacity \(\frac{1}{2}I(A:B)\) on the edge connecting A and B and similarly for the other pairs, plus a star graph with capacity \(-\frac{1}{2}I_3\) on each edge. This is shown in Fig. 1.
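The capacity assignment can be checked against the entropies it is meant to reproduce. In the sketch below (bookkeeping only, not a max-flow computation; the sample entropy vector, a PT with \(S_0=1\) tensored with a Bell pair on AB, is our own example), the min cut isolating A consists of its star leg plus the three pairwise edges, and its total capacity equals S(A):

```python
def skeleton_capacities(S):
    """Edge capacities of the four-party skeleton graph (Fig. 1, left).

    `S` is a 4-party pure-state entropy vector as a dict; purity fixes
    the complementary entropies S(AD)=S(BC), S(BD)=S(AC), S(CD)=S(AB).
    """
    I = lambda x, y, xy: S[x] + S[y] - S[xy]
    neg_I3 = (S['AB'] + S['AC'] + S['BC']
              - S['A'] - S['B'] - S['C'] - S['D'])
    return {
        ('A','B'): 0.5 * I('A', 'B', 'AB'),
        ('A','C'): 0.5 * I('A', 'C', 'AC'),
        ('A','D'): 0.5 * (S['A'] + S['D'] - S['BC']),
        ('B','C'): 0.5 * I('B', 'C', 'BC'),
        ('B','D'): 0.5 * (S['B'] + S['D'] - S['AC']),
        ('C','D'): 0.5 * (S['C'] + S['D'] - S['AB']),
        'star':    0.5 * neg_I3,
    }

# Sample vector: PT with S0 = 1 tensored with a Bell pair on AB (entropy 1)
S = dict(A=2, B=2, C=1, D=1, AB=2, AC=3, BC=3)
c = skeleton_capacities(S)

# Cut around A: the A star-leg plus the three pairwise edges at A
S_A = c[('A','B')] + c[('A','C')] + c[('A','D')] + c['star']
print(S_A)   # 2.0, matching S['A']
```

On this vector the only nonzero capacities are the AB edge and the star legs, reflecting the unique decomposition into a Bell-pair ray and a PT ray.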

Let us briefly discuss the analogous situation for states on fewer or more than four parties. For a three-party pure state, there are only 3 independent entropies (since \(S(AB)=S(C)\) etc.), so the entropy vector lives in \(\mathbb {R}^3\). Holographic states obey no extra entropy inequalities beyond those obeyed by any quantum state, namely non-negativity and subadditivity, so the holographic entropy cone is the same as the quantum entropy cone. A dual basis is provided by the three mutual informations,

$$\begin{aligned} I(A:B),\quad I(B:C),\quad I(A:C). \end{aligned}$$
(2.27)

There are 3 types of extremal vectors, given by two-party entangled pure states \(|{\psi _1}\rangle _{AB}\otimes |{\psi _2}\rangle _C\) etc. Thus the skeleton graph for three parties is simply a triangle, shown in Fig. 1.

Given a decomposition of the boundary into four regions, we can merge two of the regions, say C and D, and thereby consider the same state as a three-party pure state. Under this merging, the four-party skeleton graph on the left side of Fig. 1 turns into the three-party one on the right side as follows. The star graph splits at the internal vertex to become two edges, an A(CD) edge and a B(CD) edge, each with capacity \(-\frac{1}{2}I_3\). The first merges with the AC and AD edges from the four-party complete graph to give an A(CD) edge with total capacity

$$\begin{aligned} -\frac{1}{2}I_3+\frac{1}{2}I(A:C)+\frac{1}{2}I(A:D) = \frac{1}{2}I(A:CD). \end{aligned}$$
(2.28)

Similarly for the B(CD) edges. The AB edge remains unchanged and the CD edge is removed. This rearrangement will play a role in our considerations of Sect. 4.
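Equation (2.28) is a purity identity and can be checked on any four-party entropy vector. The short sketch below uses a sample vector invented for illustration (a PT with \(S_0=1\) tensored with a Bell pair on AB):

```python
# Sample 4-party pure-state entropy vector (invented for illustration);
# purity fixes the complementary entropies used below.
S = dict(A=2, B=2, C=1, D=1, AB=2, AC=3, BC=3)

I_AC = S['A'] + S['C'] - S['AC']
I_AD = S['A'] + S['D'] - S['BC']      # S(AD) = S(BC) by purity
neg_I3 = (S['AB'] + S['AC'] + S['BC']
          - S['A'] - S['B'] - S['C'] - S['D'])

# Capacity of the merged A(CD) edge, eq. (2.28):
lhs = 0.5 * neg_I3 + 0.5 * I_AC + 0.5 * I_AD
# I(A:CD) = S(A) + S(CD) - S(ACD) = S(A) + S(AB) - S(B) by purity
rhs = 0.5 * (S['A'] + S['AB'] - S['B'])
print(lhs, rhs)   # 1.0 1.0
```

Expanding both sides in the entropies shows they agree identically, so the numerical match is not special to this vector.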

The situation for five and more parties was studied in [5, 23]. For five-party pure states, there are no new inequalities beyond MMI and the standard quantum ones (non-negativity, subadditivity, strong subadditivity). There are 20 extremal vectors of the holographic entropy cone, given by 10 two-party entangled pure states, 5 four-party PTs, and 5 six-party PTs with two of the parties merged (e.g. a PT on \(A_1A_2BCDE\) with \(A=A_1A_2\)). Since the cone is only 15-dimensional, the decomposition of a generic point into a sum of extremal vectors is not unique, unlike for three or four parties. For six-party pure states, there are new inequalities; a complete list of inequalities was conjectured in [5] and proved in [23]. Notable is the fact that the extremal rays are no longer only made from perfect tensors; rather, new entanglement structures come into play. For more than six parties, some new inequalities are known but a complete list has not even been conjectured.

3 Multiflows and MMI

As we reviewed in Sect. 2.1.3, the subadditivity and strong subadditivity inequalities can be proved easily from the formula (2.4) for the entropy in terms of flows. Subadditivity follows more or less directly from the definition of a flow, while strong subadditivity requires the nesting property for flows (existence of a simultaneous max flow for A and AB). Holographic entanglement entropies also obey the MMI inequality (2.24), which was proven using minimal surfaces [18, 20]. Therefore it seems reasonable to expect MMI to admit a proof in terms of flows. However, it was shown in [16] that the nesting property alone is not powerful enough to prove MMI. Therefore, flows must obey some property beyond nesting. In this section we will state the necessary property and give a flow-based proof of MMI. The property is the existence of an object called a max multiflow. It is guaranteed by our Theorem 1, stated below and proved in Sect. 5.

3.1 Multiflows

It turns out that the property required to prove MMI concerns not a single flow, like the nesting property, but rather a collection of flows that are compatible with each other in the sense that they can simultaneously occupy the same geometry (we will make this precise below). In the network context, such a collection of flows is called a multicommodity flow, or multiflow, and there is a large literature about them. (See Sect. 6 for the network definition of a multiflow. Standard references are [15, 37]; two resources we have found useful are [8, 30].) We will adopt the same terminology for the Riemannian setting we are working in here. We thus start by defining a multiflow.

Definition 1

(Multiflow). Given a Riemannian manifold \(\mathcal {M}\) with boundary \(\partial \mathcal {M}\), let \(A_1, \ldots , A_n\) be non-overlapping regions of \(\partial \mathcal {M}\) (i.e. for \(i\ne j\), \(A_i\cap A_j\) is codimension-1 or higher in \(\partial \mathcal {M}\)) covering \(\partial \mathcal {M}\) (\(\cup _i A_i=\partial \mathcal {M} \)). A multiflow is a set of vector fields \(v_{ij}\) on \(\mathcal {M}\) satisfying the following conditions:

$$\begin{aligned} v_{ij}&=-\,v_{ji} \end{aligned}$$
(3.1)
$$\begin{aligned} \hat{n} \cdot v_{ij}&= 0 \text { on }A_k\quad (k\ne i,j) \end{aligned}$$
(3.2)
$$\begin{aligned} \nabla \cdot v_{ij}&=0 \end{aligned}$$
(3.3)
$$\begin{aligned} \sum _{i < j}^n |v_{ij}|&\le \frac{1}{4G_{\mathrm{N}}}. \end{aligned}$$
(3.4)

Given condition (3.1), there are \(n(n-1)/2\) independent vector fields. Given condition (3.2), \(v_{ij}\) has nonvanishing flux only on the regions \(A_i\) and \(A_j\), and, by (3.3), these fluxes obey

$$\begin{aligned} \int _{A_i}v_{ij}=-\int _{A_j}v_{ij}. \end{aligned}$$
(3.5)

Given conditions (3.3) and (3.4), each \(v_{ij}\) is a flow by itself. However, an even stronger condition follows: any linear combination of the form

$$\begin{aligned} v=\sum _{i<j}^n \xi _{ij}v_{ij}, \end{aligned}$$
(3.6)

where the coefficients \(\xi _{ij}\) are constants in the interval \([-1,1]\), is divergenceless and, by the triangle inequality, obeys the norm bound \(|v|\le 1/4G_{\mathrm{N}}\); it is therefore also a flow.
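
Explicitly, at each point

$$\begin{aligned} |v| = \bigg |\sum _{i<j}^n \xi _{ij}v_{ij}\bigg | \le \sum _{i<j}^n |\xi _{ij}|\,|v_{ij}| \le \sum _{i<j}^n |v_{ij}| \le \frac{1}{4G_{\mathrm{N}}}, \end{aligned}$$

using \(|\xi _{ij}|\le 1\) and the norm bound (3.4).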

Given a multiflow \(\{v_{ij}\}\), we can define the n vector fields

$$\begin{aligned} v_i:=\sum _{j=1}^n v_{ij}, \end{aligned}$$
(3.7)

each of which, by the above argument, is itself a flow. Hence its flux on the region \(A_i\) is bounded above by its entropy:

$$\begin{aligned} \int _{A_i}v_i\le S(A_i). \end{aligned}$$
(3.8)

The surprising statement is that the bounds (3.8) are collectively tight. In other words, there exists a multiflow saturating all n bounds (3.8) simultaneously. We will call such a multiflow a max multiflow, and its existence is our Theorem 1:

Theorem 1

(Max multiflow). There exists a multiflow \(\{v_{ij}\}\) such that for each i, the sum

$$\begin{aligned} v_i:= \sum _{j=1}^n v_{ij} \end{aligned}$$
(3.9)

is a max flow for \(A_i\), that is,

$$\begin{aligned} \int _{A_i}v_i=S(A_i). \end{aligned}$$
(3.10)

Theorem 1 is a continuum version of a well-known theorem on multiflows on graphs, first formulated in [27] (although a correct proof wasn’t given until [9, 28]). However, the original graph-theoretic proof is discrete and combinatorial in nature and not easily adaptable to the continuum. Therefore, in Sect. 5 we will give a continuum proof based on techniques from convex optimization. (This proof can be adapted back to the graph setting to give a proof there that is new as far as we know. We refer the reader to Sect. 6.) Furthermore, we emphasize that it should not be taken for granted that a statement that holds in the graph setting necessarily also holds on manifolds. In fact, we will give an example below of a graph-theoretic theorem concerning multiflows that is not valid on manifolds.
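As a toy illustration of Theorem 1 in the graph setting (our own sketch, not an example from the text), take a star graph with three terminals A, B, C joined to a central node by edges of capacities \(c_i\) satisfying the triangle inequality. Then \(S(A_i)=c_i\), and the standard fluxes \(N_{ij}=(c_i+c_j-c_k)/2\) define a multiflow saturating all three max flows at once:

```python
# Star graph: terminals A, B, C each joined to a central node by an edge
# of capacity c[i]; the min cut for terminal i is c[i]. The fluxes
# N[(i,j)] = (c[i] + c[j] - c[k]) / 2 are nonnegative when the capacities
# obey the triangle inequality, and saturate every max flow simultaneously.
c = {"A": 3.0, "B": 4.0, "C": 5.0}

N = {}
for i in c:
    for j in c:
        if i < j:
            k = next(x for x in c if x not in (i, j))
            N[(i, j)] = (c[i] + c[j] - c[k]) / 2.0

for i in c:
    # total flux of v_i out of terminal i; it is also the load on edge i,
    # so the capacity constraint is saturated rather than violated
    flux = sum(f for pair, f in N.items() if i in pair)
    assert abs(flux - c[i]) < 1e-12  # saturates S(A_i) = c[i]
```

Note that these fluxes agree with formula (3.22) below: with the entropies computed from this graph, \(N_{ij}=\frac{1}{2}I(A_i:A_j)\).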

A simple corollary of Theorem 1 in the case \(n=3\) is the nesting property for flows, which says that, given a decomposition of the boundary into regions ABC, there exists a flow v that is simultaneously a max flow for A and for AB. In terms of the flows of Theorem 1 (with \(A_1=A\), \(A_2=B\), \(A_3=C\)), this flow is simply

$$\begin{aligned} v = v_{AB}+ v_{AC}+v_{BC}\,. \end{aligned}$$
(3.11)

A more interesting corollary of Theorem 1 is MMI. Set \(n=4\). Given a max multiflow \(\{v_{ij}\}\), we construct the following flows:

$$\begin{aligned} \begin{aligned} u_1&:= v_{AC} + v_{AD} + v_{BC} + v_{BD} = \frac{1}{2}(v_A + v_B - v_C - v_D) \\ u_2&:= v_{AB} + v_{AD} + v_{CB} + v_{CD} = \frac{1}{2}(v_A - v_B +v_C - v_D) \\ u_3&:= v_{BA}+v_{BD}+v_{CA} + v_{CD} = \frac{1}{2}(-v_A + v_B + v_C - v_D). \end{aligned} \end{aligned}$$
(3.12)

The second equality in each line follows from the condition (3.1) and definition (3.9). Each \(u_i\) is of the form (3.6) and is therefore a flow, so its flux through any boundary region is bounded above by the entropy of that region. In particular,

$$\begin{aligned} S(AB) \ge \int _{AB} u_1, \quad S(AC) \ge \int _{AC} u_2, \quad S(BC) \ge \int _{BC} u_3. \end{aligned}$$
(3.13)

Summing these three inequalities and using (3.10) leads directly to MMI:

$$\begin{aligned} \begin{aligned} S(AB) + S(AC) + S(BC)&\ge \int _A(u_1+u_2)+\int _B(u_1+u_3)+\int _C(u_2+u_3) \\&= \int _A v_A + \int _B v_B + \int _C v_C - \int _{ABC} v_D \\&= S(A)+S(B)+S(C)+S(D). \end{aligned} \end{aligned}$$
(3.14)
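
The algebra in (3.12) and (3.14) is easy to verify mechanically. The following sketch assigns arbitrary numbers to the independent fluxes and checks the three identities, using only the antisymmetry \(v_{ji}=-v_{ij}\):

```python
import random

# Check the second equalities in (3.12): treat the fluxes v_ij as arbitrary
# numbers subject only to antisymmetry, v_ji = -v_ij, and compare both sides.
random.seed(0)
v = {p: random.uniform(-1.0, 1.0) for p in ("AB", "AC", "AD", "BC", "BD", "CD")}

def V(i, j):
    # antisymmetric lookup: V(i, j) = -V(j, i)
    return v[i + j] if i + j in v else -v[j + i]

# v_i = sum_j v_ij, as in (3.9)
vA, vB, vC, vD = (sum(V(i, j) for j in "ABCD" if j != i) for i in "ABCD")

u1 = V("A", "C") + V("A", "D") + V("B", "C") + V("B", "D")
u2 = V("A", "B") + V("A", "D") + V("C", "B") + V("C", "D")
u3 = V("B", "A") + V("B", "D") + V("C", "A") + V("C", "D")

assert abs(u1 - (vA + vB - vC - vD) / 2) < 1e-12
assert abs(u2 - (vA - vB + vC - vD) / 2) < 1e-12
assert abs(u3 - (-vA + vB + vC - vD) / 2) < 1e-12
```

The same bookkeeping gives \(u_1+u_2=v_A-v_D\), the combination whose flux through A appears in the first line of (3.14).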

The difference between the left- and right-hand sides of (3.14) is \(-\,I_3\), so (unless \(-\,I_3\) happens to vanish) it is not possible for all of the inequalities (3.13) to be saturated for a given multiflow. However, Theorem 2, proved in Sect. 5.2, shows as a special case that any single one of the inequalities (3.13) can be saturated. For example, there exists a max multiflow such that

$$\begin{aligned} \int _{AB}u_1 = S(AB). \end{aligned}$$
(3.15)

In the graph setting, it can be shown that in fact any two of the inequalities (3.13) can be saturated [25, 30]; however, this is not in general true in the continuum.

3.2 Threads

3.2.1 Theorem 1

We can also frame multiflows, Theorem 1, and the proof of MMI in the language of bit threads. The concept of a multiflow is very natural from the viewpoint of the bit threads, since the whole set of flows \(\{v_{ij}\}\) can be represented by a single thread configuration. Indeed, for each \(v_{ij}\) (\(i<j\)) we can choose a set of threads with density equal to \(|v_{ij}|\); given (3.2), these end only on \(A_i\) or \(A_j\). By (2.9), the number of threads that connect \(A_i\) to \(A_j\) is at least the flux of \(v_{ij}\):

$$\begin{aligned} N_{A_i A_j}\ge \int _{A_i}v_{ij}. \end{aligned}$$
(3.16)

Since the density of a union of sets of threads is the sum of their respective densities, by (3.4) the union of these configurations over all \(i<j\) is itself an allowed thread configuration. Note that this configuration may contain, in any given neighborhood, threads that are not parallel to each other, and even that intersect each other.

Now suppose that \(\{v_{ij}\}\) is a max multiflow. Summing (3.16) over \(j\ne i\) for fixed i yields

$$\begin{aligned} \sum _{j\ne i}^n N_{A_i A_j}\ge \int _{A_i}v_i=S(A_i). \end{aligned}$$
(3.17)

But, by (2.8), the total number of threads connecting \(A_i\) to all the other regions is also bounded above by \(S(A_i)\):

$$\begin{aligned} \sum _{j\ne i}^n N_{A_i A_j}\le S(A_i). \end{aligned}$$
(3.18)

So the inequalities (3.17) and (3.18) must be saturated, and furthermore each inequality (3.16) must be individually saturated:

$$\begin{aligned} N_{A_i A_j} = \int _{A_i}v_{ij}. \end{aligned}$$
(3.19)

Thus, in the language of threads, Theorem 1 states that there exists a thread configuration such that, for all i,

$$\begin{aligned} \sum _{j\ne i}^n N_{A_i A_j} = S(A_i). \end{aligned}$$
(3.20)

We will call such a configuration a max thread configuration.

We will now study the implications of the existence of a max thread configuration for three and four boundary regions.

3.2.2 Three boundary regions

For \(n=3\), we have

$$\begin{aligned} S(A) = N_{AB}+N_{AC},\quad S(B) = N_{AB}+N_{BC},\quad S(C) = N_{AC}+N_{BC}. \end{aligned}$$
(3.21)

Since \(S(AB)=S(C)\), we find an elegant formula for the mutual information:

$$\begin{aligned} I(A:B) = 2N_{AB}. \end{aligned}$$
(3.22)

Thus, at least from the viewpoint of calculating the mutual information, it is as if each thread connecting A and B represents a Bell pair. Note that (3.22) also reestablishes the subadditivity property, since clearly the number of threads cannot be negative. Similarly to (2.15), we also have, for the conditional entropy,

$$\begin{aligned} H(B|A) = N_{BC}-N_{AB}. \end{aligned}$$
(3.23)

As mentioned in Sect. 2.2, the three mutual informations I(A : B), I(A : C), I(B : C) determine the entropy vector in \(\mathbb {R}^3\). Therefore, by (3.22) and its analogues, the thread counts \(N_{AB}\), \(N_{AC}\), \(N_{BC}\) determine the entropy vector, and conversely are uniquely fixed by it. Thus, in the skeleton graph representation of the entropy vector shown in Fig. 1 (right side), we can simply put \(N_{AB}\), \(N_{AC}\), \(N_{BC}\) as the capacities on the respective edges; in other words, the thread configuration “is” the skeleton graph.

3.2.3 Four boundary regions

For \(n=4\), we have, similarly to (3.21),

$$\begin{aligned} \begin{aligned} S(A)&= N_{AB}+N_{AC}+N_{AD}\\ S(B)&= N_{AB}+N_{BC}+N_{BD}\\ S(C)&= N_{AC}+N_{BC}+N_{CD}\\ S(D)&= N_{AD}+N_{BD}+N_{CD}. \end{aligned} \end{aligned}$$
(3.24)

The entropies of pairs of regions, S(AB), S(AC), and S(BC), also enter into MMI. A max thread configuration does not tell us these entropies, only the entropies of individual regions. Nonetheless, for any valid thread configuration, we have the bound (2.8). In particular, S(AB) is bounded below by the total number of threads connecting AB to CD:

$$\begin{aligned} S(AB) \ge N_{(AB)(CD)}=N_{AC}+N_{BC}+N_{AD}+N_{BD}. \end{aligned}$$
(3.25)

Similarly,

$$\begin{aligned} \begin{aligned} S(AC)&\ge N_{AB}+N_{AD}+N_{BC}+N_{CD}\\ S(BC)&\ge N_{AB}+N_{AC}+N_{BD}+N_{CD}. \end{aligned} \end{aligned}$$
(3.26)

Inequalities (3.24), (3.25), and (3.26) together imply MMI.
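
Concretely (a sketch of the bookkeeping, with arbitrary nonnegative thread counts): the three lower bounds sum to exactly the right-hand side of MMI as computed from (3.24), since each thread count appears twice on both sides:

```python
import random

# For any nonnegative thread counts N[ij], the sum of the lower bounds
# (3.25)-(3.26) on S(AB), S(AC), S(BC) equals S(A)+S(B)+S(C)+S(D) as
# computed from (3.24): each N[ij] appears exactly twice on both sides.
random.seed(0)
N = {p: random.uniform(0.0, 10.0) for p in ("AB", "AC", "AD", "BC", "BD", "CD")}

S_A = N["AB"] + N["AC"] + N["AD"]
S_B = N["AB"] + N["BC"] + N["BD"]
S_C = N["AC"] + N["BC"] + N["CD"]
S_D = N["AD"] + N["BD"] + N["CD"]

bound_AB = N["AC"] + N["BC"] + N["AD"] + N["BD"]  # (3.25)
bound_AC = N["AB"] + N["AD"] + N["BC"] + N["CD"]  # (3.26)
bound_BC = N["AB"] + N["AC"] + N["BD"] + N["CD"]  # (3.26)

lhs = bound_AB + bound_AC + bound_BC
rhs = S_A + S_B + S_C + S_D
assert abs(lhs - rhs) < 1e-9  # so S(AB)+S(AC)+S(BC) >= lhs = rhs: MMI
```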

As we did for three parties, we can look at the mutual information between two regions. However, using (3.24) and (3.25), we now merely find a bound rather than an equality:

$$\begin{aligned} I(A:B) \le 2N_{AB}. \end{aligned}$$
(3.27)

Thus, in a four-party max configuration, each thread connecting A and B does not necessarily represent a “Bell pair.” To understand how this can occur, it is useful to look at a simple illustrative example, shown in Fig. 2, which is a star graph where each edge has capacity 1. It is easy to evaluate the entropies of the single vertices and pairs. One finds that they have the form (2.22) with \(S_0=1\); in other words, this graph represents a perfect tensor. In particular, all pairwise mutual informations vanish, while \(-\,I_3=2\). As shown in Fig. 2, there are three max thread configurations. Each such configuration has two threads, which connect the external vertices in all possible ways.
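
The entropies quoted for this example are easy to reproduce by brute-force min cut (a sketch: a cut for a set of leaves severs either its own edges or those of its complement):

```python
from itertools import combinations

# Min-cut entropies for the star graph of Fig. 2: leaves A, B, C, D joined
# to a central node by unit-capacity edges. A cut for a leaf set s severs
# either the |s| edges of s or the 4 - |s| edges of its complement.
LEAVES = "ABCD"

def S(s):
    return min(len(s), len(LEAVES) - len(s))

assert all(S(x) == 1 for x in LEAVES)                          # S(A_i) = 1
assert all(S(a + b) == 2 for a, b in combinations(LEAVES, 2))  # S(A_iA_j) = 2

def I(a, b):  # mutual information
    return S(a) + S(b) - S(a + b)

assert all(I(a, b) == 0 for a, b in combinations(LEAVES, 2))   # all MIs vanish
I3 = S("A") + S("B") + S("C") - S("AB") - S("AC") - S("BC") + S("ABC")
assert I3 == -2  # -I_3 = 2: perfect-tensor entropies, as claimed
```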

The above example highlights the fact that, unlike for \(n=3\), the thread counts \(N_{A_i A_j}\) are not determined by the entropies: in (3.24) there are only 4 equations for the 6 unknown \(N_{A_i A_j}\), while the entropies of pairs of regions only impose the inequality constraints (3.25), (3.26) on the \(N_{A_i A_j}\). However, based on Theorem 2 as summarized above (3.15), we know that there exists a max thread configuration that saturates (3.25) and therefore (3.27). The same configuration has \(I(C:D)=2N_{CD}\). Similarly, there exists a (in general different) max configuration such that \(I(A:C)=2N_{AC}\) and \(I(B:D)=2N_{BD}\), and yet another one such that \(I(A:D)=2N_{AD}\) and \(I(B:C)=2N_{BC}\). In summary, \(\frac{1}{2}I(A_i:A_j)\) is the minimal number of threads connecting \(A_i\) and \(A_j\), while \(-\,I_3\) is the total number of “excess” threads in any configuration:

$$\begin{aligned} -\,I_3 = \sum _{i<j}^n \left( N_{A_i A_j}-\frac{1}{2}I(A_i:A_j)\right) . \end{aligned}$$
(3.28)

These \(-\,I_3\) many threads are free to switch how they connect the different regions, in the manner of Fig. 2.

So far we have treated the \(n=3\) and \(n=4\) cases separately, but they are related by the operation of merging boundary regions. For example, given the four regions A, B, C, D, we can consider CD to be a single region, effectively giving a three-boundary decomposition. Under merging, not every four-party max thread configuration becomes a three-party max configuration. For example, in the case illustrated in Fig. 2, if we consider CD as a single region then, since \(S(CD)=2\), any max thread configuration must have two threads connecting CD to AB. Thus, the middle configuration, with \(N_{AB}=N_{CD}=1\), is excluded as a three-party max configuration.

Fig. 2

Left: Star graph with capacity 1 on each edge. The entropies derived from this graph are those of a perfect tensor, (2.22), with \(S_0=1\). In particular, all the pairwise mutual informations vanish and \(-\,I_3=2\). Right: The three max thread configurations on this graph

4 State Decomposition Conjecture

In this section we consider the thread configurations discussed in the previous section for different numbers of boundary regions. Taking seriously the idea that the threads represent entanglement in the field theory, we now ask what these configurations tell us about the entanglement structure of holographic states. We will consider in turn decomposing the boundary into two, three, four, and more regions.

For two complementary boundary regions A and B, the number of threads connecting A and B in a max configuration is \(N_{AB}=S(A)=S(B)\), so in some sense each thread represents an entangled pair of qubits with one qubit in A and the other in B. Of course, these qubits are not spatially localized in the field theory—in particular they are not located at the endpoints of the thread—since even in a max configuration the threads have a large amount of freedom in where they attach to the boundary.

For three boundary regions, as discussed in Sect. 3.2.2, the max thread configuration forms a triangle, with the number of threads on the AB edge fixed to be \(N_{AB}=\frac{1}{2}I(A:B)\) and similarly with the AC and BC edges. If we take this picture seriously as a representation of the entanglement structure of the state itself, it suggests that the state contains only bipartite entanglement. In other words, there is a decomposition of the ABC Hilbert spaces,

$$\begin{aligned} \mathcal {H}_A = \mathcal {H}_{A_1}\otimes \mathcal {H}_{A_2},\quad \mathcal {H}_B = \mathcal {H}_{B_1}\otimes \mathcal {H}_{B_3},\quad \mathcal {H}_C = \mathcal {H}_{C_2}\otimes \mathcal {H}_{C_3}, \end{aligned}$$
(4.1)

(again, this is not a spatial decomposition) such that the full state decomposes into a product of three bipartite-entangled pure states:

$$\begin{aligned} |{\psi }\rangle _{ABC} = |{\psi _1}\rangle _{A_1B_1}\otimes |{\psi _2}\rangle _{A_2C_2}\otimes |{\psi _3}\rangle _{B_3C_3}, \end{aligned}$$
(4.2)

each of which carries all the mutual information between the respective regions:

$$\begin{aligned} S(A_1)&=S(B_1)=\frac{1}{2}I(A:B)\nonumber \\ S(A_2)&=S(C_2)=\frac{1}{2}I(A:C)\nonumber \\ S(B_3)&=S(C_3)=\frac{1}{2}I(B:C). \end{aligned}$$
(4.3)

(Of course, any of the factors in (4.2) can be trivial, if the corresponding mutual information vanishes.)

So far the picture only includes bipartite entanglement. For four boundary regions, however, with only bipartite entanglement we would necessarily have \(-\,I_3=0\), which we know is not always the case in holographic states. Furthermore, as we saw in Sect. 3.2.3, even in a max thread configuration, the number of threads in each group, say \(N_{AB}\), is not fixed. We saw that there is a minimal number \(\frac{1}{2}I(A_i:A_j)\) of threads connecting \(A_i\) and \(A_j\), plus a number \(-\,I_3\) of “floating” threads that can switch which pairs of regions they connect. This situation is summarized by the skeleton graph of Fig. 1, which includes six edges connecting pairs of external vertices with capacities equal to half the respective mutual informations, plus a star graph connecting all four at once with capacity \(-\frac{1}{2}I_3\). The star graph has perfect-tensor entropies. This suggests that the state itself consists of bipartite-entangled pure states connecting pairs of regions times a four-party perfect tensor:

$$\begin{aligned} \begin{aligned} |{\psi }\rangle _{ABCD}&= |{\psi _1}\rangle _{A_1B_1}\otimes |{\psi _2}\rangle _{A_2C_2}\otimes |{\psi _3}\rangle _{A_3D_3}\otimes |{\psi _4}\rangle _{B_4C_4}\otimes |{\psi _5}\rangle _{B_5D_5}\otimes |{\psi _6}\rangle _{C_6D_6} \\&\otimes |{PT}\rangle _{A_7B_7C_7D_7}. \end{aligned} \end{aligned}$$
(4.4)

(Again, any of these factors can be trivial.) This is the simplest ansatz for a four-party pure state consistent with what we know about holographic entanglement entropies. In this ansatz the MMI property is manifest.

The conjectures (4.2), (4.4) for the form of the state for three and four regions respectively are in fact equivalent. The four-region conjecture implies the three-region one, either by taking one of the regions to be empty or by merging two of the regions. A four-party perfect tensor, under merging two of the parties, factorizes into two bipartite-entangled states. For example, if we write \(C'=CD\), then the bipartite-entangled factors in (4.4) clearly take the form (4.2), while the perfect-tensor factor splits into bipartite-entangled pieces:

$$\begin{aligned} |{PT}\rangle _{A_7B_7C_7D_7} = |{\psi _1'}\rangle _{A_7C'_{7,1}}\otimes |{\psi '_2}\rangle _{B_7C'_{7,2}} \end{aligned}$$
(4.5)

for some decomposition \(\mathcal {H}_{C'_7} = \mathcal {H}_{C_7} \otimes \mathcal {H}_{D_7} \cong \mathcal {H}_{C'_{7,1}} \otimes \mathcal {H}_{C'_{7,2}}\), as follows from the fact that \({I(A_7:B_7)}_{|{PT}\rangle }=0\).

Conversely, the three-region decomposition implies the four-region one, as follows [31]. Suppose a pure state on ABCD contains only bipartite entanglement when any two parties are merged. For example, when merging C and D, it has only bipartite entanglement, and in particular contains an entangled pure state between part of A and part of B, with entropy \(I(A:B)/2\). There is a similar pure state shared between any pair of parties. These factors carry all of the pairwise mutual information, so what is left has vanishing pairwise mutual informations and is therefore a four-party PT (or, if \(-\,I_3\) vanishes, a completely unentangled state).

We remind the reader that, throughout this paper, we have been working in the classical, or large-N, limit of the holographic system, and we emphasize that the state decomposition conjectures stated above should be understood in this sense. Thus we are not claiming that the state takes the form (4.2) or (4.4) exactly, but rather only up to corrections that are subleading in 1/N. If we consider, for example, a case where \(I(A:B)=0\) at leading order, such as where A and B are well-separated regions, the three-party decomposition (4.2) would suggest that \(\rho _{AB}=\rho _A\otimes \rho _B\). However, even in this case \(I(A:B)\) could still be of order 1, so we should not expect this decomposition to hold approximately in any norm, but rather in a weaker sense.

Support for these conjectures comes from tensor-network toy models of holography [19, 31, 34]. Specifically, it was shown in [31] that random stabilizer tensor network states at large N indeed have the form (4.2), (4.4) at leading order in 1/N. More precisely, these decompositions hold provided one traces out O(1) many degrees of freedom in each subsystem. In other words, there are other types of entanglement present (such as GHZ-type entanglement), but these make a subleading contribution to the entropies. We believe that it would be interesting to prove or disprove the state decomposition conjectures (4.2), (4.4), as well as to sharpen them by clarifying the possible form of the 1/N corrections.

Finally, we note that it is straightforward to generalize (4.2) and (4.4) to more than four regions. Namely, we can conjecture that for n parties, \(|{\psi }\rangle _{A_1\dots A_n}\) decomposes into a direct product of states, each realizing an extremal ray in the n-party holographic entropy cone. (Note that the above procedure of using the three-party decomposition to remove bipartite entanglement between any two parties also works in the n-party case, but what is left is not necessarily an extremal vector as it is for \(n=4\).) A new feature that arises for \(n>4\), as mentioned in Sect. 2.2.2, is that a generic vector in the holographic entropy cone no longer admits a unique decomposition into extremal rays. Therefore the amount of entropy carried by each factor in the state decomposition cannot be deduced just from the entropy vector, but would require some more fine-grained information about the state. Another new feature that arises for \(n>5\) is that the extremal rays no longer arise only from perfect tensors; rather, new entanglement structures are involved. It would be interesting to explore whether the thread picture throws any light on these issues.

5 Proofs

In this section, we give proofs of our main results on the existence of multiflows in Riemannian geometries. We are not claiming mathematical rigor, particularly when it comes to functional analytical aspects. To simplify the notation, we set \(4G_{\mathrm{N}}=1\) throughout this section.

5.1 Theorem 1

For convenience, we repeat the definition of a multiflow and the statement of Theorem 1.

Definition 1

(Multiflow). Given a Riemannian manifold \(\mathcal {M}\) with boundary \(\partial \mathcal {M}\), let \(A_1, \ldots , A_n\) be non-overlapping regions of \(\partial \mathcal {M}\) (i.e. for \(i\ne j\), \(A_i\cap A_j\) is codimension-1 or higher in \(\partial \mathcal {M}\)) covering \(\partial \mathcal {M}\) (\(\cup _i A_i=\partial \mathcal {M} \)). A multiflow is a set of vector fields \(v_{ij}\) on \(\mathcal {M}\) satisfying the following conditions:

$$\begin{aligned} v_{ij}&=-v_{ji} \end{aligned}$$
(5.1)
$$\begin{aligned} \hat{n} \cdot v_{ij}&= 0 \text { on }A_k\quad (k\ne i,j) \end{aligned}$$
(5.2)
$$\begin{aligned} \nabla \cdot v_{ij}&=0 \end{aligned}$$
(5.3)
$$\begin{aligned} \sum _{i < j}^n |v_{ij}|&\le 1. \end{aligned}$$
(5.4)

Theorem 1

(Max multiflow). There exists a multiflow \(\{v_{ij}\}\) such that for each i, the sum

$$\begin{aligned} v_i:= \sum _{j=1}^n v_{ij} \end{aligned}$$
(5.5)

is a max flow for \(A_i\), that is,

$$\begin{aligned} \int _{A_i}v_i=S(A_i). \end{aligned}$$
(5.6)

Our proof of Theorem 1 will not be constructive. Rather, using tools from the theory of convex optimization, specifically strong duality of convex programs, we will establish abstractly the existence of a multiflow obeying (5.6). The methods employed here will carry over with only small changes to the discrete case, as shown in Sect. 6.

Proof of Theorem 1

As discussed in Sect. 3.1, for any multiflow, \(v_i\) is a flow and therefore obeys

$$\begin{aligned} \int _{A_i}v_i\le S(A_i). \end{aligned}$$
(5.7)

What we will show is that there exists a multiflow such that

$$\begin{aligned} \sum _{i=1}^n \int _{A_i}v_i \ge \sum _{i=1}^n S(A_i). \end{aligned}$$
(5.8)

This immediately implies that (5.7) is saturated for all i.

In order to prove the existence of a multiflow obeying (5.8), we will consider the problem of maximizing the left-hand side of (5.8) over all multiflows as a convex optimization problem, or convex program. That this problem is convex follows from the following facts: (1) the variables (the vector fields \(v_{ij}\)) have a natural linear structure; (2) the equality constraints (5.1), (5.2), (5.3) are affine (in fact linear); (3) the inequality constraint (5.4) is convex (i.e. it is preserved by taking convex combinations); (4) the objective, the left-hand side of (5.8), is a concave (in fact linear) functional. We will find the Lagrangian dual of this problem, which is another convex program involving the constrained minimization of a convex functional. We will show that the objective of the dual program is bounded below by the right-hand side of (5.8), and therefore its minimum \(d^\star \) is bounded below:

$$\begin{aligned} d^\star \ge \sum _{i=1}^n S(A_i). \end{aligned}$$
(5.9)

We will then appeal to strong duality, which states that the maximum \(p^\star \) of the original (primal) program equals the minimum of the dual,

$$\begin{aligned} p^\star = d^\star . \end{aligned}$$
(5.10)

We thus obtain

$$\begin{aligned} p^\star \ge \sum _{i=1}^n S(A_i), \end{aligned}$$
(5.11)

showing that there is a multiflow obeying (5.8).

To summarize, we need to (a) derive the dual program and show that strong duality holds, and (b) show that its objective is bounded below by \(\sum _{i=1}^n S(A_i)\). We will do these in turn. Many of the steps are similar to those in the proof of the Riemannian max flow-min cut theorem, described in [22]; the reader who wishes to see the steps explained in more detail should consult that reference.

(a) Dualization The Lagrangian dual of a convex program is defined by introducing a Lagrange multiplier for each constraint and then integrating out the original (primal) variables, leaving a program written in terms of the Lagrange multipliers. More specifically, an inequality constraint is enforced by a Lagrange multiplier \(\lambda \) which is itself subject to the inequality constraint \(\lambda \ge 0\). In integrating out the primal variables, the objective plus Lagrange multiplier terms (together called the Lagrangian) is minimized or maximized without enforcing the constraints. The resulting function of the Lagrange multipliers is the objective of the dual program. The requirement that the minimum or maximum of the Lagrangian is finite defines the constraints of the dual program (in addition to the constraints \(\lambda \ge 0\) mentioned above). If the primal is a minimization program then the dual is a maximization one and vice versa.

In fact it is not necessary to introduce a Lagrange multiplier for each constraint of the primal program. Some constraints can be kept implicit, which means that no Lagrange multiplier is introduced and those constraints are enforced when integrating out the primal variables.
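
As a toy model of this dualization procedure (our own illustration; the program dualized in this section is Riemannian, not a finite linear program), the max flow on a small digraph is a convex, in fact linear, program, and strong duality identifies its optimum with the min cut:

```python
from scipy.optimize import linprog

# Max flow on a small digraph as a linear program: the variables are edge
# flows, the objective is the flow out of the source s, conservation is an
# affine equality constraint, and capacities are bounds. Strong duality
# equates the optimum with the min cut, here the cut {s, a} with capacity
# 2 + 1 + 1 = 4.
edges = [("s", "a", 2), ("s", "b", 2), ("a", "t", 1), ("b", "t", 3), ("a", "b", 1)]
idx = {(u, v): k for k, (u, v, _) in enumerate(edges)}

obj = [0.0] * len(edges)           # minimize -(flow out of s)
obj[idx[("s", "a")]] = -1.0
obj[idx[("s", "b")]] = -1.0

A_eq = [[0.0] * len(edges) for _ in "ab"]   # flow conservation at a and b
for row, node in enumerate("ab"):
    for (u, v), k in idx.items():
        A_eq[row][k] = (v == node) - (u == node)

res = linprog(obj, A_eq=A_eq, b_eq=[0.0, 0.0],
              bounds=[(0, cap) for *_, cap in edges])
assert abs(-res.fun - 4.0) < 1e-9  # max flow = min cut = 4
```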

Our task is to dualize the program of maximizing \(\sum _{i=1}^n \int _{A_i}v_i\) over all multiflows, i.e. over sets \(\{v_{ij}\}\) of vector fields obeying (5.1)–(5.4); as discussed above, this is a convex program. We will choose to keep (5.1) and (5.2) implicit. For the constraint (5.3), we introduce a set of Lagrange multipliers \(\psi _{ij}\) (\(i<j\)), each of which is a scalar field on \(\mathcal {M}\). Note that \(\psi _{ij}\) is only defined for \(i<j\) since, given the implicit constraint (5.1), the constraint (5.3) only needs to be imposed for \(i<j\). For the inequality constraint (5.4) we introduce the Lagrange multiplier \(\lambda \), which is also a scalar function on \(\mathcal {M}\) and is subject to the constraint \(\lambda \ge 0\). The Lagrangian is

$$\begin{aligned} \begin{aligned} L\left[ \{v_{ij}\},\{\psi _{ij}\},\lambda \right]&= \sum _{i=1}^n \int _{A_i}\sqrt{h}\sum _{j=1}^n \hat{n}\cdot v_{ij}\\&\qquad {}+\, \int _{\mathcal {M}}\sqrt{g}\left[ \sum _{i<j}^n \psi _{ij}\nabla \cdot v_{ij}+\lambda \left( 1-\sum _{i<j}^n |v_{ij}|\right) \right] . \end{aligned} \end{aligned}$$
(5.12)

Rewriting the first term slightly, integrating the divergence term by parts, and using the constraint (5.2), the Lagrangian becomes

$$\begin{aligned} \begin{aligned} L\left[ \{v_{ij}\},\{\psi _{ij}\},\lambda \right]&= \sum _{i<j}^n\left[ \int _{A_i}\sqrt{h}\,\hat{n}\cdot v_{ij}(\psi _{ij}+1)+\int _{A_j}\sqrt{h}\,\hat{n}\cdot v_{ij}(\psi _{ij}-1)\right] \\&\quad +\,\int _{\mathcal {M}}\sqrt{g}\left[ \lambda -\sum _{i<j}^n \left( v_{ij}\cdot \nabla \psi _{ij}+\lambda |v_{ij}|\right) \right] . \end{aligned} \end{aligned}$$
(5.13)

We now maximize the Lagrangian with respect to \(v_{ij}\) [again, only imposing constraints (5.1), (5.2) but not (5.3), (5.4)]. The requirement that the maximum is finite leads to constraints on the dual variables \(\{\psi _{ij}\}\), \(\lambda \). Since the Lagrangian, as written in (5.13), is ultralocal in \(v_{ij}\), we can do the maximization pointwise. On the boundary, for a given \(i<j\), at a point in \(A_i\) or \(A_j\), \(\hat{n}\cdot v_{ij}\) can take any value. Therefore, in order for the maximum to be finite, its coefficient must vanish, leading to the constraints

$$\begin{aligned} \psi _{ij} = -1\text { on }A_i,\quad \psi _{ij} = 1\text { on }A_j. \end{aligned}$$
(5.14)

When those constraints are satisfied, the boundary term vanishes. In the bulk, the term \(-(v_{ij}\cdot \nabla \psi _{ij}+\lambda |v_{ij}|)\) is unbounded above as a function of \(v_{ij}\) unless

$$\begin{aligned} \lambda \ge |\nabla \psi _{ij}|, \end{aligned}$$
(5.15)

in which case the maximum (at \(v_{ij}=0\)) vanishes. (As a result of (5.15), the constraint \(\lambda \ge 0\) is automatically satisfied and can be dropped.) The only term left in the Lagrangian is \(\int _{\mathcal {M}}\sqrt{g}\,\lambda \).

All in all, we are left with the following dual program:

$$\begin{aligned}&\text {Minimize }\int _{\mathcal {M}}\sqrt{g}\,\lambda \text { with respect to }\{\psi _{ij}\},\lambda \nonumber \\&\qquad \text { subject to } \lambda \ge |\nabla \psi _{ij}|,\quad \psi _{ij} = -1\text { on }A_i,\quad \psi _{ij} = 1\text { on }A_j, \end{aligned}$$
(5.16)

where again, \(\psi _{ij}\) is defined only for \(i<j\).

Strong duality follows from the fact that Slater’s condition is satisfied. Slater’s condition states that there exists a value for the primal variables such that all equality constraints are satisfied and all inequality constraints are strictly satisfied (i.e. satisfied with \(\le \) replaced by <). This is the case here: the configuration \(v_{ij}=0\) satisfies all the equality constraints and strictly satisfies the norm bound (5.4).

(b) Bound on dual objective It remains to show that, subject to the constraints in (5.16), the objective is bounded below by \(\sum _i S(A_i)\).

First, because \(\psi _{ij} = -1\) on \(A_i\) and 1 on \(A_j\), for any curve \(\mathcal {C}\) from a point in \(A_i\) to a point in \(A_j\), we have

$$\begin{aligned} \int _{\mathcal {C}}ds\, \lambda \ge \int _{\mathcal {C}} ds\,|\nabla \psi _{ij}|\ge \int _{\mathcal {C}}ds\, \hat{t} \cdot \nabla \psi _{ij} = \int _{\mathcal {C}} d\psi _{ij} = 2, \end{aligned}$$
(5.17)

where ds is the proper length element, \(\hat{t}\) is the unit tangent vector, and in the second inequality we used the Cauchy–Schwarz inequality. Now, for each i, define the function \(\phi _i(x)\) on \(\mathcal {M}\) as the minimum of \(\int _{\mathcal {C}}ds\,\lambda \) over any curve from \(A_i\) to x:

$$\begin{aligned} \phi _i(x) = \inf _{\begin{array}{c} \mathcal {C}\text { from}\\ A_i\text { to }x \end{array}}\int _{\mathcal {C}} ds\,\lambda . \end{aligned}$$
(5.18)

By virtue of (5.17),

$$\begin{aligned} \phi _i(x) + \phi _j(x) \ge 2\quad (i\ne j). \end{aligned}$$
(5.19)

Define the region \(R_i\) as follows:

$$\begin{aligned} R_i:=\{x\in \mathcal {M}:\phi _i(x)<1\} . \end{aligned}$$
(5.20)

It follows from (5.19) that \(R_i\cap R_j=\emptyset \) for \(i \ne j\). Given that \(\lambda \ge 0\), this implies that the dual objective is bounded below by the sum of the integrals on the \(R_i\)s:

$$\begin{aligned} \int _{\mathcal {M}}\sqrt{g}\, \lambda \ge \sum _{i=1}^n \int _{R_i} \sqrt{g}\,\lambda . \end{aligned}$$
(5.21)

Finally, we will show that each term in the sum on the right-hand side of (5.21) is bounded below by \(S(A_i)\). Using Hamilton–Jacobi theory, where we treat \(\int _{C}ds\, \lambda \) as the action, it is straightforward to show that \(|\nabla \phi _i| = \lambda \), so this term can be written \(\int _{R_i}\sqrt{g}|\nabla \phi _i|\). This integral in turn equals the average area of the level sets of \(\phi _i\) for values between 0 and 1. Since \(\phi _i=0\) on \(A_i\) and \(\ge 2\) on \(\overline{A_i}\), each level set is homologous to \(A_i\) and so has area at least as large as that of the minimal surface \(m(A_i)\). This is precisely \(S(A_i)\), so the average is also at least \(S(A_i)\). (The reasoning here is the same as used in the proof of the max flow-min cut theorem; see in particular Step 3 of Section 3.2 in [22].) This completes the proof. \(\quad \square \)
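
Spelled out, the averaging argument in the final step is the coarea formula: since \(\phi _i=0\) on \(A_i\) and \(\phi _i\in [0,1)\) on \(R_i\),

$$\begin{aligned} \int _{R_i}\sqrt{g}\,|\nabla \phi _i| = \int _0^1 dt\,{{\,\mathrm{area}\,}}\big (\phi _i^{-1}(t)\big ) \ge {{\,\mathrm{area}\,}}(m(A_i)) = S(A_i), \end{aligned}$$

where the inequality uses the fact that each level set \(\phi _i^{-1}(t)\) is homologous to \(A_i\) (recall that we have set \(4G_{\mathrm{N}}=1\)).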

We end this subsection with two comments on the proof. The first is that the converse to (5.17) holds, in other words, given a function \(\lambda \) on \(\mathcal {M}\) such that \(\int _C ds\,\lambda \ge 2\) for any curve \(\mathcal {C}\) connecting different boundary regions, there exist functions \(\psi _{ij}\) satisfying the constraints of the dual program (5.16). These can be constructed in terms of the functions \(\phi _i\) and regions \(R_i\) defined above:

$$\begin{aligned} \psi _{ij} = {\left\{ \begin{array}{ll}\phi _i-1\text { on }R_i\\ 1-\phi _j\text { on }R_j\\ 0\text { elsewhere}\end{array}\right. }. \end{aligned}$$
(5.22)

Thus (5.16) is equivalent to the following program:

$$\begin{aligned}&\text {Minimize }\int _{\mathcal {M}}\sqrt{g}\,\lambda \text { with respect to }\lambda \nonumber \\&\qquad \text { subject to } \int _{\mathcal {C}}ds\,\lambda \ge 2\text { for all }\mathcal {C}\text { connecting different boundary regions.} \end{aligned}$$
(5.23)

This program is the continuum analogue of the “metrics on graphs” type of program that arises as duals of graph multiflow programs (see [30]).
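
For the unit-capacity star graph with three terminals (a sketch of the discrete analogue, not an example from the text), every terminal-to-terminal path uses two edges, so the program reads: minimize \(\lambda _A+\lambda _B+\lambda _C\) subject to \(\lambda _i+\lambda _j\ge 2\). The optimum is \(\lambda _i=1\) with value \(3=\sum _i S(A_i)\), saturating (5.9):

```python
from scipy.optimize import linprog

# Discrete "metrics on graphs" analogue of (5.23) for the unit-capacity
# star with terminals A, B, C: minimize sum_e lambda_e subject to
# lambda-length >= 2 along each of the three terminal-to-terminal paths.
A_ub = [[-1, -1, 0],   # path A-B:  -(lA + lB) <= -2
        [-1, 0, -1],   # path A-C
        [0, -1, -1]]   # path B-C
res = linprog([1, 1, 1], A_ub=A_ub, b_ub=[-2, -2, -2], bounds=[(0, None)] * 3)
assert abs(res.fun - 3.0) < 1e-9  # = S(A) + S(B) + S(C), saturating (5.9)
```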

Second, we know from Theorem 1 that the bound (5.11) is saturated, implying that (5.9) is saturated. In fact, it is straightforward to construct the minimizing configuration \(\{\psi _{ij}\},\lambda \) which achieves this bound. Letting \(m(A_i)\) be the minimal surface homologous to \(A_i\) and \(r(A_i)\) the corresponding homology region, we set

$$\begin{aligned} \psi _{ij} = {\left\{ \begin{array}{ll}-1\text { on }r(A_i)\\ 1\text { on }r(A_j)\\ 0\text { elsewhere}\end{array}\right. },\quad \lambda = \sum _i\delta _{m(A_i)}, \end{aligned}$$
(5.24)

where \(\delta _{m(A_i)}\) is a delta-function supported on \(m(A_i)\).
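As a consistency check, one can verify directly that the configuration (5.24) is feasible for (5.23) and attains the optimal value; here we work in units where \(4G_{\mathrm{N}}=1\), so that \(S(A_i)={{\,\mathrm{area}\,}}(m(A_i))\):

```latex
% Feasibility: a curve C from A_i to A_j exits r(A_i) and enters r(A_j),
% so it crosses m(A_i) and m(A_j) at least once each:
\int_{\mathcal{C}} ds\,\lambda
  = \sum_k \#\bigl(\mathcal{C}\cap m(A_k)\bigr)
  \ge \#\bigl(\mathcal{C}\cap m(A_i)\bigr) + \#\bigl(\mathcal{C}\cap m(A_j)\bigr)
  \ge 2.
% Optimality: each delta-function contributes the area of its surface:
\int_{\mathcal{M}} \sqrt{g}\,\lambda
  = \sum_i \mathrm{area}\bigl(m(A_i)\bigr)
  = \sum_i S(A_i).
```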

5.2 Theorem 2

In this subsection we prove a theorem which establishes a sort of “nesting property” for multiflows. The theorem states that a multiflow exists that not only provides a max flow for each individual region \(A_i\) but also for any given set of regions \(s\subset \{A_i\}\). (The example \(n=4\), \(s=AB\) was considered in (3.15).) The corresponding flow \(v_s\) is defined as the sum of the vector fields \(v_{ij}\) from s to \(s^c\):

$$\begin{aligned} v_s := \sum _{A_i\in s\atop A_j\not \in s}v_{ij}. \end{aligned}$$
(5.25)

Being a max flow means

$$\begin{aligned} \int _s v_s = S(s). \end{aligned}$$
(5.26)

For example, this was applied in Sect. 3.1 to the four-region case with \(s=AB\).

Theorem 2

(Nested max multiflow). Given a composite boundary region \(s\subset \{A_i\}\), there exists a multiflow \(\{v_{ij}\}\) such that for each i, the sum

$$\begin{aligned} v_i:= \sum _{j=1}^n v_{ij} \end{aligned}$$
(5.27)

is a max flow for \(A_i\), that is,

$$\begin{aligned} \int _{A_i}v_i=S(A_i), \end{aligned}$$
(5.28)

and the sum

$$\begin{aligned} v_s := \sum _{A_i\in s\atop A_j\not \in s}v_{ij} \end{aligned}$$
(5.29)

is a max flow for s, that is,

$$\begin{aligned} \int _s v_s = S(s). \end{aligned}$$
(5.30)

Proof of Theorem 2

The proof proceeds very similarly to that for Theorem 1; we will only point out the differences. Since \(v_s\) is automatically a flow, in addition to (5.7) we have

$$\begin{aligned} \int _s v_s\le S(s). \end{aligned}$$
(5.31)

Therefore, to prove the theorem, it suffices to show that there exists a multiflow such that (in place of (5.8)),

$$\begin{aligned} \int _s v_s+\sum _{i=1}^n \int _{A_i}v_i\ge S(s)+\sum _{i=1}^n S(A_i). \end{aligned}$$
(5.32)

For this purpose we dualize the program of maximizing the left-hand side of (5.32) over multiflows. Compared to the proof of Theorem 1, this adds a term \(\int _s v_s\) to the primal objective, and therefore to the Lagrangian (5.12). This term can be written

$$\begin{aligned} \sum _{\begin{array}{c} i<j\\ A_i\in s\\ A_j\not \in s \end{array}}\int _{A_i}\sqrt{h}\,\hat{n}\cdot v_{ij}- \sum _{\begin{array}{c} i<j\\ A_i\not \in s\\ A_j\in s \end{array}}\int _{A_j}\sqrt{h}\,\hat{n}\cdot v_{ij}. \end{aligned}$$
(5.33)

This term has the effect, after integrating out the \(v_{ij}\)s, of changing the boundary conditions for the dual variables \(\psi _{ij}\). The dual program is now

$$\begin{aligned} \begin{aligned}&\text {Minimize }\int _{\mathcal {M}}\sqrt{g}\,\lambda \text { with respect to }\{\psi _{ij}\},\lambda \\&\quad \text { subject to } \lambda \ge |\nabla \psi _{ij}|,\\&\qquad \qquad \left. \psi _{ij}\right| _{A_i} = {\left\{ \begin{array}{ll} -2,\quad A_i\in s,A_j\not \in s \\ -1,\quad \text {otherwise} \end{array}\right. } \\&\qquad \qquad \left. \psi _{ij}\right| _{A_j} = {\left\{ \begin{array}{ll} 2,\quad A_i\not \in s,A_j\in s \\ 1,\quad \text {otherwise} \end{array}\right. }. \end{aligned} \end{aligned}$$
(5.34)

This implies that the bound (5.17) on \(\int _{\mathcal {C}}ds\,\lambda \) for a curve \(\mathcal {C}\) from a point in \(A_i\) to a point in \(A_j\) becomes

$$\begin{aligned} \int _{\mathcal {C}}ds\,\lambda \ge {\left\{ \begin{array}{ll}2,\quad A_i,A_j\in s\text { or }A_i,A_j\notin s \\ 3,\quad A_i\in s,A_j\not \in s\text { or }A_i\not \in s,A_j\in s \end{array}\right. }. \end{aligned}$$
(5.35)

We now define the functions \(\phi _i\) and regions \(R_i\) as in the proof of Theorem 1, and in addition the function \(\phi _s\) and region \(R_s\):

$$\begin{aligned} \phi _s(x)&:=\min _{A_i\in s}\phi _i(x)=\min _{\mathcal {C}}\int _{\mathcal {C}}ds\,\lambda \end{aligned}$$
(5.36)
$$\begin{aligned} R_s&:=\{x\in \mathcal {M}:1<\phi _s<2\}, \end{aligned}$$
(5.37)

where \(\mathcal {C}\) is any curve from s to x. It follows from (5.18), (5.20), and (5.35) that the regions \(R_i\) intersect neither each other nor \(R_s\). Therefore the objective in (5.34) is bounded below by

$$\begin{aligned} \int _{R_s}\sqrt{g}\,\lambda +\sum _{i=1}^n\int _{R_i}\sqrt{g}\,\lambda . \end{aligned}$$
(5.38)

The integral over \(R_i\) is bounded below by \(S(A_i)\) by the same argument as in the proof of Theorem 1, and the integral over \(R_s\) is bounded below by S(s) by a similar one: again \(\lambda =|\nabla \phi _s|\), so \(\int _{R_s}\sqrt{g}\,\lambda =\int _{R_s}\sqrt{g}\,|\nabla \phi _s|\), which in turn equals the average area of the level sets of \(\phi _s\) for values between 1 and 2. Since \(\phi _s\) is 0 on s and \(\ge 3\) on \(\bar{s}\), those level sets are homologous to s. Therefore their average area is at least the area of the minimal surface homologous to s, which is S(s). \(\quad \square \)

6 Multiflows on Networks

In this section we investigate multiflows on networks. This study can be thought of as a discrete analogue of the results in the previous sections, with the spacetime replaced by a weighted graph, the flows by graph flows, and the Ryu–Takayanagi surfaces by minimal cuts. Since the results in this section are stand-alone mathematical results, we will remain agnostic as to how (and if) a network is obtained from a spacetime: this could be done e.g. via the graph models of [5], in some other way, or not at all.

There are two motivations for studying multiflows on networks: (1) it may yield new insights into discrete models of gravity, and (2) it can produce new mathematical results and conjectures in graph theory.

In this section we will report on several items:

  1.

    In Sect. 6.2 we will give a convex optimization proof of the discrete analogue of Theorem 1 (Theorem 3 below). Although Theorem 3 has been proven before in the literature using combinatorial methods, this convex optimization proof is new to the best of our knowledge, and closely follows the proof in the continuum setup.

  2.

    We will prove a decomposition of an arbitrary network with three boundary vertices into three subnetworks, such that each subnetwork computes precisely one of the three boundary mutual informations, and has zero value for the other two mutual informations. Furthermore, we will conjecture a decomposition of an arbitrary network with four boundary vertices into \(6+1\) subnetworks, such that each of the six networks computes precisely one of the six pairwise mutual informations, and has vanishing mutual informations for the other five pairs of boundary vertices, as well as vanishing tripartite information. The remainder subnetwork has vanishing mutual informations and has tripartite information equal to that of the original network. The tripartite decomposition is the discrete analogue of the decomposition in Sect. 3.2.2 in the continuous case, and the four-partite decomposition is a slight generalization of the decomposition in Sect. 3.2.3. We will also conjecture a decomposition of networks with arbitrary number of boundary vertices.

  3.

    On networks with positive rational capacities, we will give a constructive combinatorial proof of the existence of a certain configuration of flows (in what we will call the flow extension lemma, Lemma 2), which by itself is sufficient to establish the nonnegativity of tripartite information. The result applies to not only undirected graphs, but also more generally to a certain class of directed graphs which we call inner-superbalanced (to be defined later).

6.1 Background on networks

Denote a graph by (V, E), where V is the set of vertices and E is the set of edges. We first consider the case of directed graphs. For an edge \(e \in E\), denote by s(e) and t(e) the source and target, respectively, of e. A capacity function on (V, E) is a map \(c:\, E \rightarrow \mathbb {R}_{\ge 0}\). For each \(e \in E\), \(c^e:= c(e)\) is called the capacity of e. We refer to the graph (V, E) together with a capacity function c as a network \(\Sigma = (V,E,c)\). Given a network \(\Sigma \), we designate a subset of vertices \(\partial \Sigma \subset V\) as the boundary of \(\Sigma \). Vertices in \(\partial \Sigma \) play the role of sources or sinks.

Definition 2

(Discrete flows). Given a network \(\Sigma = (V,E,c)\), a flow on \(\Sigma \) is a function \(v:\, E \rightarrow \mathbb {R}_{\ge 0}\) on edges such that the following two properties hold.

  1.

    Capacity constraint: for all edges \(e\in E\), \(|v^e| \le c^e\).

  2.

    Flow conservation: for all vertices \(x \not \in \partial \Sigma \),

    $$\begin{aligned} \sum _{E \ni e:t(e) = x} v^{e} = \sum _{E \ni e :s(e) = x} v^{e}. \end{aligned}$$
    (6.1)

For a network \(\Sigma = (V,E,c)\), define the virtual edge set \(\tilde{E}\) to be the set of edges obtained by reversing all directions of edges in E. Then clearly a flow v on \(\Sigma \) can be uniquely extended to a function, still denoted by v, on \(E \sqcup \tilde{E}\) such that \(v^{e} = -v^{\tilde{e}}\) where \(\tilde{e} \in \tilde{E}\) corresponds to \(e \in E\). All flows are implicitly assumed to have been extended in this way. Then the flow conservation property in (6.1) can be rewritten as

$$\begin{aligned} \sum _{E\sqcup \tilde{E} \ni e:s(e) = x} v^{e} = 0,\quad \forall x \in V. \end{aligned}$$
(6.2)

Definition 3

Let v be a flow on \(\Sigma = (V,E,c)\) and \(A \subset \partial \Sigma \) be a subset of the boundary vertices.

  1.

    The flux of v out of A is defined to be

    $$\begin{aligned} S_{\Sigma }(A;v):= \sum _{E \sqcup \tilde{E} \ni e :s(e) \in A } v^{e}. \end{aligned}$$
    (6.3)
  2.

    A max flow on A is a flow that has maximal flux out of A among all flows. The maximal flux is denoted by \(S_{\Sigma }(A)\) (or S(A) when no confusion arises), i.e.

    $$\begin{aligned} S_{\Sigma }(A) = \max _{v \text { flow}} \,S_{\Sigma }(A;v). \end{aligned}$$
    (6.4)
  3.

    An edge cut set C with respect to A is a set of edges such that there exists a partition \(V = V_1 \sqcup V_2\) with \(A \subset V_1, \ \partial \Sigma {\setminus } A \subset V_2\), and \(C = \{e \in E\,:\, s(e) \in V_1, t(e) \in V_2 \}\). The value of C is defined to be \(|C| = \sum _{e \in C} c^{e}\). A min cut with respect to A is an edge cut set that has minimal value among all edge cut sets.

It is a classical result (the max-flow min-cut theorem) [12, 14] that S(A) is equal to the value of a min cut with respect to A.
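The max-flow min-cut theorem is easy to check numerically. The following sketch is a minimal Edmonds–Karp implementation run on a toy network; the graph, capacities, and vertex names are our own illustrative choices, not taken from the text:

```python
# Minimal Edmonds-Karp max flow, checking max flow = min cut on a toy network.
from collections import deque

def max_flow(cap, s, t):
    """cap: dict {(u, v): capacity}; returns (flux, residual capacities)."""
    res = dict(cap)
    for (u, v) in list(cap):          # ensure reverse edges exist in residual
        res.setdefault((v, u), 0)
    flux = 0
    while True:
        # BFS for a shortest augmenting path s -> t in the residual network
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for (a, b), r in res.items():
                if a == u and r > 0 and b not in parent:
                    parent[b] = u
                    queue.append(b)
        if t not in parent:
            return flux, res
        # bottleneck along the path, then augment
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[e] for e in path)
        for (u, v) in path:
            res[(u, v)] -= bottleneck
            res[(v, u)] += bottleneck
        flux += bottleneck

cap = {('A', 'x'): 3, ('A', 'y'): 2, ('x', 'y'): 1,
       ('x', 'B'): 2, ('y', 'B'): 3}
flux, res = max_flow(cap, 'A', 'B')

# Min cut: edges from the residual-reachable side V1 to its complement V2.
reach, changed = {'A'}, True
while changed:
    changed = False
    for (u, v), r in res.items():
        if u in reach and r > 0 and v not in reach:
            reach.add(v)
            changed = True
cut_value = sum(c for (u, v), c in cap.items()
                if u in reach and v not in reach)
print(flux, cut_value)  # the two values agree
```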

In the continuum setup, S(A) plays the role of entanglement entropy of the boundary region A by the Ryu–Takayanagi formula. On networks, we still call S(A) the “entropy” of A by analogy. Similarly, for pairwise nonoverlapping boundary subsets ABC, define the mutual information and the tripartite information by

$$\begin{aligned} I(A:B)&:= S(A) + S(B) - S(AB) \end{aligned}$$
(6.5)
$$\begin{aligned} -\,I_3(A:B:C)&:= S(AB) + S(BC) + S(AC) - S(A) - S(B) - S(C) - S(ABC). \end{aligned}$$
(6.6)

If (V, E) is an undirected graph, it can be viewed as a directed graph (V, D(E)), where D(E) is obtained by replacing each edge \(e \in E\) with a pair of parallel oppositely-oriented edges \(e_1,e_2\). An undirected network \(\Sigma = (V,E,c)\) can be viewed as a directed network \(D(\Sigma ) =(V,D(E),D(c))\), where \({D(c)}^{e_1} = {D(c)}^{e_2}:= c^e\). We define a flow on \(\Sigma \) to be a flow on \(D(\Sigma )\). For the purpose of computing fluxes, we can always assume that a flow v on \(D(\Sigma )\) is positive on at most one edge of each parallel pair \((e_1,e_2)\). This allows us to define a flow on an undirected network \(\Sigma \) without referring to \(D(\Sigma )\): first, arbitrarily fix a direction on each edge \(e \in E\); then a flow on \(\Sigma \) is a function \(v: E \rightarrow \mathbb {R}\) that satisfies the same two conditions given in Definition 2. The convention here is that if v is negative on some \(e \in E\) with the pre-fixed direction, then v flows backwards along e with the value \(|v^{e}|\). Also note that when computing the value of a cut on \(\Sigma \), the edges of E should be treated as undirected or bidirected, rather than directed with the prefixed orientation. The concepts of max flows and min cuts carry over to undirected networks in a straightforward way, and the max-flow min-cut theorem still holds.
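As a small numerical illustration of these definitions (and of the inequality \(-I_3\ge 0\) established later for undirected networks), the following sketch computes the entropies of the relevant boundary subsets of a toy undirected network, a 4-cycle with one boundary leg per inner vertex. The network and its unit capacities are our own illustrative choices:

```python
# Multi-terminal max flows via the super-source/super-sink trick,
# then the combination -I_3 on a toy undirected network.
from collections import deque

INF = float('inf')

# undirected edges with capacities: four boundary "legs" on a 4-cycle
edges = {('A', '1'): 1, ('B', '2'): 1, ('C', '3'): 1, ('D', '4'): 1,
         ('1', '2'): 1, ('2', '3'): 1, ('3', '4'): 1, ('4', '1'): 1}
boundary = ['A', 'B', 'C', 'D']

def entropy(region):
    """S(region) = max flow from `region` to the rest of the boundary."""
    res = {}
    for (u, v), c in edges.items():      # undirected -> both directions
        res[(u, v)] = res.get((u, v), 0) + c
        res[(v, u)] = res.get((v, u), 0) + c
    for a in region:                     # hook up super-terminals
        res[('src', a)] = INF; res[(a, 'src')] = 0
    for a in set(boundary) - set(region):
        res[(a, 'snk')] = INF; res[('snk', a)] = 0
    flux = 0
    while True:                          # Edmonds-Karp augmentation
        parent = {'src': None}; q = deque(['src'])
        while q and 'snk' not in parent:
            u = q.popleft()
            for (a, b), r in res.items():
                if a == u and r > 0 and b not in parent:
                    parent[b] = u; q.append(b)
        if 'snk' not in parent:
            return flux
        path, v = [], 'snk'
        while parent[v] is not None:
            path.append((parent[v], v)); v = parent[v]
        bn = min(res[e] for e in path)
        for (u, v) in path:
            res[(u, v)] -= bn; res[(v, u)] += bn
        flux += bn

S = {frozenset(r): entropy(r)
     for r in (['A'], ['B'], ['C'], ['A', 'B'], ['B', 'C'], ['A', 'C'],
               ['A', 'B', 'C'])}
neg_I3 = (S[frozenset('AB')] + S[frozenset('BC')] + S[frozenset('AC')]
          - S[frozenset('A')] - S[frozenset('B')] - S[frozenset('C')]
          - S[frozenset('ABC')])
print(neg_I3)  # nonnegative, consistent with MMI on undirected networks
```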

We will consider undirected networks in Sects. 6.2 and 6.3, and directed networks in Sect. 6.4.

6.2 Discrete multiflow theorem and convex duality

In this subsection, all networks \(\Sigma = (V,E,c)\) are undirected. For simplicity, assume the underlying graph is simple and connected. We also assume an arbitrary direction for each edge has been chosen. For an edge e with \(x = s(e), y = t(e)\), we write e as \({\langle xy \rangle }\), then \(\langle yx \rangle \) denotes the edge \(\tilde{e}\) in \(\tilde{E}\). The relation \(x \sim y\) means \({\langle xy \rangle }\in E \sqcup \tilde{E}\), or equivalently, x and y are connected by an undirected edge. For consistency with the notation used in the continuum setup, where different flows are labeled by subscripts, we will reserve subscripts of flows for the same purpose, and write \(v^e\) for the value of a flow v on an edge e. Also, for convenience, we will label the vertices of \(\partial \Sigma \) by \(A_i\), \(i\in \{1,2,\ldots ,n\}\). We do not allow for multiple boundary vertices to belong to the same \(A_i\), but this will not result in any loss of generality. Furthermore, we will assume, also without loss of generality, that each vertex \(A_i\in \partial \Sigma \) connects to precisely one vertex \(A_{\bar{i}}\) in \(V {\setminus } \partial \Sigma \).

Definition 4

(Multiflow, discrete version). Given a network \(\Sigma \) with boundary \(\partial \Sigma \), a multiflow \(\{ v_{ij} \}\) is a set of flows such that:

  1.

    For all \(i, j \in \{1,\dots ,n\}\),

    $$\begin{aligned} v_{ij} = - v_{ji} \quad \text {and} \quad v_{ij}^{\langle k \bar{k} \rangle } = 0, \end{aligned}$$
    (6.7)

    for any boundary vertex \(A_k \in \partial \Sigma {\setminus } \{A_i,A_j\}\).

  2.

    The set of flows is collectively norm-bounded: for all edges \(e \in E\),

    $$\begin{aligned} \sum _{i < j}^n |v_{ij}^e | \le c^e. \end{aligned}$$
    (6.8)

As before, the compatibility condition ensures that any linear combination \(\sum _{i < j}^n \xi _{ij} v_{ij}\) is a flow, provided that \(|\xi _{ij}| \le 1\).

The following theorem is well known in the multicommodity literature. It was first formulated in [27] with a correct proof given in [9, 28] which is based on a careful analysis of the structure of max multiflows and involves delicate flow augmentations (cf. [15]). Here we adapt the proof in the continuum setup to give a new proof (as far as we know) of this theorem via the convex optimization method. The proof proceeds mostly in parallel with the continuum case with some small changes. Readers who are not interested in the proof can skip to the next subsection. Theorem 2 also holds on networks, as it follows from the locking theorem [25]. A convex optimization proof of Theorem 2 on networks would be similar to the continuum proof in Sect. 5. We will not, however, spell out the details here.

Theorem 3

(Max multiflow, discrete version). Given a network \(\Sigma \), there exists a multiflow \(\{v_{ij}\}\) such that, for any i, the flow \(\sum _{j=1}^n v_{ij}\) is a max flow on \(A_i\).

Proof

Our goal is to determine the value

$$\begin{aligned} p^\star = \text {Maximize} \quad \sum _{i\ne j}^n v_{ij}^{\langle i \bar{i} \rangle } \quad \text {subject to (6.2), (6.7), (6.8)} \end{aligned}$$
(6.9)

of our primal optimization problem. We collect all constraints in an explicit manner except the condition \(v_{ij} = -v_{ji}\), which is treated as an implicit constraint. Introduce

$$\begin{aligned} L[v_{ij},\psi _{ij},\lambda ]&= \sum _{i \not = j}^n v_{{i}{j}}^{\langle {i}\bar{i} \rangle } + \sum _{{i}<{j}}^n \sum _{y\in V{\setminus }\{i,j\}} \sum _{x\sim y} \psi _{{i}{j}}^{y} v_{{i}{j}}^{{\langle xy \rangle }} +\sum _{{\langle xy \rangle }\in E} \lambda ^{\langle xy \rangle }\left( c^{{\langle xy \rangle }} - \sum _{{i}<{j}}^n | v_{{i}{j}}^{\langle xy \rangle }| \right) , \end{aligned}$$
(6.10)

where \(\psi _{ij}^x\) are Lagrange multipliers that enforce (6.2) and (6.7), and \(\lambda ^{\langle xy \rangle }\) those that enforce (6.8). The Lagrange dual function is then

$$\begin{aligned} g(\psi _{ij},\lambda ) = \sup _{v_{ij}} L[v_{ij},\psi _{ij},\lambda ]. \end{aligned}$$
(6.11)

As in the proof of Theorem 1, we introduce \(d^\star = \inf g(\psi _{ij},\lambda )\). By Slater’s theorem, we again see that strong duality holds, as the primal constraints are strictly satisfied for the choice \(v^{\langle xy \rangle }_{ij} = 0\). Thus, \(p^\star = d^\star \), and we have reduced our primal objective (6.9) to solving for \(d^\star \).

By rewriting the \(i\ne j\) sum in the first term of (6.10) as two sums over \(i<j\) and \(j<i\), and interchanging the ij summation indices in the second sum, (6.10) can be simplified to

$$\begin{aligned} L[v_{ij},\psi _{ij},\lambda ] = \sum _{{\langle xy \rangle }\in E} \left[ \sum _{{i}<{j}}^n \left( \psi _{{i}{j}}^{y} - \psi _{{i}{j}}^{x} \right) v_{{i}{j}}^{{\langle xy \rangle }} + \lambda ^{\langle xy \rangle }\left( c^{{\langle xy \rangle }} - \sum _{{i}<{j}}^n | v_{{i}{j}}^{\langle xy \rangle }| \right) \right] . \end{aligned}$$
(6.12)

Note that here we have introduced the boundary values

$$\begin{aligned} -\psi _{{i}{j}}^{i} = \psi _{{i}{j}}^{j} = 1, \end{aligned}$$
(6.13)

which should not be confused with the adjustable Lagrange multipliers \(\psi _{ij}^x\). For fixed \(\{\psi _{ij}^x\}\), we can always choose \(v_{ij}^{\langle xy\rangle }\) so that \({\text {sgn}}(v_{ij}^{\langle xy\rangle }) ={\text {sgn}}(\psi _{ij}^y-\psi _{ij}^x)\). Hence,

$$\begin{aligned} \begin{aligned} g(\psi _{ij},\lambda )&= \sup _{v_{ij}} \sum _{{\langle xy \rangle }\in E} \left[ \sum _{{i}<{j}}^n \left| \psi _{{i}{j}}^{y} - \psi _{{i}{j}}^{x} \right| | v_{{i}{j}}^{{\langle xy \rangle }} | + \lambda ^{\langle xy \rangle }\left( c^{{\langle xy \rangle }} - \sum _{{i}<{j}}^n | v_{{i}{j}}^{\langle xy \rangle }| \right) \right] \\&= \sup _{v_{ij}} \sum _{{\langle xy \rangle }\in E} \left[ \sum _{{i}<{j}}^n \left( \left| \psi _{{i}{j}}^{y} - \psi _{{i}{j}}^{x} \right| - \lambda ^{\langle xy \rangle }\right) | v_{{i}{j}}^{{\langle xy \rangle }} | + \lambda ^{\langle xy \rangle }c^{{\langle xy \rangle }} \right] . \end{aligned} \end{aligned}$$
(6.14)

We observe that this is finite if and only if

$$\begin{aligned} \left| \psi _{{i}{j}}^{y} - \psi _{{i}{j}}^{x} \right| - \lambda ^{\langle xy \rangle }\le 0\quad \forall \ {i},{j} \in \partial \Sigma , \quad {\langle xy \rangle }\in E, \end{aligned}$$
(6.15)

in which case the supremum is obtained by setting \(v_{{i}{j}}^{{\langle xy \rangle }}=0\) everywhere. Therefore, it follows that

$$\begin{aligned} d^\star = \inf g(\psi _{ij},\lambda ) = \inf _\lambda \sum _{{\langle xy \rangle }\in E} \lambda ^{\langle xy \rangle }c^{{\langle xy \rangle }}, \end{aligned}$$
(6.16)

subject to the edgewise condition

$$\begin{aligned} \lambda ^{\langle xy \rangle }\ge \max _{{i},{j} \in \partial \Sigma } \left| \psi _{{i}{j}}^{y} - \psi _{{i}{j}}^{x} \right| . \end{aligned}$$
(6.17)

This can be compactly written as

$$\begin{aligned} d^\star = \min _{\psi _{ij}} \sum _{{\langle xy \rangle }\in E} c^{\langle xy \rangle }\max _{{i},{j}\in \partial \Sigma } \left| \psi _{{i}{j}}^{x} - \psi _{{i}{j}}^{y} \right| , \end{aligned}$$
(6.18)

where the minimization over \(\psi _{ij}\) is subject only to the boundary condition (6.13).

We would like to prove that \(p^{\star } = d^{\star } = \sum _{i=1}^n S(A_i)\). By the definition of \(p^{\star }\), we have \(p^{\star } \le \sum _{i=1}^{n}S(A_i)\). Referring to (6.13), (6.16), and (6.17), it suffices to prove that given the constraints

$$\begin{aligned} -\psi _{{i}{j}}^{i} = \psi _{{i}{j}}^{j} = 1,\quad \lambda ^{\langle xy \rangle }\ge \left| \psi _{{i}{j}}^{y} - \psi _{{i}{j}}^{x} \right| \quad \text {for all }i,j\in \partial \Sigma , \end{aligned}$$
(6.19)

we have for all \(\lambda \)

$$\begin{aligned} \sum _{{\langle xy \rangle }\in E} \lambda ^{\langle xy \rangle }c^{{\langle xy \rangle }} \ge \sum _{i=1}^n S(A_i). \end{aligned}$$
(6.20)

This is done in the following steps.

First, let \(C_{ij}\) be any (undirected) path from a vertex i to another vertex j. Utilizing (6.19), we see that for the particular case where \(i,j \in \partial \Sigma \),

$$\begin{aligned} \sum _{{\langle xy \rangle }\in C_{{i}{j}}} \lambda ^{\langle xy \rangle }\ge \sum _{{\langle xy \rangle }\in C_{{i}{j}}} \left| \psi _{{i}{j}}^{y} - \psi _{{i}{j}}^{x} \right| \ge \sum _{{\langle xy \rangle }\in C_{{i}{j}}} \left( \psi _{{i}{j}}^{y} - \psi _{{i}{j}}^{x} \right) = \psi ^{j}_{{i}{j}} - \psi ^{i}_{{i}{j}} = 2. \end{aligned}$$
(6.21)

Now, for any boundary vertex \({i} \in \partial \Sigma \), we define the function \(f_i\) on the set of vertices by

$$\begin{aligned} f_{i}(x)= \inf _{C_{ix}}\sum _{\langle uv\rangle \in C_{ix}} \lambda ^{\langle uv \rangle }. \end{aligned}$$
(6.22)

It immediately follows from (6.21) that for any \({i}, {j}\in \partial \Sigma \),

$$\begin{aligned} f_{i}(x) + f_{j}(x) \ge 2. \end{aligned}$$
(6.23)

Furthermore, we observe that by construction

$$\begin{aligned} f_i(x)&\le f_i(y)+\lambda ^{\langle xy \rangle }, \end{aligned}$$
(6.24)

as the left-hand side is the minimum of the sum of \(\lambda ^{\langle uv \rangle }\) over paths from i to x, whereas the right-hand side is the sum of \(\lambda ^{\langle uv\rangle }\) over a particular path from i to x, namely a minimizing path from i to y followed by the edge \({\langle xy \rangle }\). Likewise, we can also interchange x and y to obtain another inequality (note \(\lambda ^{\langle xy \rangle }= \lambda ^{\langle yx\rangle }\) by definition). These two inequalities combine to form the analog of the Hamilton–Jacobi equation from the proof of Theorem 1:

$$\begin{aligned} | f_{i}(x) - f_{i}(y) | \le \lambda ^{\langle xy \rangle }. \end{aligned}$$
(6.25)

Finally, similar to the continuous case, we define \(R_{i} \subset V\) to be the set of vertices x for which \(f_{{i}}(x)<1\). Note that given \(i,j \in \partial \Sigma \) with \(i \not =j\), \(R_i\) and \(R_j\) do not overlap. To prove this, suppose there is a vertex \(v \in R_i \cap R_j\). Then

$$\begin{aligned} f_i(v) + f_j(v) < 2, \end{aligned}$$
(6.26)

contradicting (6.23). If we define the function \(g_i:\, V \rightarrow \mathbb {R}\) such that

$$\begin{aligned} g_{{i}}(x) = {\left\{ \begin{array}{ll} f_{{i}}(x) &{} x \in R_{{i}} \\ 1 &{} x \in V {\setminus } R_{{i}} \end{array}\right. }, \end{aligned}$$
(6.27)

it is readily apparent that \(g_i(j) = 1 - \delta _{i,j}\) for all \(j \in \partial \Sigma \). We then claim the following inequality holds for \(g_i\):

$$\begin{aligned} \sum \limits _{{i}=1}^n |g_{{i}}(x) - g_{{i}}(y)| \le \lambda ^{{\langle xy \rangle }}, \quad \forall {\langle xy \rangle }\in E. \end{aligned}$$
(6.28)

If this is indeed true, then we have successfully proven (6.20):

$$\begin{aligned} \sum \limits _{{\langle xy \rangle }\in E} c^{{\langle xy \rangle }}\lambda ^{{\langle xy \rangle }} \ge \sum \limits _{{i}=1}^n \sum \limits _{{\langle xy \rangle }\in E} c^{{\langle xy \rangle }} |g_{{i}}(x) - g_{{i}}(y)|\ge \sum \limits _{{i}=1}^n S(A_i), \end{aligned}$$
(6.29)

where the last inequality follows from the same reasoning as the continuous case (with the details again spelled out in Step 3 of Section 3.2 of [22]). Alternatively, one can write out the convex optimization program for maximizing the flux out of \(A_i\) and see that the dual program is given by

$$\begin{aligned} \inf _{g}\sum \limits _{{\langle xy \rangle }\in E} c^{{\langle xy \rangle }} |g(x) - g(y)| \quad \text {subject to}\quad g(j) = 1-\delta _{i,j},\, j \in \partial \Sigma . \end{aligned}$$
(6.30)

Thus, it only remains for us to verify (6.28). For any fixed \({\langle xy \rangle }\in E\), consider the following cases. If \(x,y \in R_i\) for some i, then the only term that contributes to the sum in (6.28) is the ith term, and (6.28) is true by (6.25). If \(x \in R_i\) but \(y \notin R_i\), then there are two possibilities. If \(y \in R_j\) for some \(j \in \partial \Sigma {\setminus } \{i\}\), then (6.28) becomes

$$\begin{aligned} (1-f_i(x)) + (1 - f_j(y)) \le \lambda ^{{\langle xy \rangle }}, \end{aligned}$$
(6.31)

which is true by (6.21). If \(y \notin R_j\) for any \(j \in \partial \Sigma \), then (6.28) becomes

$$\begin{aligned} 1 - f_i(x) \le \lambda ^{{\langle xy \rangle }}, \end{aligned}$$
(6.32)

which is true since \(\lambda ^{\langle xy \rangle }+ f_i(x) \ge f_i(y) \ge 1\) by construction. Lastly, if \(x,y \notin R_i\) for every \(i \in \partial \Sigma \), then the left-hand side of (6.28) vanishes and the inequality holds trivially. \(\quad \square \)
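The dual program (6.30) can also be checked directly on a toy network. Since the objective is a piecewise-linear cut functional, its minimum is attained on \(\{0,1\}\)-valued g (indicator functions of cut regions), so a brute-force search over binary g suffices for small graphs. The network, capacities, and vertex names below are our own illustrative choices:

```python
# Brute-force evaluation of the dual program (6.30) on a toy network:
# for 0/1-valued g the optimal value reproduces the min-cut value S(A).
from itertools import product

edges = {('A', 'x'): 2, ('x', 'y'): 1, ('y', 'B'): 2, ('x', 'B'): 1}
interior = ['x', 'y']
# boundary condition for i = A: g(A) = 0, g(B) = 1
best = min(
    sum(c * abs(g[u] - g[v]) for (u, v), c in edges.items())
    for vals in product([0, 1], repeat=len(interior))
    for g in [dict(zip(interior, vals), A=0, B=1)]
)
print(best)  # equals the min-cut value separating A from B
```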

6.3 Network decomposition

We now discuss network decomposition. Again let \(\Sigma = (V,E,c)\) be an undirected network whose boundary \(\partial \Sigma \) consists of n components \(A_1, \ldots , A_n\). The following definitions will be useful.

Definition 5

  1.

The entropy ray \(R(\Sigma )\) associated with \(\Sigma \) is the ray generated by the entropy vector

$$\begin{aligned} \left( S(A_1), \dots , S(A_1A_2),\dots ,S(A_1\cdots A_n) \right) , \end{aligned}$$
    (6.33)

    where each entry \(S(A_{i_1}\cdots A_{i_k})\) is the entropy of the boundary set \(A_{i_1}\cdots A_{i_k}\), i.e. the maximal flux out of \(A_{i_1}\cdots A_{i_k}\).

  2.

A network (or geometry) \(\Sigma \) realizes a ray R if \(R(\Sigma ) = R\).

Note that if a vector R is realized by a network (geometry), then any positive scalar multiple of R is also realized by the same network (geometry) up to scaling the capacities (metric).

We will sometimes say that a network is of a certain type (Bell pair, perfect tensor, etc.) if it realizes an entropy ray of that particular type of entanglement.

Definition 6

(Subnetworks and subnetwork decomposition). A subnetwork of \(\Sigma = (V,E,c)\) is a network \((V,E,c_1)\) such that \(c_1 \le c\) on all edges. We say \(\Sigma \) decomposes into subnetworks \(\Sigma _{1},\dots ,\Sigma _m\) for \(\Sigma _i = (V,E,c_i)\) if

$$\begin{aligned} \sum _{i=1}^m c_{i}^e = c^e, \quad \forall e \in E. \end{aligned}$$
(6.34)

Two vertices in a network \(\Sigma \) are connected by \(\Sigma \) if there is a path of nonzero capacities between them.

Theorem 4

(Tripartite network decomposition). An arbitrary network \(\Sigma \) with three boundary vertices \(A_1, A_2, A_3\) decomposes into three subnetworks \(\Sigma _{ij}\), \(1 \le i < j \le 3\), such that

$$\begin{aligned} S_{\Sigma _{ij}}(A_i) = S_{\Sigma _{ij}}(A_j) = \frac{1}{2}I(A_i:A_j), \quad \text {and} \quad S_{\Sigma _{ij}}(A_k) = 0, \quad k \ne i,j, \end{aligned}$$
(6.35)

where \(I(A_i:A_j) = S_{\Sigma }(A_i) + S_{\Sigma }(A_j) - S_{\Sigma }(A_i A_j)\) is the mutual information between \(A_i\) and \(A_j\) on \(\Sigma \). In particular, this implies that \(\Sigma _{ij}\) connects only \(A_i\) and \(A_j\), that \(\sum _{i<j} S_{\Sigma _{ij}}(A_k) = S_\Sigma (A_k)\) for every k, and that the \(A_i:A_j\) mutual information computed on \(\Sigma _{ij}\) equals that computed on \(\Sigma \).

Proof

From Theorem 3, there exists a multiflow \(\{v_{ij}\,:\, 1 \le i \ne j \le 3\}\) such that \(v_{ij} = -v_{ji}\) and for each i, \(\sum _{j \ne i} v_{ij}\) is a max flow on \(A_i\). Hence, we have

$$\begin{aligned} \begin{aligned} S(A_1)&= S(A_1; v_{12}) + S(A_1; v_{13}),\\ S(A_2)&= S(A_1; v_{12}) + S(A_2; v_{23}),\\ S(A_3)&= S(A_1; v_{13}) + S(A_2; v_{23}). \end{aligned} \end{aligned}$$
(6.36)

It follows that

$$\begin{aligned} S(A_i; v_{ij}) = \frac{1}{2}I(A_i:A_j),\quad i < j. \end{aligned}$$
(6.37)

Also, by definition, we have

$$\begin{aligned} S(A_k; v_{ij}) = 0, \quad \text {and} \quad S(A_j ; v_{ij}) = -S(A_i ; v_{ij}), \quad k \ne i,j. \end{aligned}$$
(6.38)

For \(i<j\), define the subnetwork \(\Sigma _{ij} = (V,E,c_{ij})\) such that \(c_{ij}^e = |v_{ij}^e|\) for all \(e \in E\). By noting that \(v_{ij}\) can be viewed as a fully saturated flow on \(\Sigma _{ij}\), it is clear that the subnetworks \(\Sigma _{ij}\) satisfy the conditions in (6.35). However, the three subnetworks \(\Sigma _{ij}\) do not necessarily add up to \(\Sigma \): there may be a residual capacity \(c^e - \sum _{i<j} c^e_{ij}\) on each edge e of \(\Sigma \). To resolve this, we simply append the residual capacity to one of the \(\Sigma _{ij}\), say \(\Sigma _{12}\). This does not change the maximal fluxes on \(\Sigma _{12}\), thanks to (6.36). For instance, if \(S_{\Sigma _{12}}(A_1) > I(A_1:A_2)/2\), then

$$\begin{aligned} S(A_1) \ge S_{\Sigma _{12}}(A_1) + S_{\Sigma _{13}}(A_1) > S(A_1), \end{aligned}$$
(6.39)

which is absurd. \(\quad \square \)
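The proof can be made concrete on a small example. The following sketch hand-builds the multiflow of Theorem 3 on a star network with three boundary vertices (the star graph and its capacities are our own illustrative choices) and verifies the fluxes in (6.35) and (6.37) together with the collective norm bound (6.8):

```python
# Hand-built multiflow on a star network with three boundary vertices,
# illustrating the decomposition of Theorem 4.

# undirected star: boundary A1, A2, A3 joined to a center vertex o
cap = {('A1', 'o'): 2, ('A2', 'o'): 3, ('A3', 'o'): 3}

# entropies by inspection (every min cut here is a single leg):
S = {'A1': 2, 'A2': 3, 'A3': 3, 'A1A2': 3, 'A1A3': 3, 'A2A3': 2}
I = {('A1', 'A2'): S['A1'] + S['A2'] - S['A1A2'],   # = 2
     ('A1', 'A3'): S['A1'] + S['A3'] - S['A1A3'],   # = 2
     ('A2', 'A3'): S['A2'] + S['A3'] - S['A2A3']}   # = 4

# multiflow v_ij routed through o with flux I(A_i:A_j)/2 each,
# recorded as |v_ij| on each undirected edge
v = {('A1', 'A2'): {('A1', 'o'): 1, ('A2', 'o'): 1},
     ('A1', 'A3'): {('A1', 'o'): 1, ('A3', 'o'): 1},
     ('A2', 'A3'): {('A2', 'o'): 2, ('A3', 'o'): 2}}

# collective norm bound (6.8): sum of |v_ij| on each edge <= capacity
for e, c in cap.items():
    used = sum(flow.get(e, 0) for flow in v.values())
    assert used <= c

# each flux matches half the mutual information, as in (6.35)/(6.37)
for (i, j), flow in v.items():
    assert flow[(i, 'o')] == I[(i, j)] / 2

# v_i = sum_j v_ij is a max flow on A_i: the fluxes add up to S(A_i)
for i in ('A1', 'A2', 'A3'):
    flux = sum(flow.get((i, 'o'), 0) for flow in v.values())
    assert flux == S[i]
print('decomposition checks passed')
```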

Conjecture 1

(Four-partite network decomposition). An arbitrary network \(\Sigma \) with four boundary vertices \(A_1, \ldots , A_4\) decomposes into six pairwise subnetworks \(\Sigma _{ij}\), \(1 \le i<j\le 4\), together with a remainder subnetwork \(\Sigma _r\). The subnetworks obey the following properties:

  1.

\(\Sigma _{ij}\) connects only \(A_i\) and \(A_j\), and has flux \(I(A_i:A_j)/2\) out of each of \(A_i\) and \(A_j\). In other words,

    $$\begin{aligned} S_{\Sigma _{ij}}(A_i) = S_{\Sigma _{ij}}(A_j) = \frac{1}{2}I(A_i:A_j), \quad \text {and} \quad S_{\Sigma _{ij}}(A_k) = 0, \quad k \ne i,j. \end{aligned}$$
    (6.40)

    The right-hand side condition implies that \(S_{\Sigma _{ij}}(A_i A_j) = 0\), so the \(A_i:A_j\) mutual information computed on \(\Sigma _{ij}\) equals that computed on \(\Sigma \). Moreover, the tripartite information calculated on \(\Sigma _{ij}\) vanishes.

  2.

    \(\Sigma _r\) is a four-partite perfect tensor network with the same tripartite information as \(\Sigma \). In other words, all pairwise mutual informations vanish and

    $$\begin{aligned} S_{\Sigma _r}(A_i) = \frac{1}{2}S_{\Sigma _r}(A_i A_j) = \frac{-\,I_3}{2}, \end{aligned}$$
    (6.41)

    where \(-\,I_3 = -\,I_3(A_1:A_2:A_3)\) is the tripartite information calculated on \(\Sigma \).

Properties 1 and 2 together imply that the subsystem entropies of the seven subnetworks add up to those of \(\Sigma \), i.e. \(\sum _{i<j} S_{\Sigma _{ij}}(s) + S_{\Sigma _r}(s) = S_\Sigma (s)\) for every \(s \subset \{A_i\}\).

Although Theorem 3 is not sufficient to prove Conjecture 1, we have numerical evidence for it in the form of direct computations on some example networks. Furthermore, it is in fact not difficult to state a decomposition conjecture for an arbitrary number of boundary vertices.

Conjecture 2

(Arbitrary n network decomposition). A network with \(n\ge 2\) boundary vertices decomposes into subnetworks, each of which realizes an extremal ray of the n-boundary-region holographic entropy cone.

Note that the cone for n boundary regions has \(S_n\) permutation symmetry, and we are not modding out by this symmetry in Conjecture 2.

One part of the conjecture is immediate: the set of allowed subnetworks must contain realizations of all the extremal rays. The difficulty lies in the converse: showing that the decomposition of an arbitrary network \(\Sigma \) requires no subnetworks beyond those in Conjecture 2.

For \(n=5\), the holographic entropy cone is known [5]. In this case, a network \(\Sigma \) should decompose into (at most) 10 Bell pair subnetworks, 5 four-partite perfect tensor networks, and 5 five-partite perfect tensor networks. However, beyond five boundary regions, Bell pair and perfect tensor networks will no longer suffice, as it is known that the higher holographic entropy cones have extremal rays that only admit network realizations of nontrivial topology.

6.4 Flow extension lemma

We give another proof that \(-\,I_3\ge 0\) for discrete networks. The result holds not only for undirected networks, but also more generally for directed networks with rational capacities satisfying a property called inner-superbalanced, as defined below. The proof here involves only properties of flows without referring to min-cuts. Thus it is interesting to see if the techniques used here can be generalized to obtain entropy inequalities involving more than four boundary regions [5].

Except in Corollary 1 at the end of this subsection, we consider directed networks \(\Sigma = (V,E,c)\) with boundary \(\partial \Sigma \) where the capacity function c is the constant function that assigns 1 to all edges. Hence, c will often be suppressed from the notation below. Also, we will only consider flows which have values zero or one on all edges. By the Ford-Fulkerson algorithm, a max flow on a boundary subset \(A \subset \partial \Sigma \) can always be taken to have such a property. A flow is essentially a collection of edges such that at each non-boundary vertex the collection contains an equal number of incoming edges and outgoing edges.

Let v be any flow on \(\Sigma \) and \(A \subset \partial \Sigma \) be a boundary subset. Denote by \(A^c\) the complement of A in \(\partial \Sigma \), and set \(n = S_{\Sigma }(A;v)\). It is not hard to show that v contains n edge-disjoint paths from A to \(A^c\). Conversely, an arbitrary collection of n edge-disjoint paths from A to \(A^c\) defines a flow whose flux out of A is n. Consequently, the max flow S(A) is equal to the maximum number of edge-disjoint paths from A to \(A^c\).
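To make the path picture concrete, here is a minimal max-flow sketch in Python (our own illustration; the `max_flow` helper and the toy graph are assumptions, not constructions from the paper). On unit capacities, each unit of flux corresponds to one edge-disjoint path from A to \(A^c\):

```python
from collections import deque

def max_flow(edges, sources, sinks):
    """Ford-Fulkerson with BFS (Edmonds-Karp) for integer capacities.
    `edges` is a list of (u, v, capacity) triples; parallel edges accumulate.
    Multiple sources and sinks are handled by seeding the BFS."""
    cap, nbrs = {}, {}
    for u, v, c in edges:
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)            # residual (reverse) capacity
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    total = 0
    while True:
        # BFS for an augmenting path from the source set to the sink set
        parent, queue, reached = {s: None for s in sources}, deque(sources), None
        while queue and reached is None:
            u = queue.popleft()
            for v in nbrs.get(u, ()):
                if cap[(u, v)] > 0 and v not in parent:
                    parent[v] = u
                    if v in sinks:
                        reached = v
                        break
                    queue.append(v)
        if reached is None:
            return total
        # augment along the path by the bottleneck residual capacity
        path, v = [], reached
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= aug
            cap[(v, u)] += aug
        total += aug

# Unit capacities: the max flow out of A = {a1, a2} equals the maximum number
# of edge-disjoint paths to the complement {b}, here a1->x->b and a2->x->b
# using the two parallel x->b edges.
edges = [('a1', 'x', 1), ('a2', 'x', 1), ('x', 'b', 1), ('x', 'b', 1)]
assert max_flow(edges, ['a1', 'a2'], ['b']) == 2
```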

Given a flow v on a network \(\Sigma = (V,E)\), the residual network \({\text {Res}}(v;\Sigma )\) is obtained from \((V,E)\) by reversing the direction of all edges on which v takes non-zero value. By our convention, each edge in \({\text {Res}}(v;\Sigma )\) still has capacity one.

Lemma 1

Let \(\Sigma , A, \) and v be as above, then \(S_{\Sigma }(A) = S_{{\text {Res}}(v;\Sigma )}(A) + S_{\Sigma }(A;v)\).

Proof

Set \(\tilde{\Sigma } = {\text {Res}}(v;\Sigma )\). For each edge e in \(\Sigma \), denote by \(\tilde{e}\) the corresponding edge in \(\tilde{\Sigma }\). Hence \(\tilde{e}\) has the same direction as e if and only if \(v^e = 0\). Let w be any flow on \(\tilde{\Sigma }\). We define a flow u on \(\Sigma \) by adding w to v, taking the direction of the flows into consideration. Explicitly, set

$$\begin{aligned} u^e&= |v^e - w^{\tilde{e}}|. \end{aligned}$$
(6.42)

That is, u has one unit of flow on an edge if and only if exactly one of v and w occupies that edge. (When both occupy an edge, adding them cancels, leaving no net flow on that edge.) It can be shown that u is a valid flow on \(\Sigma \) and moreover,

$$\begin{aligned} S_{\Sigma }(A;u)&= S_{\tilde{\Sigma }}(A; w) + S_{\Sigma }(A;v). \end{aligned}$$
(6.43)

By taking w to be a max flow on A, we obtain the \(\ge \) direction in the lemma,

$$\begin{aligned} S_{\Sigma }(A)&\ge S_{\tilde{\Sigma }}(A) + S_{\Sigma }(A;v). \end{aligned}$$
(6.44)

To prove the other direction, define \(\tilde{v}\) to be the flow on \(\tilde{\Sigma }\) such that \(\tilde{v}^{\tilde{e}} = v^e\). Then it follows that

$$\begin{aligned} S_{\tilde{\Sigma }}(A;\tilde{v}) = - S_{\Sigma }(A;v) \quad \text {and}\quad {\text {Res}}(\tilde{v};\tilde{\Sigma }) = \Sigma . \end{aligned}$$
(6.45)

Hence, by what we have shown above,

$$\begin{aligned} S_{\tilde{\Sigma }}(A)&\ge S_{\Sigma }(A) -S_{\Sigma }(A;v). \end{aligned}$$
(6.46)

\(\square \)
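Lemma 1 is easy to check numerically on a toy network. The sketch below (our own illustration; the helper functions and the example graph are assumptions) builds the residual network by reversing the edges used by a unit flow v and compares both sides of the identity:

```python
from collections import deque

def max_flow(edges, sources, sinks):
    """Ford-Fulkerson with BFS (Edmonds-Karp); residual capacities are kept
    in a dictionary keyed by ordered vertex pairs."""
    cap, nbrs = {}, {}
    for u, v, c in edges:
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    total = 0
    while True:
        parent, queue, reached = {s: None for s in sources}, deque(sources), None
        while queue and reached is None:
            u = queue.popleft()
            for v in nbrs.get(u, ()):
                if cap[(u, v)] > 0 and v not in parent:
                    parent[v] = u
                    if v in sinks:
                        reached = v
                        break
                    queue.append(v)
        if reached is None:
            return total
        path, v = [], reached
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= aug
            cap[(v, u)] += aug
        total += aug

def residual(edges, flow_edges):
    """Reverse every unit-capacity edge on which the flow is non-zero."""
    return [(v, u, c) if (u, v) in flow_edges else (u, v, c) for u, v, c in edges]

# Toy network with A = {a} and complement {b}; S_Sigma(A) = 2.
edges = [('a', 'x', 1), ('x', 'b', 1), ('a', 'y', 1), ('y', 'b', 1), ('x', 'y', 1)]
v = {('a', 'x'), ('x', 'y'), ('y', 'b')}   # a valid unit flow with flux S(A; v) = 1

lhs = max_flow(edges, ['a'], ['b'])                    # S_Sigma(A)
rhs = max_flow(residual(edges, v), ['a'], ['b']) + 1   # S_Res(A) + S(A; v)
assert lhs == rhs == 2
```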

Definition 7

A flow v on \(\Sigma \) is reachable from a boundary set \(A \subset \partial \Sigma \) if for any edge e with \(v^e \ne 0\), there exists a path contained in v connecting a vertex in A to s(e).

Definition 8

A network \(\Sigma \) is inner-superbalanced if at each non-boundary vertex, the total capacity of incoming edges is no greater than that of outgoing edges.

If v is a flow on an inner-superbalanced network \(\Sigma \), then \({\text {Res}}(v;\Sigma )\) is also inner-superbalanced.
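As a sketch (our own illustration, with hypothetical function names), the inner-superbalanced condition is a simple capacity count at each interior vertex:

```python
def inner_superbalanced(edges, boundary):
    """At every non-boundary vertex, total incoming capacity must not
    exceed total outgoing capacity."""
    inn, out = {}, {}
    for u, v, c in edges:
        out[u] = out.get(u, 0) + c
        inn[v] = inn.get(v, 0) + c
    interior = (set(inn) | set(out)) - set(boundary)
    return all(inn.get(x, 0) <= out.get(x, 0) for x in interior)

# An undirected edge becomes a pair of opposite directed edges, so undirected
# networks are automatically inner-superbalanced:
undirected_as_directed = [('A', 'o', 1), ('o', 'A', 1), ('B', 'o', 1), ('o', 'B', 1)]
assert inner_superbalanced(undirected_as_directed, boundary={'A', 'B'})

# Two units in and one unit out at the interior vertex x violates the condition:
assert not inner_superbalanced([('A', 'x', 1), ('B', 'x', 1), ('x', 'C', 1)],
                               boundary={'A', 'B', 'C'})
```

The stability under taking residuals follows from the same count: a 0/1 flow reverses equally many incoming and outgoing edges at each interior vertex, so the two totals are unchanged.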

Lemma 2

(Flow extension lemma) Let \(\Sigma \) be an inner-superbalanced graph with a partition \(\partial \Sigma = B_1 \sqcup B_2 \sqcup B_3\) such that \(S(B_1) = 0\). Then given any flow v reachable from \(B_1\), there exists a flow \(\tilde{v}\) extending v such that \(\tilde{v}-v\) is a max flow on \(B_2\). Furthermore, one can choose \(\tilde{v}\) such that \(\tilde{v}-v\) is a collection of \(S(B_2)\) edge-disjoint paths from \(B_2\) to \(B_2^c\), and in particular, \(\tilde{v}\) is reachable from \(B_1 \sqcup B_2\). See Fig. 3 (Left) for a schematic picture of \(\tilde{v}\).

Fig. 3

(Left) A flow configuration of \(\tilde{v}\) in Lemma 2; (Middle) A configuration near s(e) when e is a type (1, 1) edge; (Right) A local picture at \(t(e')\) when there are no outgoing edges with type (0, 0)

Proof

Let w be a max flow on \(B_2\) consisting of \(S(B_2)\) edge-disjoint paths from \(B_2\) to \(B_2^c\). We say an edge e is of type \((i, j)\), with \(i,j \in \{0,1\}\), if \(v^e = i\) and \(w^e = j\). Let n be the number of type (1, 1) edges. We construct a flow \(\tilde{v}\) satisfying the requirements in the statement of the lemma by induction on n. If \(n=0\), then \(\tilde{v}:= v + w\) is such a flow.

If \(n>0\), pick a path p of w containing a type (1, 1) edge, and let e be the first such edge along p. Truncate the path p at the vertex s(e), keeping only the first half from its initial vertex to s(e) and dropping the second half from w (so w temporarily becomes an invalid flow). Update the types of the edges in the dropped second half by subtracting (0, 1) from their original types. Denote the first half of p, i.e. the remaining part of p, by \(p'\). Since v is reachable from \(B_1\), there exists a path q of v from \(B_1\) to s(e). Note that s(e) cannot be a vertex of \(B_2\), since otherwise there would be a flow from \(B_1\) to \(B_2\), contradicting our assumption. Hence, \(p'\) is not empty. See Fig. 3 (Middle). Note that at this moment the number of type (1, 1) edges is at most \(n-1\).

We now proceed in an algorithmic approach.

(ENTER): If \(p'\) is a complete path, i.e. it ends at a boundary vertex, goto (EXIT). Otherwise, continue in one of the following two mutually exclusive cases.

Case I: there is an outgoing edge of type (0, 0) at the end point of \(p'\). We simply append any outgoing edge of type (0, 0) to \(p'\) and change its type to (0, 1). The augmented path is still denoted by \(p'\), and we always consider \(p'\) as part of w. goto (ENTER).

Case II: there is no outgoing edge of type (0, 0) at the end point of \(p'\). By construction, any edge on \(p'\), and in particular the last edge \(e'\) on \(p'\), is of type (0, 1). Note that at the moment w violates the law of conservation only at the vertex \(t(e')\), where there is one more unit of incoming flow than outgoing flow. Since \(\Sigma \) is inner-superbalanced, at the vertex \(t(e')\) there must be one incoming edge \(e_{11}\) of type (1, 1), one outgoing edge \(e_{01}\) of type (0, 1), and another outgoing edge \(e_{10}\) of type (1, 0), such that \(e_{11}, e_{01}\) are consecutive edges on a path \(p''\) of w different from \(p'\). See Fig. 3 (Right). Now truncate \(p''\) at \(t(e')\) and append the second half of \(p''\) to \(p'\) so that \(p'\) becomes a complete path ending at \(B_2^c\) (still possibly with some type (1, 1) edges). Denote the first half of \(p''\) by r, which contains at least one type (1, 1) edge, namely \(e_{11}\). Find the first type (1, 1) edge, say \(e''\), along r. As before, truncate r at \(s(e'')\), drop the second half of r from w and subtract (0, 1) from the type of each edge on that second half, and set \(p'\) to be the first half of r. Set q to be any path of v connecting \(B_1\) to \(s(e'')\). Now the number of type (1, 1) edges is at most \(n-2\) (which means that once we have reduced to \(n=1\), Case II cannot occur). goto (ENTER).

(EXIT): \(p'\) must end at \(B_2^c\) (in fact, at \(B_1\)), since otherwise the path q combined with all the edges of \(p'\) picked up in Case I would form a path from \(B_1\) to \(B_2\), contradicting the assumption that \(S(B_1) = 0\). Now w is again a valid flow and still consists of \(S(B_2)\) edge-disjoint paths, and furthermore, the number of type (1, 1) edges is at most \(n-1\). The induction follows.

Note that the above procedure always terminates in (EXIT) after finitely many steps, since the graph is finite and the number of type (1, 1) edges decreases in Case II. \(\quad \square \)

Let \(\Sigma \) be an inner-superbalanced network with a partition \(\partial \Sigma = A \sqcup B \sqcup C \sqcup D \). Recall the definition of \(-\,I_3\) in Sect. 6.1,

$$\begin{aligned} -\,I_3(A:B:C):= S(AB) + S(AC)+ S(BC) - S(A) - S(B) - S(C) - S(ABC). \end{aligned}$$
(6.47)

We write \(-\,I_3(A:B:C)\) as \(-\,I_3^{\Sigma }(A:B:C)\) when there is more than one network present.

Theorem 5

Let \(\Sigma \) be an inner-superbalanced network with \(\partial \Sigma = A \sqcup B \sqcup C \sqcup D \), then \(-\,I_3(A:B:C) \ge 0\).

Proof

Note that for any flow v on \(\Sigma \), \(S_{\Sigma }(AB;v) = S_{\Sigma }(A;v) + S_{\Sigma }(B;v)\). Combined with Lemma 1, it follows that \(-\,I_3^{\Sigma }(A:B:C) = -\,I_3^{{\text {Res}}(v;\Sigma )}(A:B:C)\) for any flow v. Hence, it suffices to prove nonnegativity of \(-\,I_3\) for any residual network. By the nesting property [16, 22], there exists a flow v which is maximal simultaneously on A, AB, and ABC. Hence by Lemma 1, the residual network of v has max flow equal to zero on A, AB, and ABC. Without loss of generality, we may simply assume \(S(A) = S(AB) = S(ABC) = 0\) for \(\Sigma \), and then \(-\,I_3^{\Sigma } = S(BC) + S(AC) - S(B) - S(C)\).

Let \(B_1 = AB\), \(B_2 = C\), \(B_3 = D\) as in Lemma 2, and let v be a collection of S(B) edge-disjoint paths from B to A. This is possible since \(S(AB)= 0\) and hence any flow starting from B must end at A. Then v is reachable from \(B_1\). By Lemma 2, there exists a flow \(\tilde{v}\) extending v such that \(\tilde{v}-v\) consists of S(C) edge-disjoint paths. Since \(S(ABC) = 0\), these paths will end either at A or at B, but never at D. See Fig. 4. Let \(\tilde{v}_B\) (resp. \(\tilde{v}_A\)) be the subflow of \(\tilde{v}\) consisting of the paths which end at B (resp. A), then \(\tilde{v} = \tilde{v}_A +\tilde{v}_B\), and we have

$$\begin{aligned} S(B) + S(C)&= S_{\Sigma }(B;v) + S_{\Sigma }(C; \tilde{v}-v) \\&= S_{\Sigma }(B;v) - S_{\Sigma }(A; \tilde{v}-v) - S_{\Sigma }(B; \tilde{v}-v)\\&= S_{\Sigma }(BC;\tilde{v}_A) + S_{\Sigma }(AC;\tilde{v}_B)\\&\le S(BC) + S(AC). \end{aligned}$$

\(\square \)

Fig. 4

A flow configuration resulting from the application of Lemma 2. Here \(\tilde{v} = v + v_1 + v_2\), \(\tilde{v}_B = v_1\), \(\tilde{v}_A = v + v_2\)
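As an end-to-end sanity check of Theorem 5 (our own illustration, not a construction from the paper), the sketch below computes all the required max fluxes on a small inner-superbalanced network (the star graph with one interior vertex, which realizes the four-partite perfect tensor) and confirms that \(-I_3 \ge 0\):

```python
from collections import deque

def max_flow(edges, sources, sinks):
    """Ford-Fulkerson with BFS (Edmonds-Karp); residual capacities are kept
    in a dictionary keyed by ordered vertex pairs."""
    cap, nbrs = {}, {}
    for u, v, c in edges:
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    total = 0
    while True:
        parent, queue, reached = {s: None for s in sources}, deque(sources), None
        while queue and reached is None:
            u = queue.popleft()
            for v in nbrs.get(u, ()):
                if cap[(u, v)] > 0 and v not in parent:
                    parent[v] = u
                    if v in sinks:
                        reached = v
                        break
                    queue.append(v)
        if reached is None:
            return total
        path, v = [], reached
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= aug
            cap[(v, u)] += aug
        total += aug

# Star network: boundary vertices A, B, C, D and one interior vertex o.
# Undirected unit edges are modeled as pairs of opposite directed edges,
# which is automatically inner-superbalanced.
edges = [e for X in 'ABCD' for e in ((X, 'o', 1), ('o', X, 1))]

def S(region):
    """Max flux out of `region`: max flow to the complementary boundary set."""
    return max_flow(edges, list(region), [X for X in 'ABCD' if X not in region])

neg_I3 = (S('AB') + S('AC') + S('BC')
          - S('A') - S('B') - S('C') - S('ABC'))
assert neg_I3 == 2 and neg_I3 >= 0   # the star realizes -I3 = 2 S(A_i)
```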

Corollary 1

Let \(\Sigma = (V,E,c)\) be an inner-superbalanced network with a rational capacity function such that \(\partial \Sigma = A \sqcup B \sqcup C \sqcup D \), then \(-\,I_3(A:B:C) \ge 0\). In particular, if \(\Sigma \) is an undirected network with a rational capacity function, then \(-\,I_3(A:B:C) \ge 0\).

Proof

The extension of nonnegativity of \(-\,I_3\) from a constant capacity function to a rational capacity function is straightforward. One simply chooses an appropriate unit so that the rational capacity function becomes integer-valued, and then splits every edge into parallel edges, one for each unit of capacity of that edge. The new edges all have capacity 1. Clearly, the new network with the constant capacity function has the same max fluxes on any boundary subset as the original network.

If \(\Sigma \) is an undirected network, then by Sect. 6.1, it can be viewed as a directed network with each edge replaced by a pair of parallel oppositely-oriented edges. Such a network is clearly inner-superbalanced.    \(\square \)
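The rescaling-and-splitting step in the proof can be sketched as follows (our own illustration; the function name is hypothetical):

```python
from fractions import Fraction
from math import lcm

def to_unit_capacities(edges):
    """Clear denominators with a common scale factor, then split each edge of
    integer capacity c into c parallel unit-capacity edges. Max fluxes on any
    boundary set are preserved up to the overall scale."""
    caps = [Fraction(c) for _, _, c in edges]
    scale = lcm(*(f.denominator for f in caps))
    unit = [(u, v, 1)
            for (u, v, _), f in zip(edges, caps)
            for _ in range(int(f * scale))]
    return unit, scale

unit, scale = to_unit_capacities([('a', 'x', Fraction(1, 2)),
                                  ('x', 'b', Fraction(3, 4))])
assert scale == 4
assert len(unit) == 5   # 2 copies of (a, x) and 3 copies of (x, b)
```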

Finally, we would like to point out that the condition of being inner-superbalanced is necessary for the nonnegativity of \(-\,I_3\). Consider for instance the network shown in Fig. 5, which has three boundary vertices A, B, C, and take D to be empty. Then a straightforward computation shows that \(-\,I_3(A:B:C) = -1\).

Fig. 5

A network with \(-\,I_3 < 0\). \(S(A) = S(B) = S(AB) = 1\) and all other maximal fluxes are zero
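The claimed value is a one-line computation from (6.47); in the sketch below (our own check) the dictionary of max fluxes is read off from the figure caption:

```python
# Max fluxes for the network of Fig. 5 (D empty); all unlisted fluxes vanish.
S = {'A': 1, 'B': 1, 'AB': 1}
s = lambda k: S.get(k, 0)

neg_I3 = (s('AB') + s('AC') + s('BC')
          - s('A') - s('B') - s('C') - s('ABC'))
assert neg_I3 == -1   # inner-superbalance is indeed necessary for MMI
```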

7 Future Directions

In this paper, our main goal was to prove the holographic entropy inequality MMI using bit threads. We achieved this by proving Theorem 1, which is essentially the continuum generalization of Theorem 3 from graph theory. As far as we are aware, this is the first result concerning multicommodity flows in the setting of Riemannian manifolds. The proof itself may be of interest to many, as it borrows tools extensively from convex optimization. Indeed, using such tools, we were able to provide a novel proof of the old graph-theoretic result, Theorem 3, as well. We hope that such tools can be fruitfully applied in the future to further our understanding of bit threads in holographic systems.

Our work leaves open several directions for further inquiry. The first is to understand the higher holographic entropy-cone inequalities found in [5] in terms of flows or bit threads. The full set of inequalities is known for five parties and conjectured for six; for more than six parties, only a subset of the inequalities is known. However, none of the inequalities beyond MMI follows from the known properties of flows and multiflows, such as nesting and Theorems 1 and 2. Therefore, flows must obey some additional properties that guarantee those higher inequalities. Among the inequalities proved so far (subadditivity, strong subadditivity, MMI), each one has required a new property. Clearly this is not very satisfying, and one can hope that there exists a unifying principle governing flows through which the full holographic entropy cone for any number of regions can be understood.

A second set of issues suggested by our work concerns the state decomposition conjecture of Sect. 4. There are really two problems here. Specifically, it would be useful both to sharpen the conjecture by constraining the possible form of the 1/N corrections and to find evidence for or against the conjecture. The simplest non-trivial case to test is the \(n=3\) case with the regions A and C arranged so that \(I(A:C)=0\) (at leading order in 1/N). The ansatz (4.2) then simplifies to

$$\begin{aligned} |{\psi }\rangle _{ABC} = |{\psi _1}\rangle _{AB_1}\otimes |{\psi _3}\rangle _{B_3C} \end{aligned}$$
(7.1)

(interpreted suitably, see Sect. 4). Wormhole solutions of the kind studied in [2] might be a particularly useful testing ground for these questions, since the corresponding state can be described in terms of a CFT path integral, potentially giving another handle on its entanglement structure.

A third set of questions concerns a possible geometric decomposition of the bulk. These are motivated by our Theorem 4 and Conjecture 1 in the network case. Theorem 4 states that a network with three boundary vertices admits a decomposition into three subnetworks, effectively realizing the triangle skeleton diagram of Fig. 1. Conjecture 1 makes a similar claim for a network with four boundary vertices. It would be very interesting from the point of view of graph theory to prove or disprove this conjecture. It would also be interesting to define an analogous decomposition in the Riemannian setting. Such a decomposition would imply that, for a given decomposition of the boundary, the bulk could be taken apart into building blocks. These would consist of a bridge connecting each pair of regions \(A_i\), \(A_j\) with capacity \(\frac{1}{2}I(A_i:A_j)\), and in the four-region case a four-way bridge realizing the star graph with capacity \(-\frac{1}{2}I_3\). If such a decomposition of the bulk can be defined and proved to exist, it would mirror the conjectured state decomposition. This would lead to the question of whether these two decompositions are physically related—is the bulk built up out of pieces representing elementary entanglement structures?

Fourth, it would be interesting to explore possible connections between our work and recent conjectures on entanglement of purification in holographic systems [3, 4, 26, 32, 40, 41]. This actually involves two different issues. First, it seems reasonable to suppose that holographic entanglement of purification admits a description in terms of bit threads [11]. Second, one could ask whether the entanglement of purification conjecture has any bearing on our state-decomposition conjecture or vice versa.

Finally, bit threads can be generalized to the covariant setting, where they reproduce the results of the HRT formula [21]. Since the MMI inequality is known to be obeyed by the HRT formula [43], it would be interesting to understand how bit threads enforce MMI in the covariant setting.

We leave all of these explorations to future work.