1 Introduction

In the optimal transport problem of Monge [11] and Kantorovich [3], one is given distributions \(f(x)\) of sources and \(g(y)\) of sinks, and is asked which pairing \(h(x,y) \ge 0\) of sources with sinks minimizes a given transportation cost \(c(x,y)\). The duality theory initiated by Kantorovich provides a key tool for the analysis of this question. It was historically one of the first and has become one of the archetypal examples in the theory of infinite-dimensional linear programming. Note that the optimal transportation problem is concerned only with moving sources to sinks and not with the means by which this might be accomplished.

In the capacity-constrained variant of this problem, a bound \(\bar{h}(x,y)\) is imposed on the number of sources at \(x\) which can be paired with sinks at \(y\). Such a bound is meant to model the constraints on the means of transportation. For example, in a discrete setting, if bread loaves are transported from bakeries at \(i\) to cafés at \(j\) by means of trucks (each dedicated to route \(ij\)), then the capacity bound models each truck’s capacity [6]. From the perspective of the constrained transport problem we sometimes refer to the (unconstrained) optimal transport problem as the case \(\bar{h} =\infty \).

The imposition of such a bound has a qualitative effect on the optimal solution and can lead to surprising conclusions about the topology of its support. For generic \(c \in C^2\), the unconstrained problem leads to solutions which are singular measures concentrated on sets of zero volume, whereas in the constrained problem all solutions are functions of the form \(\bar{h} 1_W\) whose support \(W\) is a Lebesgue set with positive volume. Examples show that the boundary of the support need neither be smooth nor simply connected [7].

Although a duality theorem for capacity-constrained transport was established by Levin (see Theorem 4.6.14 of [12], for which a new proof is given in [8]), it has not been clear until now whether the dual problem admits solutions. When they exist, such solutions play the role of Lagrange multipliers for the primal problem. However, the compactification techniques used to find them in the unconstrained problem [10, 14] fail miserably when \(\bar{h} \ne +\infty \). The purpose of this manuscript is to prove that, under suitable hypotheses such as the continuity and strict positivity of \(f,g\) and \(\bar{h}\) on their supports, presumed compact in \(\mathbf {R}^m, \mathbf {R}^n\) and \(\mathbf {R}^{m+n}\), Levin’s dual problem in fact admits solutions \((u,v)\in L^1 \oplus L^1\). This allows us to characterize optimizers \(h\) for the primal problem in terms of Lagrange multipliers \((u,v)\). As in the theory for the unconstrained Monge-Kantorovich problem [10, 14] which has developed since the work of Brenier [2], we expect our characterization of primal optimizers using dual solutions will be the starting point for any future analysis of their analytic or geometric properties.

As a consequence of our analysis, we are also able to deduce Kantorovich’s duality theorem for the unconstrained optimal transportation problem as a singular limit \(\bar{h}\rightarrow \infty \).

To fix notation, let \(f\) and \(g\) be probability densities on \(X\subset \mathbf {R}^m\) and \(Y\subset \mathbf {R}^n\), respectively. We assume \(X\) and \(Y\) to be compact, unit volume sets without loss of generality. The continuous functions \(C(X)\) form a Banach space under the supremum norm \(\Vert \,\cdot \,\Vert _\infty \), whose dual space consists of signed Radon measures \(M(X)=C(X)^*\) normed by total variation \(\Vert \,\cdot \,\Vert _{TV}\), according to the Riesz-Markov theorem [13]. Let \(C_+(X) = \{c \in C(X) \mid c \ge 0\}\) and \(M_+(X) = \{ U \in M(X) \mid U \ge 0\}\). We denote the Jordan decomposition of a signed measure into its positive and negative parts by \(U=U_+ - U_-\), and set \(|U| = U_+ + U_-\).

Fix \( 0 \le \bar{h} \in L^\infty (X\times Y)\) and let \(\Gamma :=\Gamma ^{\bar{h}}(f,g)\) denote the set of joint probability densities \(h\le \bar{h}\) with \(f\) and \(g\) as marginals. Kellerer and Levin give necessary and sufficient conditions, (1) and (2) below respectively, for \(\Gamma \) to be non-empty: \(\Gamma \not =\emptyset \) if and only if

$$\begin{aligned} \int \varphi f dx + \int \psi g dy \le \iint \left[ \varphi (x) +\psi (y)\right] _+ \bar{h}(x,y) dxdy \end{aligned}$$
(1)

for all \(\varphi ,\psi \in L^\infty \) [4, 5], or equivalently if and only if

$$\begin{aligned} \int _A f dx + \int _B g dy \le 1 + \iint _{A \times B} \bar{h} dx dy \end{aligned}$$
(2)

for all Borel \(A \times B \subset X \times Y\) [9]. Here \(dx = d\mathcal {H}^m(x)\) and \(dy=d\mathcal {H}^n(y)\) denote Lebesgue measure. Assuming these conditions are satisfied, the capacity-constrained transportation problem is to compute

$$\begin{aligned} I^* := \sup _{h \in \Gamma ^{\bar{h}}(f,g)} \iint h(x,y) s(x,y) dx dy, \end{aligned}$$
(3)

where our sign convention is to maximize the surplus \(s:=-c \in L^1(X \times Y)\) rather than to minimize the cost \(c\). This supremum is known to be attained at an extreme point of the feasible set [6], and the extreme points have been characterized in [7] as those \(h \in \Gamma \) given by \(h = \bar{h} 1_W\) for some Lebesgue measurable \(W \subset X \times Y\).
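
To fix ideas, the following sketch, which is not part of the paper's analysis, discretizes the primal problem (3) on a small grid and solves the resulting linear program with SciPy; the grid sizes, surplus, marginals and capacity bound below are arbitrary illustrative choices.

```python
# Minimal discrete sketch of the capacity-constrained primal (3); toy data only.
import numpy as np
from scipy.optimize import linprog

m, n = 3, 3
rng = np.random.default_rng(0)
s = rng.standard_normal((m, n))          # surplus s = -c to be maximized
f = np.full(m, 1.0 / m)                  # source marginal (probability vector)
g = np.full(n, 1.0 / n)                  # sink marginal
hbar = np.full((m, n), 0.2)              # capacity bound h <= hbar, chosen so Gamma is non-empty

# The unknown h is an m*n vector; the marginal constraints are equalities.
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0     # sum_j h_ij = f_i
for j in range(n):
    A_eq[m + j, j::n] = 1.0              # sum_i h_ij = g_j
b_eq = np.concatenate([f, g])

res = linprog(c=-s.ravel(),              # linprog minimizes, so negate s
              A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, hb) for hb in hbar.ravel()],
              method="highs")
h_opt = res.x.reshape(m, n)
print("I* =", -res.fun)
# The solver returns a vertex of the discrete feasible set; in the continuum
# the optimizers have the bang-bang form hbar*1_W [7].
print(h_opt)
```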

For \(u \in L^1(X)\), \(v \in L^1(Y)\) and \(s \in L^1(X \times Y)\) define

$$\begin{aligned} I(u,v):= \iint \left[ s(x,y) + u(x) + v(y)\right] _+ \bar{h}(x,y) dxdy - \int u f dx - \int v g dy. \end{aligned}$$
(4)

Although Levin’s dual problem is linear rather than convex and was set in \(C(X) \oplus C(Y)\) rather than in \(L^1(X) \oplus L^1(Y)\), it is easily seen to be equivalent to computing the infimum

$$\begin{aligned} I_* := \inf _{(u,v) \in L^1(X) \oplus L^1(Y)} I(u,v), \end{aligned}$$
(5)

by identifying his Lagrange multiplier \(w\) conjugate to the capacity constraint \(h(x,y) \le \bar{h}(x,y)\) with \(w(x,y) = [s(x,y) + u(x) + v(y)]_+\). Levin asserts

$$\begin{aligned} I^*=I_*, \end{aligned}$$
(6)

see also [8]. We would like to know whether the infimum \(I_*\) is attained.
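
As a discrete sanity check of (5) and (6), not taken from the paper, one may introduce an auxiliary variable \(w_{ij}\) standing for \([s_{ij}+u_i+v_j]_+\), which turns the dual into a finite linear program; with the same toy data as in the primal sketch above, its value matches \(I^*\) up to solver tolerance, as strong duality for finite linear programs guarantees.

```python
# Discrete sketch of the dual problem (5); same illustrative data as the primal sketch.
import numpy as np
from scipy.optimize import linprog

m, n = 3, 3
rng = np.random.default_rng(0)
s = rng.standard_normal((m, n))
f = np.full(m, 1.0 / m)
g = np.full(n, 1.0 / n)
hbar = np.full((m, n), 0.2)

# Variables x = (u, v, w); minimize sum hbar*w - sum f*u - sum g*v.
cost = np.concatenate([-f, -g, hbar.ravel()])

# Constraints w_ij >= s_ij + u_i + v_j, written as u_i + v_j - w_ij <= -s_ij.
A_ub = np.zeros((m * n, m + n + m * n))
for i in range(m):
    for j in range(n):
        k = i * n + j
        A_ub[k, i] = 1.0                 # u_i
        A_ub[k, m + j] = 1.0             # v_j
        A_ub[k, m + n + k] = -1.0        # -w_ij
b_ub = -s.ravel()

bounds = [(None, None)] * (m + n) + [(0.0, None)] * (m * n)
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print("I_* =", res.fun)   # agrees with I* from the primal sketch (strong LP duality)
```

Since \(\bar{h}>0\), the optimal \(w\) is pushed down to its constraints, so it equals \([s+u+v]_+\) and the value of this linear program is the discrete analogue of \(I_*\).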

Our main theorem is stated below. It gives a condition on continuous strictly positive \(\bar{h},f,g\) which ensures that the infimum in (5) is attained: namely, the set \(\Gamma ^{\bar{h}/\eta }( f, g)\) of joint probability densities \(h\le \bar{h}/\eta \) with \(f\) and \(g\) as marginals must be non-empty—not only for \(\eta =1\) as required to have the identity \( I_* = I^*\)—but for some \(\eta >1\). We note that the condition that \(\bar{h}/\eta \) is still admissible is a kind of “interior point” condition which often appears in convex analysis. For example, in transport theory, a similar phenomenon, leading to the existence of a dual potential, was observed in [1].

Main Theorem (Theorem 4.2) Let \(f,g\) and \(\bar{h}\) be continuous and strictly positive on the compact, unit-volume sets \(X \subset \mathbf {R}^m,Y\subset \mathbf {R}^n\) and \(X \times Y\) respectively. Fix \(\eta >1\) and \(s \in L^1(X \times Y)\). If \(\Gamma ^{\bar{h}/\eta }(f, g)\) is not empty, then there exist functions \((u,v) \in L^1(X) \oplus L^1(Y)\) such that \(I_*=I(u,v)\).

The functions \(u\) and \(v\) are commonly referred to as dual potentials. Combining this theorem with known results [Proposition 2.1 and Levin’s duality (6)], we obtain a characterization of optimality:

Corollary 1.1

(Characterization of optimality) Under the hypotheses of the main Theorem, any \(h \in \Gamma ^{\bar{h}}(f,g)\) is optimal if and only if there exist \((u,v) \in L^1(X) \oplus L^1(Y)\) such that

$$\begin{aligned} s(x,y) + u(x) + v(y)\;\left\{ \begin{array}{l@{\quad }l@{\quad }l@{\quad }l} \le \; 0 &{} \mathrm{where} &{} h=0,\\ =\; 0&{} \mathrm{where} &{} 0<h< \bar{h},\\ \ge \; 0&{} \mathrm{where} &{} h= \bar{h} . \end{array} \right. \end{aligned}$$
(7)

Characterization (7) can be interpreted as a complementary slackness condition.

We now outline our strategy. Proving that the minimum (5) is attained relies on a continuity and compactness argument. At its core is a coercivity estimate (Proposition 3.3) which guarantees \(L^1\)-boundedness of (suitably normalized) minimizing sequences. In view of the lack of compactness of the closed unit ball in the non-reflexive Banach space \(L^1\), it is natural to embed \(L^1\) into the larger space of signed Radon measures equipped with the total variation norm, which, as a dual space, is guaranteed better compactness properties by Alaoglu’s theorem. Although the number of competitors increases, this embedding does not affect our variational problem: It can be easily shown that the value of the functional \(I(u,v)\), suitably extended to singular measures, does not decrease with this extension. Lower semicontinuity properties of \(I(u,v)\) allow us to conclude the proof.

The \(L^1\)-bounds that we derive for minimizing sequences are in a certain sense uniform in \(\bar{h}\). This observation allows us to derive the classical Kantorovich duality from Levin’s duality as the singular limit \(\bar{h}\rightarrow \infty \). In tandem with the results obtained in the companion paper [8], this amounts to a new and elementary proof of the Kantorovich duality theorem.

The remainder of this paper is organized as follows: In Sect. 2, we derive weak duality and complementary slackness conditions; Sect. 3 contains the key (coercivity) estimates; Sect. 4 contains the main result (the appendix contains a few elementary facts used in its proof); in Sect. 5, we give a new elementary proof of the Kantorovich duality; we finally conclude this paper with a discussion on future work in Sect. 6.

2 Complementary slackness

In the following proposition, we establish the complementary slackness conditions of linear programming and obtain the inequality \(I_* \ge I^*\), that is, the easy part of Levin’s duality, as a by-product.

Proposition 2.1

(Complementary slackness) Fix \((u,v) \in L^1(X) \oplus L^1(Y)\), \((s,\bar{h}) \in (L^1\oplus L^\infty )(X \times Y)\) and a probability density \(0 \le h \le \bar{h}\) whose marginals are denoted by \(f\) and \(g\). Then

$$\begin{aligned} I(u,v) \ge \iint h s dxdy. \end{aligned}$$
(8)

Moreover, equality holds if and only if \((u,v)\) and \(h\) satisfy the following complementary slackness conditions

$$\begin{aligned} s + u + v\;\left\{ \begin{array}{l@{\quad }l@{\quad }l} \le \; 0&{}\text{ in } &{} \{ h=0 \},\\ =\; 0&{} \text{ in }&{} \{ 0<h< \bar{h} \},\\ \ge \; 0&{} \text{ in }&{} \{ h= \bar{h} \} , \end{array} \right. \end{aligned}$$
(9)

up to Lebesgue negligible sets.

Clearly, \((u,v)\in L^1(X)\oplus L^1(Y)\) and \(h\in L^{\infty }(X\times Y)\) satisfying (9) form a minimizer of (5) and a maximizer of (3), respectively: equality in (8) forces \(I(u,v) = \iint h s\, dxdy \le I^* \le I_* \le I(u,v)\).

Proof

Because \(h\le \bar{h}\), we first notice that

$$\begin{aligned} I(u,v)&\ge \iint [s+u+v]_+ h dxdy- \int ufdx -\int vgdy\\&\ge \iint \left( s+u+v\right) h dxdy- \int ufdx -\int vgdy\\&= \iint h s dxdy \end{aligned}$$

to establish (8). Moreover, these inequalities become equalities if and only if \(s + u+ v\le 0\) where \(\bar{h}> h\) and \(s+u+v\ge 0\) where \(h > 0\). Combining these two conditions implies (9). \(\square \)

3 Coercivity in \(L^1\)

In this section, we establish the estimates which constitute the core of this paper.

We remark that the invariance \(I(u+k,v-k) = I(u,v)\) for all \(k\in \mathbf {R}\) shows that the level sets of \(I\) cannot be compact. To resolve this lack of coercivity, we shall eventually assume

$$\begin{aligned} \overline{uf} := \int u(x)f(x) dx = \int v(y)g(y)dy =: \overline{vg}, \end{aligned}$$

which can always be enforced by replacing \((u,v)\) with \((u+k,v-k)\) for the constant \(k=(\overline{vg}-\overline{uf})/2\), and which restricts the problem to a codimension 1 subspace of \(L^1(X) \oplus L^1(Y)\). We would then like to show that \(I(u,v) \le I_*+1\) implies an \(L^1\)-bound on \(u\) and \(v\) under suitable assumptions on the data \(f\), \(g\), and \(\bar{h} \).

Our argument comes in two steps: We first prove a bound on the means \(\overline{uf} = \overline{vg}\). We shall use this to establish a bound on the oscillation \(\Vert uf - \overline{uf}\Vert _{L^1}\) of \(uf\). A similar bound for \(vg\) follows by symmetry.

The bound on the means follows immediately from \(I(u,v) \le I_*+1\) by the following claim.

Lemma 3.1

(Mean bound) Fix \((u,v,s) \in L^1(X) \oplus L^1(Y) \oplus L^1(X \times Y)\) and a probability density \(h\in L^\infty (X \times Y)\) with marginals \(f\) and \(g\). If \(\bar{h} \ge \eta h\) for some \(\eta >1\), then

$$\begin{aligned} - I(u,v) \le \overline{uf} + \overline{vg} \le \frac{I(u,v) + \eta \Vert h s\Vert _{L^1}}{\eta -1}. \end{aligned}$$

Proof

One direction is easy: \(I(u,v) \ge -\overline{uf} - \overline{vg}\) follows directly from the definition (4). The other direction is a consequence of

$$\begin{aligned} I(u,v)&\ge - \overline{uf} - \overline{vg} + \eta \iint h(x,y)[s(x,y) + u(x) + v(y) ]dxdy\\&\ge (\eta -1)(\overline{uf}+\overline{vg}) - \eta \Vert h s\Vert _ {L^1} \end{aligned}$$

since \(h\) has \(f\) and \(g\) as its left and right marginals. \(\square \)

We next convert upper bounds on \(f,g\) and \(1/\bar{h}\) into a bound on the oscillation of \(uf\) around its mean. Of course, a symmetrical bound holds for \(vg\).

Lemma 3.2

(Oscillation bound) Fix \(u,f \in L^1(X)\), \(v,g \in L^1(Y)\) and \((s,\bar{h}) \in (L^1 \oplus L^\infty )(X\times Y)\). If \(\bar{h}(x,y) \ge \epsilon f(x)g(y) \) for some \(0<\epsilon \le 1\) and all \((x,y) \in X \times Y\), then

$$\begin{aligned} \frac{\epsilon }{6} \Vert uf - \overline{u f}\Vert _{L^1(X)} \le I(u,v) + \Vert sfg\Vert _{L^1}+ |\overline{uf}| + |\overline{vg}|. \end{aligned}$$
(10)

Proof

Let \(\sigma = \int |uf-\overline{uf}|dx\) denote oscillation around the mean, i.e. the \(L^1\)-distance separating \(uf\) from its average value. Set \(X^\pm = \{x:\, \pm (u(x)f(x)-\overline{uf}) > \sigma /3\}\) with \(|X^+| = \mathcal {H}^m(X^+)\) and \(|X^-| = \mathcal {H}^m(X^-)\). From the definition of \(\overline{uf}\) we find

$$\begin{aligned} \int _{\{ |uf -\overline{uf}| \le \sigma /3\}}\left( \,\overline{uf}- u(x)f(x)\right) dx = A^+ - A^- \end{aligned}$$

where

$$\begin{aligned} A^\pm&= \pm \int _{X^\pm } \left( u(x)f(x)-\overline{uf}\right) dx\\&\ge 0. \end{aligned}$$

On the other hand, from the definition of \(\sigma \),

$$\begin{aligned} \sigma&= \left( \int _{\{|uf - \overline{uf}| \le \sigma /3\}} + \int _{X^+} + \int _{X^-}\right) |u(x)f(x)-\overline{uf}| dx\\&\le (1-|X^+|-|X^-|) \sigma /3 + A^+ + A^-. \end{aligned}$$

Thus

$$\begin{aligned} A^+ + A^- \ge (2 + |X^+| + |X^-|)\sigma /3 \ge \frac{2}{3} \sigma \end{aligned}$$

and

$$\begin{aligned} A^+ - A^- \ge -(1-|X^+| - |X^-|) \frac{\sigma }{3} \ge -\frac{\sigma }{3} \end{aligned}$$

whence, adding the last two inequalities, \(2A^+\ge \sigma /3\), i.e. \(A^+\ge \sigma /6\). Thus to control \(\sigma \) it is enough to control \(A^+\).

Now, since \(g\) is a probability density

$$\begin{aligned} A^+&= - |X^+| \overline{uf} + \iint _{X^+\times Y} ufg\, dx dy \\&= \iint _{X^+\times Y} (s+u+v)fg\, dx dy - \iint _{X^+\times Y} sfg\, dx dy\\&- |X^+| \overline{uf} - \overline{vg} \int _{X^+}f\, dx \\&\le \frac{1}{\epsilon } I(u,v) + \Vert sfg\Vert _{L^1} + \left( \frac{1}{\epsilon }-|X^+|\right) \overline{uf} + \left( \frac{1}{\epsilon }-\int _{X^+}f\, dx\right) \overline{vg} \end{aligned}$$

where in the last estimate we have used the assumption that \(\epsilon fg\le \bar{h}\) for some \(0<\epsilon \le 1\) which implies \(\iint _{X^+\times Y} (s+u+v)fg \le \iint _{X^+\times Y} [s+u+v]_+fg \le \frac{1}{\epsilon }\iint _{X\times Y} [s+u+v]_+ \bar{h} = \frac{1}{\epsilon }(I(u,v)+\overline{uf} +\overline{vg})\). Using that \(f\) is a probability density, \(|X^+|\le 1\), and \(\epsilon \le 1\) yields

$$\begin{aligned} \epsilon A^+ \le I(u,v) + \Vert sfg\Vert _{L^1} + \Big [\overline{uf}\Big ]_+ + \Big [\overline{vg}\Big ]_+. \end{aligned}$$

Since \(A^+ \ge \sigma /6\), this yields the desired bound (10), in which we have replaced the positive part \([\cdot ]_+\) by absolute value \(|\cdot |\) simply for ease of parsing. \(\square \)

Proposition 3.3

(Coercivity) Let \(s \in L^1(X\times Y)\) and \(\eta h \le \bar{h} \in L^\infty (X \times Y)\), where \(\eta >1\) and \(h\) is a probability density having \(f\) and \(g\) as its marginals. Assume \(\bar{h}(x,y) \ge \epsilon f(x) g(y)\) and \(\min \{f(x),g(y)\}\ge \epsilon \) for some \(\epsilon >0\) and almost all \((x,y) \in X \times Y\). If \(\overline{uf} = \overline{vg}\) for some \(u\in L^1(X)\) and \(v\in L^1(Y)\), then \(I(u,v) \le I_* +1\) implies \(\Vert u\Vert _{L^1}\) and \(\Vert v\Vert _{L^1}\) are controlled by a bound which depends only on \(I_*\), \(\epsilon \), \(\eta \), \(\Vert s\Vert _{L^1}\) and \(\Vert \bar{h}\Vert _\infty \).

Proof

According to Lemma 3.1, the means \(\overline{vg}=\overline{uf}\) are bounded in terms of the given \(I_*\), \(\epsilon \), \(\eta \), \(\Vert s\Vert _{L^1}\) and \(\Vert \bar{h}\Vert _\infty \). Lemma 3.2, together with \(\epsilon fg\le \bar{h}\), then bounds the mean oscillation of \(uf\) around \(\overline{uf}\) in terms of \(\overline{uf}=\overline{vg}\) and the listed parameters. Now \(\Vert u\Vert _{L^1} \le \Vert f^{-1}\Vert _{L^\infty } \Vert uf\Vert _{L^1} \le \epsilon ^{-1}\Vert uf\Vert _{L^1}\) combined with \(\Vert uf\Vert _{L^1} \le \Vert \overline{uf}\Vert _{L^1} + \Vert uf-\overline{uf}\Vert _{L^1}\) yields the desired bound for \(\Vert u\Vert _{L^1}\). A similar bound for \(\Vert v\Vert _{L^1}\) follows by symmetry. \(\square \)

Remark 3.4

The bound mentioned in the conclusion of Proposition 3.3 is uniform in \(\Vert \bar{h}\Vert _{L^{\infty }}\gg 1\).

4 Existence of optimizers

Proposition 3.3 implies that we can choose a minimizing sequence \((u_i,v_i)\) for (5) which is uniformly bounded in \(L^1(X)\oplus L^1(Y)\). Since the closed unit ball of \(L^1\) is not weakly compact, an \(L^1\) bound is not enough to permit extraction of a subsequence converging to a limit in \(L^1(X)\oplus L^1(Y)\). To recover convergence we would like to view \((u_i,v_i)\) in a larger space which is better behaved.

Therefore, let us extend the definition of \(I(u,v)\) from \(L^1\) to the space of signed measures with finite total variation, which, as a dual Banach space, has better compactness properties. Denoting by \(S \in M(X \times Y)\) the measure with Lebesgue density \(s(x,y)\), this extension is given by

$$\begin{aligned} \tilde{I}(U,V) :=\iint \bar{h} d[S + U \otimes \mathcal {H}^n + \mathcal {H}^m \otimes V]_+ - \int f dU - \int g dV \end{aligned}$$

for \((U,V) \in M(X) \oplus M(Y)\). (We recall that in our notation, \(\mathcal {H}^m\) and \(\mathcal {H}^n\) are Lebesgue measures.) The next lemma is routine; it verifies that \(\tilde{I}\) is lower semicontinuous with respect to weak-\(*\) convergence in \(M(X)\oplus M(Y)\).

Lemma 4.1

(Lower semicontinuity) Let \(s \in L^1(X\times Y)\) induce \(dS = s d\mathcal {H}^{m+n}\). Given continuous non-negative \(f,g\) and \(\bar{h}\) on the compact sets \(X\subset \mathbf {R}^m, Y\subset \mathbf {R}^n\) and \(X \times Y\), the functional \(\tilde{I}(U,V)\) is lower semicontinuous with respect to weak-\(*\) convergence in \(M(X)\oplus M(Y)\).

Proof

Choose a bounded sequence in \(M(X) \oplus M(Y)\) which converges \((U_i,V_i) \rightarrow (U,V)\) when tested against functions in \(C(X) \oplus C(Y)\). Then

$$\begin{aligned} \int f dU = \lim _{i \rightarrow \infty } \int f dU_i \quad \mathrm{and} \quad \int g dV = \lim _{i \rightarrow \infty } \int g dV_i, \end{aligned}$$

so we need only show

$$\begin{aligned}&\liminf _{i \rightarrow \infty } \iint \bar{h} d[S + U_i \otimes \mathcal {H}^n + \mathcal {H}^m \otimes V_i]_{+} \nonumber \\&\ge \iint \bar{h} d [S + U\otimes \mathcal {H}^n + \mathcal {H}^m \otimes V]_+. \end{aligned}$$
(11)

We define the signed Radon measures \(d\mu := \bar{h} d\left( S + U \otimes \mathcal {H}^n + \mathcal {H}^m \otimes V\right) \) and \(d\mu ^i := \bar{h} d\left( S + U_i \otimes \mathcal {H}^n + \mathcal {H}^m \otimes V_i\right) \). Then \(\mu ^i\rightarrow \mu \) weakly-\(*\). Indeed, for every \(\varphi \in C(X\times Y)\)

$$\begin{aligned} \iint \varphi d\mu ^i&= \int \bar{h} s\varphi dxdy + \int \left( \int \bar{h}\varphi dy\right) d U_i + \int \left( \int \bar{h}\varphi dx\right) dV_i\\&\rightarrow \int \bar{h} s\varphi dxdy + \int \left( \int \bar{h}\varphi dy\right) d U + \int \left( \int \bar{h}\varphi dx\right) dV\\&= \iint \varphi d\mu , \end{aligned}$$

because the \(x\)- and \(y\)-marginals of \(\bar{h}\varphi \) are both continuous. Moreover, decomposing \(\mu \) and \(\mu ^i\) into their positive and negative parts \(\mu = \mu _+ - \mu _-\) and \(\mu ^i=\mu ^i_+ - \mu ^i_-\), (11) becomes

$$\begin{aligned} \liminf _{i\rightarrow \infty } \mu ^i_+(X\times Y)\ge \mu _+(X\times Y). \end{aligned}$$
(12)

This in fact is an immediate consequence of the weak-\(*\) convergence \(\mu ^i\rightarrow \mu \): On the one hand, choosing \(\varphi \equiv 1\), weak-\(*\) convergence implies that \(\mu ^i_+(X\times Y) - \mu _-^i(X\times Y)=\mu ^i(X\times Y)\rightarrow \mu (X\times Y)=\mu _+(X\times Y) - \mu _-(X\times Y)\). On the other hand, by the lower semicontinuity of the total variation norm, we have \(\mu _+(X\times Y) + \mu _-(X\times Y)=|\mu |(X\times Y)=\Vert \mu \Vert _{TV}\le \liminf _{i\rightarrow \infty } \Vert \mu ^i\Vert _{TV}=\liminf _{i\rightarrow \infty }|\mu ^i|(X\times Y)=\liminf _{i\rightarrow \infty }\left( \mu ^i_+(X\times Y) + \mu ^i_-(X\times Y)\right) \). The statement in (12) now follows upon combining both convergence results. \(\square \)

To prove the next theorem we need to recall some basic facts about signed Borel measures.

A measure \(\mu \) on \(\mathbf {R}^m\) is absolutely continuous with respect to Lebesgue measure \(\mathcal {H}^m\), denoted \(\mu \ll \mathcal {H}^m\), if for any \(\mathcal {H}^m\)-measurable set \(\Omega \) for which \(\mathcal {H}^m (\Omega )=0\), it is also the case that \(\mu (\Omega )=0\). Two signed measures \(\mu _1\) and \(\mu _2\) are mutually singular, denoted \(\mu _1 \perp \mu _2\), if they are concentrated on disjoint sets.

Lebesgue’s decomposition theorem states that a \(\sigma \)-finite measure \(\mu \) can be uniquely decomposed into its absolutely continuous part and singular part with respect to Lebesgue measure: \(\mu =\mu _{ac} + \mu _{s}\), where \(\mu _{ac} \ll \mathcal {H}^m\) and \(\mu _{s} \perp \mathcal {H}^m\); furthermore \(\mu _{ac} = hd\mathcal {H}^m\) for a unique \(h\in L^1(\mathcal {H}^m)\) called the Radon–Nikodym derivative of \(\mu _{ac}\) (with respect to \(\mathcal {H}^m\)). Note that \(\mu _{s} \perp \mu _{ac}\).

The Hahn–Jordan decomposition theorem states that any signed measure \(\mu \) can be uniquely decomposed into its positive and negative parts: \(\mu =\mu _+ - \mu _-\), where \(\mu _+, \mu _-\) are positive measures and \(\mu _{+} \perp \mu _{-}\). The variation of \(\mu \) is denoted by \(|\mu | :=\mu _+ + \mu _-\).

It is a standard fact that for a \(\sigma \)-finite signed measure \(\mu \) the Hahn–Jordan decomposition and the Lebesgue decomposition commute: \(\mu _+= [\mu _{ac}]_+ + [\mu _{s}]_+\) and \(\mu _-= [\mu _{ac}]_- + [\mu _{s}]_-\).

Theorem 4.2

(Existence of optimal potentials) Let \(f,g\) and \(\bar{h}\) be continuous and strictly positive on the compact, unit-volume sets \(X \subset \mathbf {R}^m,Y\subset \mathbf {R}^n\) and \(X \times Y\) respectively. Fix \(\eta >1\) and \(s \in L^1(X \times Y)\). If \(\Gamma ^{\bar{h}/\eta }(f, g)\) is not empty, then there exist functions \((u,v) \in L^1(X) \oplus L^1(Y)\) such that \(I_*=I(u,v)\) in (5).

Proof

Let \((u_i,v_i)\) be a minimizing sequence for (5): \(u_i \in L^1(X)\) and \(v_i \in L^1(Y)\) such that \(I(u_i,v_i) \rightarrow I_*\). Fix \(\epsilon \le \min \{f(x)^{\pm 1},g(y)^{\pm 1}\}\). Since \(\bar{h}>0\) is bounded away from zero, taking \(\epsilon >0\) smaller if necessary ensures \(\bar{h}(x,y) \ge \epsilon f(x) g(y)\) throughout \(X \times Y\). Adding a constant to \(u_i\) and subtracting the same constant from \(v_i\) ensures \(\overline{u_if} = \overline{v_ig}\). Now Proposition 3.3 asserts a bound for \(\Vert u_i\Vert _{L^1} + \Vert v_i\Vert _{L^1}\) independent of \(i\). Letting \(U_i\) and \(V_i\) denote the measures with Lebesgue densities \(u_i\) and \(v_i\) and recalling \(M(X) = C(X)^*\), Alaoglu’s theorem provides a weak-\(*\) convergent subsequence also denoted \((U_i,V_i)\) with limit \((U_i,V_i) \rightarrow (U,V) \in M(X)\oplus M(Y)\). Since \(\tilde{I}(U_i,V_i) = I(u_i,v_i) \rightarrow I_*\) by construction, Lemma 4.1 implies \(\tilde{I}(U,V)\le I_*\). We will next argue that \(\tilde{I}(U,V)\ge I_*\).

Let \(S = s\, \mathcal {H}^m\otimes \mathcal {H}^n \). Since the Hahn–Jordan and the Lebesgue decompositions of a measure commute, as recalled above,

$$\begin{aligned} \left[ S + U\otimes \mathcal {H}^n + \mathcal {H}^m\otimes V\right] _{+}&= \left[ S + U_{ac}\otimes \mathcal {H}^n + \mathcal {H}^m\otimes V_{ac} + U_{s}\otimes \mathcal {H}^n + \mathcal {H}^m\otimes V_{s}\right] _+\nonumber \\&= \left[ S \!+\! U_{ac}\otimes \mathcal {H}^n + \mathcal {H}^m\otimes V_{ac} \right] _+ \!+\! \left[ U_{s}\otimes \mathcal {H}^n +\mathcal {H}^m\otimes V_{s}\right] _+.\nonumber \\ \end{aligned}$$
(13)

Therefore

$$\begin{aligned} \tilde{I}(U,V)&= \tilde{I}(U_{ac},V_{ac}) + \iint \bar{h}\, d\left[ U_{s}\otimes \mathcal {H}^n +\mathcal {H}^m \otimes V_{s}\right] _+ - \int f\, dU_{s} - \int g\, dV_{s} \\&\ge I(u,v) + \iint h\, d\left[ U_{s}\otimes \mathcal {H}^n + \mathcal {H}^m \otimes V_{s}\right] - \int f\, dU_{s} - \int g\, dV_{s}\\&= I(u,v), \end{aligned}$$

where \(u\) and \(v\) are the Radon-Nikodym derivatives of \(U_{ac}\) and \(V_{ac}\) respectively: \(u\, dx = dU_{ac}\) and \(v\, dy= dV_{ac}\). Here \(h\) denotes any element of \(\Gamma ^{\bar{h}}(f,g) \supset \Gamma ^{\bar{h}/\eta }(f, g) \ne \emptyset \): the displayed inequality uses \(0\le h\le \bar{h}\) and \([M]_+\ge M\) for \(M= U_{s}\otimes \mathcal {H}^n +\mathcal {H}^m \otimes V_{s}\), while the final equality holds because \(h\) has marginals \(f\) and \(g\), so the singular terms cancel. Since \(I(u,v) \ge I_*\) we have \(I_* = \tilde{I}(U,V) = I(u,v)\) as desired. \(\square \)

5 From Levin’s to Kantorovich’s duality

In this section, we briefly discuss how to derive the classical Kantorovich duality of optimal transport from Levin’s duality using some of the insights of previous sections. For simplicity, assume that \(f\) and \(g\) are strictly positive, continuous probability densities on the compact sets \(X\subset \mathbf {R}^m\) and \(Y\subset \mathbf {R}^n\), where both \(X\) and \(Y\) have unit volume. Let \(\Gamma (f,g)\) denote the set of all joint probability measures with \(f\) and \(g\) as marginals. Fix \(s \in C(X \times Y)\) continuous and define

$$\begin{aligned} L_s:=\left\{ (u,v)\in L^1(X)\oplus L^1(Y)|\, s(x,y) + u(x)+ v(y) \le 0 \right\} . \end{aligned}$$

We will recover the expression of Kantorovich duality as stated in [14, Theorem 5.10]. We remark that although the elements of \(L_s\) need not be continuous, for Lipschitz continuous costs it is well known that the infimum on the right hand side of (14) below is attained by Lipschitz continuous potentials \((u,v)\).
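
Before stating the theorem, here is a small discrete illustration of (14), again not taken from the paper: both sides become finite linear programs whose optimal values coincide. The marginals and surplus below are arbitrary toy data.

```python
# Discrete illustration of the unconstrained duality (14); toy data only.
import numpy as np
from scipy.optimize import linprog

m, n = 4, 5
rng = np.random.default_rng(1)
s = rng.standard_normal((m, n))
f = rng.random(m); f /= f.sum()          # source marginal
g = rng.random(n); g /= g.sum()          # sink marginal

# Primal: maximize sum s_ij H_ij over couplings H of (f, g).
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0
for j in range(n):
    A_eq[m + j, j::n] = 1.0
primal = linprog(-s.ravel(), A_eq=A_eq, b_eq=np.concatenate([f, g]),
                 bounds=[(0.0, None)] * (m * n), method="highs")

# Dual: minimize -f.u - g.v subject to s_ij + u_i + v_j <= 0.
A_ub = np.zeros((m * n, m + n))
for i in range(m):
    for j in range(n):
        A_ub[i * n + j, i] = 1.0
        A_ub[i * n + j, m + j] = 1.0
dual = linprog(np.concatenate([-f, -g]), A_ub=A_ub, b_ub=-s.ravel(),
               bounds=[(None, None)] * (m + n), method="highs")

print("sup side:", -primal.fun)
print("inf side:", dual.fun)             # the two values agree up to solver tolerance
```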

Theorem 5.1

(Duality for the unconstrained problem) Let \(X\subset \mathbf {R}^m\) and \(Y\subset \mathbf {R}^n\) be compact unit volume sets equipped with positive continuous probability densities \(f \in C(X)\) and \(g \in C(Y)\). If \(s \in C(X \times Y)\) then

$$\begin{aligned} \sup _{H\in \Gamma (f,g)} \iint s(x,y) d H(x,y) = \inf _{(u,v)\in L_s} \left( - \int f(x) u(x)dx - \int g(y) v(y)dy\right) , \end{aligned}$$
(14)

and both infimum and supremum (14) are attained.

Proof

Let \(I^*(\infty )\) denote the left hand side of (14) and \(I_*(\infty )\) denote the right hand side of (14). Note that by a density argument, it is enough to let the \(\sup \) in \(I^*(\infty )\) range over absolutely continuous measures \(d H = hdxdy\in \Gamma (f,g)\).

The inequality \(I^*(\infty ) \le I_*(\infty )\) follows easily from the following inequality which holds for all \((u,v) \in L_s\) and any density \(h\) such that \( hdxdy\in \Gamma (f,g)\):

$$\begin{aligned} \iint s h dxdy&= \iint h(s+u+v)dxdy - \int f udx - \int g vdy\\&\le - \int f u dx- \int g vdy. \end{aligned}$$

The last inequality follows from the definition of \(L_s\). Taking \(\sup \) on the left hand side and \(\inf \) on the right hand side gives the desired inequality.

Observe that to prove the other inequality, \(I^*(\infty ) \ge I_*(\infty )\), it is enough to prove existence of \(H\in \Gamma (f,g)\) and \((u,v)\in L_s\) satisfying

$$\begin{aligned} \iint s\,d H\ge - \int f udx - \int g vdy. \end{aligned}$$

Let \(h(x,y):=f(x)g(y)\) and \(K:=\max _{X \times Y}h(x,y)+1\), and note that \(hdxdy\in \Gamma (f,g)\). Fix \(\eta >1\) such that \(K \ge \eta \, \max _{X \times Y}h(x,y)\), so that \(k \ge \eta h\) for all \(k\ge K\). Note also that \(\Gamma ^{\bar{h}_k}(f,g)\not =\emptyset \) if \(\bar{h}_k=k1_{X\times Y}\).

Defining \(I_k(u,v)\), \(I_*(k)\), and \(I^*(k)\) similarly to \(I(u,v)\), \(I_*\), and \(I^*\) but with \(\bar{h}\) replaced by \(\bar{h}_k\), we obtain functions \(h_k\in \Gamma ^{\bar{h}_k}(f,g)\) and \((u_k,v_k)\in L^1(X)\oplus L^1(Y)\) optimizing \(I^*(k)\) and \(I_*(k)\) respectively: the supremum \(I^*(k)\) is attained according to [6], while the infimum \(I_*(k)\) is attained by Theorem 4.2, whose hypotheses hold since \(\bar{h}_k/\eta \ge h\). Levin’s duality \(I^*(k)=I_*(k)\) then reads

$$\begin{aligned} \iint h_k\,s dxdy=I_k(u_k,v_k) \end{aligned}$$
(15)

for any \(k\ge K\). Since \(I_k(u+\alpha , v-\alpha )=I_k(u,v)\) for \(\alpha \in \mathbf {R}\), there is no loss of generality in assuming that

$$\begin{aligned} \overline{u_k f}=\overline{v_k g}. \end{aligned}$$

By a standard argument the sequence \(\{h_k\}_{k\ge K}\) is seen to be weak-\(*\) precompact in the space of probability measures and every limit point \(H\) satisfies \(H\in \Gamma (f,g)\).

In particular, since \(s\) is continuous, upon extracting a subsequence, passing to the limit \(k\rightarrow \infty \) in (15) yields that

$$\begin{aligned} I^*(\infty )=\iint s\,dH= \lim _{k\rightarrow \infty } I_k(u_k,v_k). \end{aligned}$$
(16)

We next observe that the \(L^1\)-bound on the sequences \(\{u_k\}_{k\ge K}\) and \(\{v_k\}_{k\ge K}\) obtained from Lemmas 3.1 and 3.2 is independent of \(k\ge K\). In this context, we record for further reference

$$\begin{aligned} \overline{u_kf} + \overline{v_kg} \le \frac{I^*(\infty )+\eta ||s||_{L^1} ||h||_\infty }{\eta -1}, \end{aligned}$$
(17)

as a consequence of Lemma 3.1.

Choosing \(\epsilon >0\) so that \(K \ge \epsilon f(x)g(y)\) and \(\min \{f(x),g(y)\} \ge \epsilon \), Proposition 3.3 (see Remark 3.4) implies that \(||u_k||_{L^1}\) and \(||v_k||_{L^1}\) are controlled by a bound which depends only on \(\epsilon , \eta , I^*(\infty ),||s||_{L^1}\), and \(||h||_\infty \), all of which are independent of \(k\).

Arguing as in the proof of Theorem 4.2, we may thus extract a weak-\(*\) convergent subsequence (not relabeled) \((u_k,v_k)\rightarrow (U,V)\in M(X) \oplus M(Y)\) and obtain

$$\begin{aligned} \lim _{k\rightarrow \infty } I_k(u_k,v_k)\ge \lim _{k\rightarrow \infty } \left( -\int u_k fdx - \int v_k gdy\right) = - \int fdU -\int gdV, \end{aligned}$$

by the continuity of \(f\) and \(g\). In particular, we deduce from (15) and (16) that

$$\begin{aligned} \iint s\, dH\ge - \int fdU -\int gdV. \end{aligned}$$
(18)

Moreover, the definition of \(\bar{h}_k\) and Lemma 4.1 imply that

$$\begin{aligned} \iint d[S + U\otimes \mathcal {H}^n+ \mathcal {H}^m\otimes V]_{+}&\le \liminf _{k\rightarrow \infty } \iint [s(x,y) + u_k(x)+v_k(y)]_+dxdy\\&= \liminf _{k\rightarrow \infty } \frac{1}{k}\left( I_k(u_k, v_k) + \overline{u_k f} + \overline{v_k g}\right) . \end{aligned}$$

Since the limit on the right hand side is zero, a consequence of (16) and (17), it follows that \( [S + U\otimes \mathcal {H}^n+ \mathcal {H}^m\otimes V]_+=0\). Now (13) implies that \(\left[ S + U_{ac}\otimes \mathcal {H}^n + \mathcal {H}^m\otimes V_{ac} \right] _+\) and \( \left[ U_{s}\otimes \mathcal {H}^n +\mathcal {H}^m\otimes V_{s}\right] _+\) are both zero. In particular, \(\left[ U_{s} \right] _+\) and \(\left[ V_{s} \right] _+\) are both zero, and with \(u := dU_{ac}/dx\) and \(v := dV_{ac}/dy\) denoting Radon–Nikodym derivatives, we have \([s+u+v]_+=0\), or equivalently \((u,v)\in L_s\). It follows that

$$\begin{aligned} - \int fdU -\int gdV&= - \int fudx -\int gvdy - \int fdU_s -\int gdV_s \\&\ge - \int fudx -\int gvdy. \end{aligned}$$

Hence (18) becomes \(\iint s\, dH\ge - \int f udx -\int g vdy\) which implies the desired inequality \(I^*(\infty ) \ge I_*(\infty )\). We have thus established (14) and existence of optimizers. \(\square \)

6 Perspectives for future work

Of considerable interest to us is the question of showing some regularity for the minimizing potentials \((u,v)\)—perhaps under stronger restrictions on \((f,g,\bar{h})\) and \(s\). For example, are they continuous or differentiable; might \(u\) and \(v\) belong to some Hölder or Sobolev space or—as in the unconstrained version \(\bar{h}=+\infty \) of the problem [10, 14]—inherit Lipschitz and semiconcavity properties from the cost \(c=-s\)? In view of the characterization (9) for \(h \in \Gamma ^{\bar{h}}(f,g)\) to maximize the expected value of \(s\), this is closely related to smoothness for the free boundary of the set \(W\) such that the optimizer \(h= \bar{h} 1_W\). If \(m=n\), this set is known to be unique (up to sets of \(\mathcal {H}^{2n}\) measure zero) near points where \(0\ne \det [\partial ^2 s/\partial x^i\partial y^j]\) [6]. Simple examples show the boundary of \(W\) can have isolated singularities [6, 7], but is it a smooth hypersurface otherwise? Does \(W\) even have finite perimeter? Where the derivative of \(s(x,y)+u(x)+v(y)\) is non-vanishing, such questions are related to smoothness of \(u\) and \(v\) by the implicit function theorem.