1 Introduction

The present article proposes a simple model of efficient spatial trade in an exchange economy subject to transportation costs. The notion of efficiency we consider comprises two quite different aspects: efficiency of the transport (import/export) process and Pareto efficiency of the distribution of goods after import/export has taken place. We take as primitives the initial spatial distribution of goods, the transportation costs for these goods and the agents' preferences, given by a (possibly location-dependent) utility function. The unknown is the final distribution of goods after trade among the different locations. Our approach is based on the maximization of a global criterion which is a weighted average of the utility of final consumption (Pareto efficiency) net of the minimal total transport cost (optimal transport), taking into account mass conservation constraints.

Our model relies to a large extent on optimal transport theory which has been the object of an intensive and fruitful stream of research in the last three decades and has found numerous applications in PDEs, functional inequalities, probability theory, economics and more recently machine learning, see the textbooks of Villani [1] and Santambrogio [2], and for an overview of optimal transport methods in economics and econometrics, see Galichon [3]. Optimal transport aims at explaining how to transport mass efficiently (i.e. so as to minimize a total cost) between two given measures with the same total mass. A cornerstone of this theory is the Kantorovich duality which characterizes optimal plans in terms of a pair of dual variables which are naturally interpreted as prices.

In a spatially distributed exchange economy where various goods are available in different quantities according to location, agents at different locations may have an interest in trading, and they will certainly do so if they can all increase their utility for a small transport cost. Possible utility improvements are then the reason why some mass is actually transported. To fix ideas, think of two locations A and B where all the resources in water are concentrated at A and all the resources in food are concentrated at B; agents located at B will then import some water from A and export some food to A, but the precise amounts of water and food transported will depend on: (i) the initial endowments of A and B, (ii) the preferences of agents located at A and B and (iii) how costly transporting these goods from one location to another is. If we fix the demand for water of B and the demand for food of A, this is a simple optimal transport problem, but these demands result from a Pareto (weighted utility maximization) problem between these two locations. A dual version of the problem consists in looking for prices for which markets clear and which also reflect the cost of transporting the goods as efficiently as possible.

We propose a simple variational model that combines optimal transport (that is, how to transport the goods) with Pareto efficiency requirements (that is, why some goods are actually transported). The variational problem we consider to find Pareto efficient distributions involves separable optimal transport terms and a joint concave utility term. Interestingly, this has a structure somewhat similar to the steps of the celebrated JKO scheme of Jordan, Kinderlehrer and Otto [4] for Wasserstein gradient flows, see Ambrosio, Gigli and Savaré [5] for a detailed account of this theory. Convex duality also enables us to derive supporting prices and to prove a sort of spatial second welfare theorem. From a mathematical point of view, the existence of integrable Pareto optimal final resource distributions is nontrivial and we provide a complete proof using quite general monotonicity properties. We also prove existence of continuous prices supporting these Pareto optimal distributions of the goods. The fact that the analysis relies on convex minimization and duality enables us to use state-of-the-art solvers based on the Sinkhorn algorithm for entropic optimal transport, see Cuturi [6], Benamou et al. [7], Cuturi and Peyré [8], Peyré [9]. We illustrate the practical use of our model by solving numerically one-dimensional examples as well as a graph model representing some cities in France.

The paper is organized as follows. Our variational spatial Pareto model and our assumptions are introduced in Sect. 2. Section 3 is concerned with existence and duality results. Primal-dual optimality conditions are further analysed and a second welfare theorem is derived from the latter in Sect. 4. Finally, Sect. 5 presents an algorithm for the entropic regularization of the problem and numerical simulations for equilibrium prices and quantities in one-dimensional domains and a graph representing a network of French cities.

2 Assumptions and Notations

We consider a region denoted by X which we assume to be a compact metric space. We consider agents populating X having initial endowments of N goods, given by nonnegative Borel measures \(\left( \mu _1, \ldots , \mu _N \right) \in {{{\mathcal {M}}}}_+(X)^N\). Choosing a reference measure \(m\in {{{\mathcal {M}}}}_+(X)\) such that \(\mu _i\) is absolutely continuous with respect to m for \(i=1, \ldots , N\) (e.g. \(m=\sum _{i=1}^N \mu _i\)) we denote by \(\alpha _i\) the density of \(\mu _i\) with respect to m i.e.

$$\begin{aligned}\mu _i =\alpha _i m, \, \alpha _i \in L^1(X,m).\end{aligned}$$

Assumption A1

A1 For each good i, there is a transport cost \(c_i \in C(X\times X)\) which satisfies

$$\begin{aligned} c_i(x,y) \ge 0, \; c_i(x, x)=0, \forall (x,y)\in X^2. \end{aligned}$$

The quantity \(c_i(x,y)\) represents the cost of transporting one unit of mass of good i from x to y.

The image (or pushforward) of a Borel measure \(\theta \) by a Borel map T will be denoted \(T_\#\theta \) in the sequel. More precisely, if Y and Z are compact metric spaces, if \(\theta \in {{{\mathcal {M}}}}(Y)\) (i.e. \(\theta \) is a Borel measure on Y) and if T : \(Y \rightarrow Z\) is Borel, then \(T_\#\theta \) is defined by \(T_\# \theta (B)= \theta (T^{-1}(B))\) for every Borel subset B of Z. Equivalently, using test-functions, \(T_\#\theta \) is defined by

$$\begin{aligned}\int _Z \varphi \hbox {d} T_\# \theta =\int _Y \varphi \circ T \hbox {d} \theta , \; \forall \varphi \in C(Z).\end{aligned}$$
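On a finite state space, the pushforward is simply a weighted histogram of the image points. The following minimal sketch (plain NumPy; the weights and the map, encoded by target indices, are hypothetical) illustrates the definition:

```python
import numpy as np

# Pushforward T_# theta of a discrete measure theta = sum_k w_k delta_{y_k}:
# all the mass at y_k is moved to T(y_k), here encoded by target indices.
def pushforward(weights, targets, n_points):
    out = np.zeros(n_points)
    np.add.at(out, targets, weights)  # accumulate mass at the image points
    return out

w = np.array([0.2, 0.5, 0.3])   # weights of theta
t = np.array([1, 1, 2])         # T maps the three atoms to points 1, 1, 2
print(pushforward(w, t, 4))     # [0.  0.7 0.3 0. ] -- total mass is preserved
```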

In particular, if \(\gamma \in {{{\mathcal {M}}}}_+(X\times X)\), the marginals of \(\gamma \) are \({\mathop {\textrm{proj}}\nolimits _1}_\# \gamma \) and \({\mathop {\textrm{proj}}\nolimits _{2}}_\# \gamma \), where \((\mathop {\textrm{proj}}\nolimits _1(x,y), \mathop {\textrm{proj}}\nolimits _2(x,y))=(x,y)\) denote the canonical projections. Given \(\nu _i\) a Borel measure on X, we set

$$\begin{aligned}T_{c_i}(\mu _i, \nu _i):={\left\{ \begin{array}{ll} \min _{\gamma _i\in \Pi (\mu _i, \nu _i)} \int _{X\times X} c_i(x,y) \,\hbox {d} \gamma _i(x,y) &\quad \hbox {if } \nu _i \ge 0 \hbox { and } \nu _i(X)=\mu _i(X), \\ + \infty &\quad \hbox {otherwise,}\end{array}\right. }\end{aligned}$$

where \( \Pi (\mu _i, \nu _i)\) is the set of transport plans between \(\mu _i\) and \(\nu _i\), i.e. the set of nonnegative Borel measures on \(X\times X\) having \(\mu _i\) and \(\nu _i\) as marginals (so that \(\gamma _i(A\times B)\) represents the amount of mass of good i initially in A transported to a location in B). In our model, transporting goods is costly but may be worthwhile because it may result in some improvement in terms of the agents' preferences. These preferences are given by a (space-dependent) utility function U: \(X\times {\mathbb {R}}_+^N \rightarrow {\mathbb {R}}\cup \{-\infty \}\), on which we make the following assumptions:

Assumption A2

A2 For m-a.e. \(x\in X\), \(\beta \in {\mathbb {R}}_+^N \mapsto U(x, \beta )\) is usc, concave and nondecreasing.

Assumption A3

A3 For every \(\beta \in {\mathbb {R}}_+^N\), \(x\in X \mapsto U(x, \beta )\) is m-measurable.

Assumption A4

A4 The map \((x, \beta ) \mapsto U(x,\beta )\) is sublinear with respect to \(\beta \) uniformly in \(x\in X\), meaning that for every \(\delta >0\) there exists \(C_\delta \) such that for m-a.e. \(x\in X\) one has

$$\begin{aligned} U(x, \beta ) \le \delta \sum _{i=1}^N \beta _i +C_\delta , \; \forall \beta \in {\mathbb {R}}_+^N. \end{aligned}$$

Assumption A5

A5 There exists \(({{\overline{\alpha }}}_1, \ldots , {{\overline{\alpha }}}_N)\in L^1(X,m)^N\) such that

$$\begin{aligned} {{\overline{\alpha }}}_i \ge 0, \; \int _X {{\overline{\alpha }}}_i \hbox {d} m= \int _X \alpha _i \hbox {d} m, \; i=1, \ldots , N, \end{aligned}$$
(2.1)

and

$$\begin{aligned} \int _X U(x, {{\overline{\alpha }}}_1(x), \ldots , {{\overline{\alpha }}}_N(x)) \hbox {d} m(x) >-\infty . \end{aligned}$$

A typical utility which satisfies the above assumptions (as well as the stronger version (A5’) of (A5) introduced in Sect. 3.3) is the Cobb–Douglas utility:

$$\begin{aligned} U(x,\beta ) = w(x) \prod \limits _{i = 1}^{N} \beta _i^{a_i} \end{aligned}$$

with w a positive, bounded, measurable weight function, \(a_i > 0\) for each i, and \(\sum _{i=1}^N a_i < 1\).
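For instance, the sublinearity assumption (A4) can be checked numerically for this utility: since \(\sum _i a_i<1\), U grows like \(|\beta |^{\sum _i a_i}\) and is eventually dominated by any linear function of \(\beta \). A small sketch (the exponents and evaluation point are illustrative):

```python
import numpy as np

a = np.array([0.3, 0.4])                      # exponents with a.sum() < 1
U = lambda beta, w=1.0: w * np.prod(beta ** a)

beta = np.array([1.0, 2.0])
for t in [1e0, 1e2, 1e4, 1e6]:
    # U(t*beta) / (t*|beta|_1) -> 0 as t -> infinity: sublinearity (A4)
    print(t, U(t * beta) / (t * beta.sum()))
```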

Now, given \(\nu := (\nu _1, \ldots , \nu _N) \in {{{\mathcal {M}}}}_+(X)^N\), let us define

$$\begin{aligned}{{{\mathcal {U}}}}(\nu ):={\left\{ \begin{array}{ll} \int _X U(x, \beta _1(x), \ldots , \beta _N(x)) \,\hbox {d}m(x) &\quad \hbox {if } \nu _i=\beta _i m \hbox { for } i=1, \ldots , N, \\ -\infty &\quad \hbox {otherwise,} \end{array}\right. }\end{aligned}$$

and

$$\begin{aligned}{{{\mathcal {T}}}}_{\mu }(\nu ):=\sum _{i=1}^N T_{c_i}(\mu _i, \nu _i).\end{aligned}$$

We are here interested in allocations which maximize the average utility (Pareto efficiency) taking into account the transportation costs of the different goods, and therefore consider

$$\begin{aligned} \sup \Big \{ {{{\mathcal {U}}}}(\nu )-{{{\mathcal {T}}}}_{\mu }(\nu ), \; \nu =(\nu _1, \ldots \nu _N)\in {{{\mathcal {M}}}}_+(X)^N \Big \} \end{aligned}$$
(2.2)

which can also be expressed as

$$\begin{aligned} \sup _{\beta \in L^1(X,m)^N} \left\{ \int _X U(x, \beta (x)) \hbox {d}m(x) -\sum _{i=1}^N T_{c_i}(\alpha _i m, \beta _i m)\right\} \end{aligned}$$
(2.3)

which incorporates the mass conservation constraints

$$\begin{aligned} \int _X \beta _i(x) \hbox {d} m(x)= \int _X \alpha _i (x) \hbox {d} m(x), \; i=1, \ldots , N. \end{aligned}$$
(2.4)

Note that (A5) and (A4) ensure that the value of the maximization problem (2.2) is finite.
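Before turning to existence and duality, it may help to see the transport term of (2.2) in a finite setting. The sketch below (SciPy linear programming; the grid, cost and marginals are hypothetical) computes \(T_c(\mu ,\nu )\) by solving the discrete Kantorovich problem directly:

```python
import numpy as np
from scipy.optimize import linprog

def transport_cost(c, mu, nu):
    """Discrete Kantorovich problem: minimize <c, gamma> over gamma >= 0
    whose row sums are mu and column sums are nu (equal total masses)."""
    n, m = c.shape
    A_rows = np.kron(np.eye(n), np.ones(m))   # row-sum (first marginal) constraints
    A_cols = np.kron(np.ones(n), np.eye(m))   # column-sum (second marginal) constraints
    res = linprog(c.ravel(),
                  A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([mu, nu]),
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(n, m)

x = np.linspace(0.0, 1.0, 5)
c = np.abs(x[:, None] - x[None, :])           # |x - y| cost on a small grid
mu = np.array([0.5, 0.5, 0.0, 0.0, 0.0])      # mass on the left
nu = np.array([0.0, 0.0, 0.0, 0.5, 0.5])      # mass on the right
cost, gamma = transport_cost(c, mu, nu)
print(cost)   # 0.75 here: mean target position minus mean source position
```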

3 Existence and Duality

3.1 Existence of \(L^1\) Optimizers

Notice that the existence of an optimizer in \(L^1\) for (2.2) is not totally straightforward; this is the object of the next result.

Proposition 3.1

Assuming (A1)–(A5), the maximization problem (2.2) admits at least one solution. If, in addition, there exists \(\Omega \subset X\), Borel, such that \(m(\Omega ) > 0\) and U(x, .) is strictly concave on \({\mathbb {R}}_+^N\) for every \(x \in \Omega \) then the solution is unique.

Proof

Let \(\nu ^n =\beta ^n m\) be a maximizing sequence for (2.2). Since \(\beta _i^n\ge 0\) and \(\int _X \beta _i^n \hbox {d} m=\mu _i(X)\), \((\beta ^n)_n\) is bounded in \(L^1(X,m)^N\). It therefore follows from Komlós' Theorem [10] that there exists a (not relabeled) subsequence such that the Cesàro means

$$\begin{aligned}\frac{1}{n} \sum _{k=1}^n \beta ^k\end{aligned}$$

converge m-a.e. to some \(\beta =(\beta _1, \ldots , \beta _N)\). Since both \({{{\mathcal {U}}}}\) and \(-{{{\mathcal {T}}}}_{\mu }\) are concave, the above sequence of Cesàro means is also a maximizing sequence; hence, slightly abusing notation by again denoting this new sequence \(\beta ^n\), we may assume that

$$\begin{aligned} \beta _i^n(x) \rightarrow \beta _i(x) \hbox { for } m\hbox {-a.e. } x\in X, \hbox { for } i=1, \ldots , N. \end{aligned}$$
(3.1)

Let \(\delta >0\), the sublinearity assumption (A4), together with (3.1), the semicontinuity of U(x, .) and Fatou’s Lemma yield

$$\begin{aligned}\begin{aligned}&\liminf _n \int _{X} \Big (\delta \sum _{i=1}^N \beta _i^n(x) -U(x, \beta ^n(x))\Big )\,\hbox {d}m(x) \\&\quad = \delta \sum _{i=1}^N \mu _i(X)- \limsup _n \int _X U(x, \beta ^n(x))\,\hbox {d}m(x)\\&\quad \ge \int _{X} \Big (\delta \sum _{i=1}^N \beta _i(x)-U(x, \beta (x))\Big )\,\hbox {d}m(x); \end{aligned}\end{aligned}$$

by letting \(\delta \rightarrow 0^+\), we thus get

$$\begin{aligned} \limsup _n \int _X U(x, \beta ^n(x)) \hbox {d} m(x) \le \int _X U(x, \beta (x)) \hbox {d} m(x). \end{aligned}$$
(3.2)

Moreover, \(\nu ^n= \beta ^n m\) being bounded in \({{{\mathcal {M}}}}_+(X)^N\), we can assume (again passing to a subsequence if necessary) that it converges weakly-\(*\) to some \(\nu =(\nu _1, \ldots , \nu _N)\in {{{\mathcal {M}}}}_+(X)^N\)

$$\begin{aligned} \nu _i^n:=\beta _i^n m {\mathop {\rightharpoonup }\limits ^{*}}\nu _i, \; i=1, \ldots , N. \end{aligned}$$
(3.3)

Notice that \(\nu _i(X)=\mu _i(X)\), and that \(T_{c_i}(\mu _i, .)\) is sequentially weakly-\(*\) lsc (as a consequence of the well-known Kantorovich duality formula for optimal transport, see [2]) so that

$$\begin{aligned} \liminf _n {{{\mathcal {T}}}}_\mu (\nu ^n) \ge {{{\mathcal {T}}}}_\mu (\nu ). \end{aligned}$$
(3.4)

From (3.2)–(3.4) and the fact that \(\nu ^n\) is a maximizing sequence, we deduce:

$$\begin{aligned} \sup (2.2)=\int _X U(x, \beta (x)) \hbox {d}m(x)-{{{\mathcal {T}}}}_\mu (\nu ). \end{aligned}$$
(3.5)

Since \(\beta \) and \(\nu \) may be different and \(\beta \) may violate the mass preservation constraints (2.4), the identity (3.5) is not enough to conclude that (2.2) has a solution. To obtain a solution of (2.2), we shall use some monotonicity arguments relying on the assumptions (A1) and (A2).

Let \(f \in C(X)\) with \(f\ge 0\). For all \(M>0\), we have,

$$\begin{aligned}\int _X f(x) \hbox {d} \nu _i^n(x) \ge \int _X f(x) \min (\beta _i^n(x), M) \hbox {d} m(x).\end{aligned}$$

Letting \(n\rightarrow +\infty \), using (3.1), Lebesgue’s dominated convergence Theorem and (3.3), we deduce

$$\begin{aligned}\int _X f(x) \hbox {d} \nu _i(x) \ge \int _X f(x) \min (\beta _i(x), M) \hbox {d} m(x),\end{aligned}$$

so that, letting \(M\rightarrow \infty \), by monotone convergence, we obtain

$$\begin{aligned} \nu _i \ge \beta _i m, \; \hbox { for}\ i=1, \ldots , N. \end{aligned}$$
(3.6)

Thanks to the Radon-Nikodym Theorem, we can decompose each measure \(\nu _i\) as

$$\begin{aligned} \nu _i=\beta _i^a m +\nu _i^s \hbox { with } \beta _i^a \in L^1(X, m) \hbox { and } \; \nu _i^s \perp m. \end{aligned}$$
(3.7)

Since \(\nu _i^s\) and m are mutually singular there exists a Borel subset of X, which we denote \(A_i\), such that \(m(X\setminus A_i)=\nu _i^s(A_i)=0\) so that for every Borel subset B of X, there holds

$$\begin{aligned} (\beta _i^a m)(B)=\nu _i (B \cap A_i), \; \nu _i^s (B)=\nu _i(B \setminus A_i). \end{aligned}$$
(3.8)

As a consequence, with (3.6), we also have

$$\begin{aligned} \beta _i \le \beta _i^a, \; m\hbox {-a.e., for } i=1, \ldots , N. \end{aligned}$$
(3.9)

Let now \(\gamma _i \in \Pi (\mu _i, \nu _i)\) be such that \(T_{c_i}(\mu _i, \nu _i)=\int _{X\times X} c_i \hbox {d} \gamma _i\) and let us decompose \(\gamma _i\) as \(\gamma _i=\gamma _i^a+\gamma _i^s\) where \(\gamma _i^a\) and \(\gamma _i^s\) are defined by

$$\begin{aligned}\int _{X\times X} \varphi \hbox {d} \gamma _i^a := \int _{X\times A_i} \varphi \hbox {d} \gamma _i, \; \int _{X\times X} \varphi \hbox {d} \gamma _i^s := \int _{X\times (X\setminus A_i)} \varphi \hbox {d} \gamma _i,\end{aligned}$$

for every \(\varphi \in C(X\times X)\). By construction, we have

$$\begin{aligned}{\mathop {\textrm{proj}}\nolimits _2}_\# \gamma _i^a= \beta _i^a m, \; {\mathop {\textrm{proj}}\nolimits _2}_\# \gamma _i^s= \nu _i^s\end{aligned}$$

and one can decompose \( \mu _i=\alpha _i m={\mathop {\textrm{proj}}\nolimits _1}_\# \gamma _i\) as \({\mathop {\textrm{proj}}\nolimits _1}_\# \gamma _i^a + {\mathop {\textrm{proj}}\nolimits _1}_\# \gamma _i^s\). Since these two terms are absolutely continuous with respect to m, we can write them as

$$\begin{aligned} \alpha _i^a m:={\mathop {\textrm{proj}}\nolimits _1}_\# \gamma _i^a, \; \alpha _i^s m:= {\mathop {\textrm{proj}}\nolimits _1}_\# \gamma _i^s \hbox { (hence } \alpha _i=\alpha _i^a +\alpha _i^s, \; m\hbox {-a.e.).}\end{aligned}$$

Now let us define

$$\begin{aligned} {\widetilde{\gamma }}_i:=\gamma _i^a + ({{\,\textrm{id}\,}}, {{\,\textrm{id}\,}})_\# \alpha _i^s m \end{aligned}$$
(3.10)

and observe that

$$\begin{aligned} {\mathop {\textrm{proj}}\nolimits _1}_\# {\widetilde{\gamma }}_i=\alpha _i^a m +\alpha _i^s m=\alpha _i m=\mu _i\end{aligned}$$

and

$$\begin{aligned}{\mathop {\textrm{proj}}\nolimits _2}_\# {\widetilde{\gamma }}_i={\widetilde{\beta }}_i m \hbox { where } {\widetilde{\beta }}_i :=\beta _i^a +\alpha _i^s\end{aligned}$$

so that \({\widetilde{\gamma }}_i \in \Pi (\mu _i, {\widetilde{\beta }}_i m)\). Using (A1) and the definition of \({\widetilde{\gamma }}_i\) in (3.10), we have

$$\begin{aligned}\begin{aligned} T_{c_i}(\mu _i, {\widetilde{\beta }}_i m)&\le \int _{X\times X} c_i \,\hbox {d} {\widetilde{\gamma }}_i=\int _{X\times X} c_i \,\hbox {d} \gamma _i^a \\&\le \int _{X\times X} c_i \,\hbox {d} \gamma _i =T_{c_i}(\mu _i, \nu _i). \end{aligned}\end{aligned}$$

Setting \({\widetilde{\beta }}=({\widetilde{\beta }}_1, \ldots , {\widetilde{\beta }}_N)\) we thus have

$$\begin{aligned}{{{\mathcal {T}}}}_\mu ({\widetilde{\beta }}m) \le {{{\mathcal {T}}}}_{\mu }(\nu )\end{aligned}$$

and since \({\widetilde{\beta }}_i \ge \beta _i^a\), from (3.9), we deduce that \({\widetilde{\beta }}\ge \beta \), hence from the monotonicity part in assumption (A2), we get

$$\begin{aligned}{{{\mathcal {U}}}}({\widetilde{\beta }}m) \ge {{{\mathcal {U}}}}(\beta m).\end{aligned}$$

Together with (3.5), this implies that \({\widetilde{\beta }}\) solves (2.2). Of course, this solution is unique as soon as \({\mathcal {U}}\) is strictly concave since \({\mathcal {T}}_{\mu }\) is convex. \(\square \)

Remark 3.2

The proof above shows that if \(\nu \in {{{\mathcal {M}}}}_+(X)^N\) is such that \(\nu _i(X) =\mu _i(X)\) for \(i=1, \ldots , N\), then, writing the Lebesgue decomposition of \(\nu _i\) with respect to m as

$$\begin{aligned}\nu _i=\nu _i^a +\nu _i^s=\beta _i^a m + \nu _i^s \hbox { with } \nu _i^s \perp m,\end{aligned}$$

there exists \({\widetilde{\beta }}_i \in L^1(X, m)\) such that

$$\begin{aligned}{\widetilde{\beta }}_i \ge \beta _i^a, \; \int _X {\widetilde{\beta }}_i \,\hbox {d} m= \mu _i(X) \hbox { and } T_{c_i}(\mu _i, {\widetilde{\beta }}_i m) \le T_{c_i}(\mu _i, \nu _i).\end{aligned}$$

This, together with the monotonicity of U, shows that

$$\begin{aligned} {{{\mathcal {U}}}}(\beta ^a m) -{{{\mathcal {T}}}}_\mu (\nu ) \le {{{\mathcal {U}}}}({\widetilde{\beta }}m)- {{{\mathcal {T}}}}_\mu ({\widetilde{\beta }}m) \end{aligned}$$

therefore we have

$$\begin{aligned} \sup _{\beta \in L^1(X,m)^N} \left\{ {{{\mathcal {U}}}}(\beta m)-{{{\mathcal {T}}}}_{\mu }(\beta m)\right\} = \sup _{\nu \in {{{\mathcal {M}}}}_+(X)^N} \left\{ {{{\mathcal {U}}}}(\nu ^a)-{{{\mathcal {T}}}}_{\mu }(\nu )\right\} \end{aligned}$$
(3.11)

where in the right-hand side \(\nu ^a=(\nu _1^a, \ldots , \nu _N^a) \) denotes the absolutely continuous part of \(\nu \) with respect to m.

3.2 Duality

The Pareto problem (2.2) appears naturally as the dual of a convex minimization problem over continuous functions which we now describe. Let us first introduce for \((x, \varphi )=(x, \varphi _1, \ldots , \varphi _N)\in X\times {\mathbb {R}}^N\):

$$\begin{aligned} V(x, \varphi ):=\sup _{\beta \in {\mathbb {R}}_+^N} \{U(x, \beta )-\sum _{i=1}^N \beta _i \varphi _i\}. \end{aligned}$$
(3.12)

Note that each V(x, .) is convex, lsc, nonincreasing and that \(\varphi \in {\mathbb {R}}_+^N\) whenever \(V(x,\varphi )<+\infty \). Let us then consider the convex integral functional

$$\begin{aligned}{{{\mathcal {V}}}}(\varphi ):=\int _X V(x, \varphi _1(x), \ldots , \varphi _N(x))\hbox {d}m(x), \; \forall \varphi \in C(X)^N,\end{aligned}$$

and observe that, thanks to (A4), one has

$$\begin{aligned} \min _{i=1, \ldots , N} \min _{x\in X} \varphi _i(x)=\delta >0 \Rightarrow {{{\mathcal {V}}}}(\varphi ) \le C_\delta m(X) <+\infty . \end{aligned}$$
(3.13)

Recall that, for any \(i=1, \ldots , N\) and given \(\psi \in C(X)\), the \(c_i\)-transform of \(\psi \), denoted by \(\psi ^{c_i}\), is defined by

$$\begin{aligned} \psi ^{c_i}(x):=\min _{y\in X} \{c_i(x,y)-\psi (y)\}. \end{aligned}$$
(3.14)
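In the discrete setting, the \(c\)-transform (3.14) is a single vectorized minimum. A NumPy sketch, also showing the double transform used for the minimizing sequence in (3.19) below:

```python
import numpy as np

# Discrete c-transform (3.14): psi_c[x] = min_y { c[x, y] - psi[y] }.
def c_transform(c, psi):
    return (c - psi[None, :]).min(axis=1)

# Double transform, as in (3.19): phi_tilde[y] = min_x { c[x, y] - phi^c[x] }.
def double_transform(c, phi):
    return (c - c_transform(c, phi)[:, None]).min(axis=0)
```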

For \(\varphi :=(\varphi _1, \ldots , \varphi _N)\in C(X)^N\), let us set

$$\begin{aligned}K(\varphi ):=-\sum _{i=1}^N \int _X \varphi _i^{c_i} \hbox {d} \mu _i= \sum _{i=1}^N \int _X \max _{y\in X} \{\varphi _i(y)-c_i(x,y)\} \hbox {d} \mu _i (x) \end{aligned}$$

so that K is convex and everywhere continuous w.r.t. the uniform norm. Finally, let us consider the convex minimization problem

$$\begin{aligned} \inf _{\varphi \in C(X, {\mathbb {R}})^N} \left\{ K(\varphi ) + {{{\mathcal {V}}}}(\varphi )\right\} . \end{aligned}$$
(3.15)

It is easy to deduce, from (A5) and the fact that \(c_i\) is bounded, that \(K + {{{\mathcal {V}}}}\) is bounded from below. Hence, (3.13) together with the continuity of K implies that the value of (3.15) is finite and that the Fenchel-Rockafellar Theorem applies to (3.15). Identifying the dual of \(C(X)^N\) with \({{{\mathcal {M}}}}(X)^N\), the Fenchel-Rockafellar dual of (3.15) reads

$$\begin{aligned} \sup _{\nu \in {{{\mathcal {M}}}}(X)^N} \left\{ -{{{\mathcal {V}}}}^*(-\nu )-K^*(\nu )\right\} , \end{aligned}$$
(3.16)

where \(K^*\) and \({{{\mathcal {V}}}}^*\) respectively stand for the Legendre transform of K and \({{{\mathcal {V}}}}\) (for the duality between continuous functions and Radon measures), i.e. for every \(\nu \in {{{\mathcal {M}}}}(X)^N\):

$$\begin{aligned} K^*(\nu ):&=\sup _{\varphi \in C(X)^N } \left\{ \sum _{i=1}^N \int _X \varphi _i \hbox {d} \nu _i -K(\varphi )\right\} , \; \\ {{{\mathcal {V}}}}^*(-\nu )&=\sup _{\varphi \in C(X)^N} \left\{ -\sum _{i=1}^N \int _X \varphi _i \hbox {d} \nu _i -{{{\mathcal {V}}}}(\varphi )\right\} . \end{aligned}$$

It turns out that (3.16) is nothing but (2.2), slightly in disguise, as expressed by the next result.

Lemma 3.3

Let \(\nu \in {{{\mathcal {M}}}}(X)^N\). Then there holds

$$\begin{aligned} K^*(\nu )= {{{\mathcal {T}}}}_\mu (\nu ), \; {{{\mathcal {V}}}}^*(-\nu )={\left\{ \begin{array}{ll} -{{{\mathcal {U}}}}(\nu ^{a}) \hbox { if }\nu \in {{{\mathcal {M}}}}_+(X)^N, \\ +\infty \hbox { otherwise } \end{array}\right. } \end{aligned}$$
(3.17)

where \(\nu ^a\) is the absolutely continuous part of \(\nu \) with respect to m. Moreover, we have

$$\begin{aligned} \inf (3.15)=\max (2.2). \end{aligned}$$
(3.18)

Proof

The fact that \(K^*(\nu )= {{{\mathcal {T}}}}_\mu (\nu )\) is a consequence of the well-known Kantorovich duality formula for optimal transport (see for instance [2]). As for the Legendre transform of \({{{\mathcal {V}}}}\), it follows from the seminal results of Rockafellar [11, 12] for convex integral functionals. More precisely, thanks to Theorem 4 in [12] and our assumptions on U, we have:

$$\begin{aligned}\begin{aligned} {{{\mathcal {V}}}}^*(-\nu )&=\inf _{\theta \in L^1(X,m)^N} \Big \{\int _X V^*(x, -\theta (x)) \hbox {d}m(x)\\&\quad +\sup _{\varphi \in C(X)^N \; : \; {{{\mathcal {V}}}}(\varphi )<+\infty } \sum _{i=1}^N \int _X \varphi _i \hbox {d} (\theta _i -\nu _i)\Big \}. \end{aligned}\end{aligned}$$

On the one hand, by the Fenchel-Moreau Theorem and the convexity and lower semicontinuity of \(-U(x,.)\) we have

$$\begin{aligned}V^*(x,-\theta )=\sup _{\varphi \in {\mathbb {R}}^N} \{ -\theta \cdot \varphi -V(x, \varphi )\}={\left\{ \begin{array}{ll} -U(x, \theta ) \hbox { if }\theta \in {\mathbb {R}}_+^N,\\ +\infty \hbox { otherwise.}\end{array}\right. }\end{aligned}$$

On the other hand, since \({{{\mathcal {V}}}}(\varphi )<+\infty \) whenever each \(\varphi _i\) is strictly positive and \({{{\mathcal {V}}}}(\varphi )<+\infty \) implies that each \(\varphi _i\) is nonnegative, we have

$$\begin{aligned}\sup _{\varphi \in C(X)^N \; : \; {{{\mathcal {V}}}}(\varphi )<+\infty } \sum _{i=1}^N \int _X \varphi _i \,\hbox {d} (\theta _i -\nu _i)={\left\{ \begin{array}{ll} 0 &\quad \hbox {if } \theta _i \le \nu _i \hbox { for } i=1, \ldots , N, \\ +\infty &\quad \hbox {otherwise.} \end{array}\right. }\end{aligned}$$

As a consequence, using the monotonicity of U(x, .) and the fact that for \(\theta _i \in L^1(X,m)\), \(0 \le \theta _i \le \nu _i\) is equivalent to \(0\le \theta _i \le \nu _i^a\), we get

$$\begin{aligned}\begin{aligned} {{{\mathcal {V}}}}^*(-\nu )&=\inf \left\{ -\int _X U(x, \theta (x)) \hbox {d}m(x) \; : \; \theta \in L^1(X,m), \; 0\le \theta _i \le \nu _i, \; i=1, \ldots , N\right\} \\&={\left\{ \begin{array}{ll} -{{{\mathcal {U}}}}(\nu ^{a}) \hbox { if}\ \nu \in {{{\mathcal {M}}}}_+(X)^N \\ +\infty \hbox { otherwise.} \end{array}\right. } \end{aligned}\end{aligned}$$

By the Fenchel-Rockafellar Theorem (see [13]), together with the continuity of K and the fact that \({{{\mathcal {V}}}}\) is lsc and \({{{\mathcal {V}}}}(\varphi )\) is finite whenever the components of \(\varphi \) are bounded from below by a positive constant, we get

$$\begin{aligned} \inf (3.15)=\max _{\nu \in {{{\mathcal {M}}}}_+(X)^N}\{ {{{\mathcal {U}}}}(\nu ^a) -{{{\mathcal {T}}}}_\mu (\nu )\}\end{aligned}$$

from which we deduce (3.18) thanks to Proposition 3.1 and (3.11). \(\square \)

3.3 Existence for the Primal

Let us consider the following qualification condition which is a strengthening of (A5):

Assumption A5’

A5’ There exists \({{\overline{\alpha }}}\in L^1(X,m)^N\) such that (2.1) holds and there exists \(\varepsilon \in (0,1)\) such that

$$\begin{aligned} \int _X U(x, (1-\varepsilon ) {{\overline{\alpha }}}_1(x), \ldots , (1-\varepsilon ) {{\overline{\alpha }}}_N(x)) \hbox {d} m(x) >-\infty . \end{aligned}$$

We then have an existence result for (3.15):

Proposition 3.4

Assuming (A1)–(A5’), the minimization problem (3.15) admits at least one solution.

Proof

Let \(\varphi ^n=(\varphi _1^n, \ldots , \varphi _N^n)\) be a minimizing sequence for (3.15), let us then define \({\widetilde{\varphi }}^n:=({\widetilde{\varphi }}_1^n, \ldots , {\widetilde{\varphi }}_N^n)\) by

$$\begin{aligned} {\widetilde{\varphi }}_i^n(y):=\min _{x\in X}\{ c_i(x,y)-({\varphi _i^n})^{c_i}(x)\}, \; \forall y\in X. \end{aligned}$$
(3.19)

It is well-known that \(({\widetilde{\varphi }}_i^n)^{c_i}= ({\varphi _i^n})^{c_i}\) and that \({\widetilde{\varphi }}_i^n \ge \varphi _i^n\), so that \(K({\widetilde{\varphi }}^n)=K(\varphi ^n)\) and (since V(x, .) is nonincreasing) \({{{\mathcal {V}}}}({\widetilde{\varphi }}^n)\le {{{\mathcal {V}}}}(\varphi ^n)\). Hence \({\widetilde{\varphi }}^n\) is also a minimizing sequence for (3.15). Now the advantage of working with the minimizing sequence \({\widetilde{\varphi }}^n\) is that it is uniformly equicontinuous (indeed, thanks to (3.19), a modulus of continuity of \(c_i\) is also a modulus of \({\widetilde{\varphi }}_i^n\) for every n). Since \({{{\mathcal {V}}}}({\widetilde{\varphi }}^n)\) is finite, we have \({\widetilde{\varphi }}_i^n\ge 0\); let us then set \(\delta _i^n:=\min _X {\widetilde{\varphi }}_i^n \ge 0\) and prove that \(\delta _i^n\) is bounded. First, from the equicontinuity, there is a constant \(C\ge 0\) such that for every \(i\in \{1, \ldots , N\}\) and every n, one has

$$\begin{aligned} {\widetilde{\varphi }}_i^n \le \delta _i^n +C. \end{aligned}$$
(3.20)

Now fix M such that

$$\begin{aligned} M \ge K({\widetilde{\varphi }}^n)+ {{{\mathcal {V}}}}({\widetilde{\varphi }}^n), \; \forall n. \end{aligned}$$
(3.21)

Since \( {\widetilde{\varphi }}_i^n \ge \delta _i^n\), thanks to (A1) we have \(K({\widetilde{\varphi }}^n) \ge \sum _{i=1}^N (\delta _i^n -\min _{X\times X} c_i) \mu _i(X)= \sum _{i=1}^N \delta _i^n \mu _i(X)\). Moreover, by the very definition of V and (3.20),

$$\begin{aligned}\begin{aligned} V(x, {\widetilde{\varphi }}^n(x))&\ge U(x, (1-\varepsilon ){{\overline{\alpha }}}(x)) -(1-\varepsilon )\sum _{i=1}^N {{\overline{\alpha }}}_i(x) {\widetilde{\varphi }}_i^n(x)\\&\ge U(x, (1-\varepsilon ){{\overline{\alpha }}}(x)) -(1-\varepsilon ) \sum _{i=1}^N {{\overline{\alpha }}}_i(x) (\delta _i^n+C) . \end{aligned}\end{aligned}$$

Integrating and using (A5’), this yields that for some constant \(C'\), \({{{\mathcal {V}}}}({\widetilde{\varphi }}^n) \ge C'-(1-\varepsilon ) \sum _{i=1}^N \delta _i^n \mu _i(X)\). Thanks to (3.21), we therefore get

$$\begin{aligned}M \ge C'+ \varepsilon \sum _{i=1}^N \delta _i^n \mu _i(X)\end{aligned}$$

which shows that \(\delta _i^n\) is bounded, so that \({\widetilde{\varphi }}^n\) is uniformly bounded. Applying Ascoli's Theorem, we may therefore assume that \({\widetilde{\varphi }}^n\) converges uniformly to some \({\widetilde{\varphi }}\), so that \(K({\widetilde{\varphi }}^n)\) converges to \(K({\widetilde{\varphi }})\); since, for some constant \(C''\), one has \(V(x, {\widetilde{\varphi }}^n(x)) \ge U(x, {{\overline{\alpha }}}(x))-C''\), one deduces from (A5) and Fatou's Lemma that \(\liminf _n {{{\mathcal {V}}}}({\widetilde{\varphi }}^n) \ge {{{\mathcal {V}}}}({\widetilde{\varphi }})\), which finally proves that \({\widetilde{\varphi }}\) solves (3.15). \(\square \)

4 Optimality Conditions and a Second Welfare Theorem

4.1 Primal-Dual Extremality Conditions

Now that we know that, under the assumptions of Proposition 3.4, there exist solutions both to (2.2) and its primal (3.15), we can easily use (3.18) to deduce a characterization of these solutions. Indeed, by construction, we have

$$\begin{aligned} V(x, \varphi ) \ge U(x, \beta )-\varphi \cdot \beta = U(x, \beta )-\sum _{i=1}^N\varphi _i \beta _i, \; \forall (\varphi , \beta )\in {\mathbb {R}}_+^N \times {\mathbb {R}}_+^N \end{aligned}$$
(4.1)

and, for every i, every \(\beta :=(\beta _1,\ldots ,\beta _N) \in L^1(X, m)^N\) such that \(\beta _i \ge 0\) and \(\int _X \beta _i m =\mu _i(X)\), and every \(\varphi := (\varphi _1,\ldots ,\varphi _N) \in C(X)^N\), we have

$$\begin{aligned} T_{c_i}(\mu _i, \beta _i m) \ge \int _X \varphi _i^{c_i} \alpha _i m+ \int _X \varphi _i \beta _i m. \end{aligned}$$
(4.2)

Now if \(\varphi \) solves (3.15) and \(\beta \) solves (2.2), from (3.18), we obtain

$$\begin{aligned}{{{\mathcal {V}}}}(\varphi )-{{{\mathcal {U}}}}(\beta m)+\int _X \varphi \cdot \beta m + {{{\mathcal {T}}}}_\mu (\beta m) - \sum _{i=1}^N \int _X \varphi _i^{c_i} \alpha _i m-\int _X \varphi \cdot \beta m=0\end{aligned}$$

so that with (4.1), we get

$$\begin{aligned} V(x, \varphi (x)) = U(x, \beta (x))-\varphi (x) \cdot \beta (x)\hbox { for }m-\hbox { a.e. }x. \end{aligned}$$
(4.3)

Likewise, with (4.2), we get for every i,

$$\begin{aligned}T_{c_i}(\mu _i, \beta _i m) = \int _X \varphi _i^{c_i} \alpha _i m+ \int _X \varphi _i \beta _i m\end{aligned}$$

so that

$$\begin{aligned} \exists \gamma _i\in \Pi (\alpha _im, \beta _i m) \hbox { such that } \varphi _i^{c_i}(x)+\varphi _i(y)=c_i(x,y) \hbox { for } \gamma _i\hbox {-a.e. } (x,y). \end{aligned}$$
(4.4)

By continuity of \(\varphi _i\) and \(c_i\), this means that, whenever \((x,y)\) is in the support of \(\gamma _i\), one has

$$\begin{aligned} \varphi _i(y)-c_i(x,y) \ge \varphi _i(z) -c_i(x,z), \; \forall z\in X. \end{aligned}$$
(4.5)

Note in particular that (4.3) implies that for m-a.e. x, \(\varphi (x)\) is a supergradient of U(x, .) at \(\beta (x)\), and therefore

$$\begin{aligned} \forall \beta \in {\mathbb {R}}_+^N, \; \varphi (x) \cdot \beta \le \varphi (x) \cdot \beta (x) \Rightarrow U(x, \beta ) \le U(x, \beta (x)). \end{aligned}$$
(4.6)

4.2 A Second Welfare Theorem with Transport Costs

The optimality conditions (4.4)–(4.6) can easily be interpreted in terms of equilibrium conditions as we shall now explain. Imagine that there is a representative agent at each location x, with utility U(x, .) and an initial endowment \(\alpha (x)\) for the goods \(1, \ldots , N\). Assume also that this agent has a given (monetary) endowment w(x). In an exchange economy, where the agent located at y can trade with agents located at any location x, an equilibrium for the monetary endowment w is a system of prices \(y\mapsto p(y)\) together with a final endowment \(y\mapsto \beta (y)\) of the goods, satisfying:

  • There is free mobility of trade Given the price system p and a location x, agents located at x exporting one unit of good i to location y get a profit \(p_i(y)-c_i(x,y)\), so that their total profit is

    $$\begin{aligned} \pi (x) \cdot \alpha (x)=\sum _{i=1}^N \pi _i(x) \alpha _i(x), \; \pi _i(x)=\max _{y\in X} \{p_i(y)-c_i(x,y)\}=-p_i^{c_i}(x),\end{aligned}$$
  • Consumers maximize utility Agents located at y maximize their utility \(U(y, \beta )\) subject to the budget constraint that their expenditure \(p(y)\cdot \beta \) is no greater than their total revenue, which is w(y) augmented by their export profit \(\pi (y) \cdot \alpha (y)\), i.e. \(\beta (y)\) should solve

    $$\begin{aligned}\max \{ U(y, \beta ) \; : \; p(y) \cdot \beta \le \pi (y)\cdot \alpha (y)+w(y)\}\end{aligned}$$
  • Markets clear which means that for every i, one can find a plan \(\gamma _i\) (\(\gamma _i(A\times B)\) represents the quantity of good i transported from a location in A to a location in B), such that \(\gamma _i \in \Pi (\alpha _i m, \beta _i m)\) (i.e. demand and supply match) and \(\pi _i(x)=p_i(y)-c_i(x,y)\) for every \((x,y)\) in the support of \(\gamma _i\) (which means that this plan is consistent with free mobility of trade).

We then straightforwardly deduce from the optimality conditions (4.4)–(4.6) a sort of second welfare theorem with transport costs:

Theorem 4.1

Let \(\varphi \) and \(\beta \) solve (3.15) and (2.2) respectively, and define

$$\begin{aligned}p_i:=\varphi _i, \; \pi _i:= -p_i^{c_i}, \; w:= p\cdot \beta -\pi \cdot \alpha . \end{aligned}$$

Then \((\beta , p)\) is an equilibrium with monetary endowment w.

5 Algorithm and Simulations

In this final section, we describe the entropic approximation of (3.15) and its dual (2.2). Since the influential paper of Cuturi [6], entropic optimal transport has become a popular and efficient tool for computational optimal transport thanks to Sinkhorn’s celebrated scaling algorithm, and we refer to [8] for a comprehensive exposition. The algorithm detailed in the next paragraph is based on a variant of the Sinkhorn algorithm introduced by Peyré [9] in the context of Wasserstein gradient flows. We finally give numerical results in the one-dimensional case and in the case of a network of cities.

5.1 An Entropic Approximation Algorithm

From now on, we assume that X is finite, take the counting measure m as reference measure on X, and denote by \(\alpha _i(x)>0\) the initial endowment of location \(x\in X\) in the good \(i\in \{1, \ldots , N\}\). Problem (3.15) thus takes the form:

$$\begin{aligned} \inf _{\varphi \in {\mathbb {R}}^{X\times N}} \sum _{y\in X} V(y, \varphi (y)) + \sum _{i=1}^N \sum _{x\in X} \alpha _i(x) \max _{y\in X} \{\varphi _i(y)-c_i(x,y)\}. \end{aligned}$$
(5.1)

Given a regularization parameter \(\varepsilon >0\), we consider the smooth approximation of (5.1) where maxima are replaced by soft maxima:

$$\begin{aligned} \inf _{\varphi \in {\mathbb {R}}^{X\times N}} \sum _{y\in X} V(y, \varphi (y)) + \varepsilon \sum _{i=1}^N \sum _{x\in X} \alpha _i(x) \log \Big (\sum _ {y\in X} e^{\frac{\varphi _i(y)-c_i(x,y)}{\varepsilon }}\Big ). \end{aligned}$$
(5.2)
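In code, the smoothed objective (5.2) is a log-sum-exp computation. A minimal sketch, assuming arrays c of shape (N, n, n) with entries \(c_i(x,y)\), phi and alpha of shape (N, n), and a callable V(y, prices) supplied for the conjugate term (3.12) (these conventions are assumptions for illustration):

```python
import numpy as np
from scipy.special import logsumexp

def smoothed_objective(phi, alpha, c, eps, V):
    # V-term: sum over locations y of V(y, phi(y))
    v_term = sum(V(y, phi[:, y]) for y in range(phi.shape[1]))
    # soft maxima: eps * log sum_y exp((phi_i(y) - c_i(x, y)) / eps), per (i, x)
    soft = eps * logsumexp((phi[:, None, :] - c) / eps, axis=2)
    return v_term + (alpha * soft).sum()
```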

Now, let us observe that (5.2) can be conveniently reformulated by considering the convex function

$$\begin{aligned}\Phi (\varphi , \psi ):=\sum _{y\in X} V(y, \varphi (y))- \sum _{i=1}^N \sum _{x\in X} \alpha _i(x) \psi _i(x) + \varepsilon \sum _{i=1}^N \sum _{(x,y)\in X^2} e^{\frac{\psi _i(x)+\varphi _i(y)-c_i(x,y)}{\varepsilon }} \end{aligned}$$

for \((\psi , \varphi ) \in {\mathbb {R}}^{X\times N} \times {\mathbb {R}}^{X\times N}\). Indeed, for fixed \(\varphi \), the minimizer of \(\psi \mapsto \Phi (\varphi , \psi )\) is explicitly given by

$$\begin{aligned} \psi _i(x)=\varepsilon \log (\alpha _i(x))-\varepsilon \log \Big ( \sum _ {y\in X} e^{\frac{\varphi _i(y)-c_i(x,y)}{\varepsilon }} \Big ) \end{aligned}$$
(5.3)

so replacing in \(\Phi \), we get

$$\begin{aligned}\inf _{\psi } \Phi (\varphi , \psi )=C+\sum _{y\in X} V(y, \varphi (y)) + \varepsilon \sum _{i=1}^N \sum _{x\in X} \alpha _i(x) \log \Big (\sum _ {y\in X} e^{\frac{\varphi _i(y)-c_i(x,y)}{\varepsilon }}\Big )\end{aligned}$$

where C is the constant

$$\begin{aligned}C:=\varepsilon \sum _{i=1}^N \sum _{x\in X} \alpha _i(x) (1-\log (\alpha _i(x))).\end{aligned}$$
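The partial minimization (5.3) is itself a log-sum-exp; a sketch with the same array conventions as above:

```python
import numpy as np
from scipy.special import logsumexp

def update_psi(phi, alpha, c, eps):
    # (5.3): closed-form minimizer of Phi(phi, .) for fixed phi,
    # computed in the log domain for numerical stability.
    return eps * np.log(alpha) - eps * logsumexp((phi[:, None, :] - c) / eps, axis=2)
```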

Optimality system in primal and dual form. The optimality condition for (5.2) (which is the minimization of the sum of a convex lsc function and a smooth convex function) reads:

$$\begin{aligned} -\beta (y) \in \partial V(y, \varphi (y)), \; \forall y\in X \end{aligned}$$
(5.4)

where \(\beta (y)=(\beta _1(y), \ldots , \beta _N(y))\) is given by

$$\begin{aligned} \beta _i(y)=\sum _{x\in X} \alpha _i(x) \frac{ e^{\frac{\varphi _i(y)-c_{i}(x,y)}{\varepsilon }}}{ \sum _{z\in X} e^{\frac{\varphi _i(z)-c_i(x,z)}{\varepsilon } } }. \end{aligned}$$
(5.5)

Defining

$$\begin{aligned}\gamma _i(x,y):= \alpha _i(x) \frac{ e^{\frac{\varphi _i(y)-c_i(x,y)}{\varepsilon }}}{ \sum _{z\in X} e^{\frac{\varphi _i(z)-c_i(x,z)}{\varepsilon } } }\end{aligned}$$

we thus have \(\gamma _i \in \Pi (\alpha _i, \beta _i)\) and \(\gamma _i\) solves the entropic optimal transport problem

$$\begin{aligned}T_{c_i}^\varepsilon (\alpha _i, \beta _i):=\inf _{\gamma \in \Pi (\alpha _i, \beta _i)} \sum _{(x,y) \in X^2} \big (c_i(x,y)+ \varepsilon \log (\gamma (x,y))\big )\,\gamma (x,y).\end{aligned}$$
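Given prices \(\varphi \), the quantities (5.5) and the plans \(\gamma _i\) are softmax averages; a sketch, again with c of shape (N, n, n) and alpha, phi of shape (N, n):

```python
import numpy as np
from scipy.special import logsumexp

def primal_quantities(phi, alpha, c, eps):
    logits = (phi[:, None, :] - c) / eps                       # axes (i, x, y)
    kernel = np.exp(logits - logsumexp(logits, axis=2, keepdims=True))
    gamma = alpha[:, :, None] * kernel    # gamma_i(x, y); first marginal is alpha_i
    beta = gamma.sum(axis=1)              # (5.5): beta_i(y), second marginal
    return beta, gamma
```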

It is then easy to check that (5.4)–(5.5) is equivalent to the fact that \(\beta \) solves the dual of (5.2):

$$\begin{aligned} \sup _{\beta \in {\mathbb {R}}_+^{X\times N}} \sum _{y\in X} U(y, \beta (y)) -\sum _{i=1}^N T_{c_i}^\varepsilon (\alpha _i, \beta _i), \end{aligned}$$
(5.6)

which is the entropic regularization of (2.2) with regularization parameter \(\varepsilon \).

Coordinate descent/Sinkhorn. We can find the (regularized) prices \(\varphi \) by solving (5.2) and then the optimal quantities \(\beta \) by using (5.5). As noted above, (5.2) is equivalent to

$$\begin{aligned} \inf _{(\psi , \varphi ) \in {\mathbb {R}}^{X\times N} \times {\mathbb {R}}^{X\times N}} \Phi (\varphi , \psi ) \end{aligned}$$
(5.7)

which can be solved by coordinate descent as follows. Starting from \((\psi ^{0}, \varphi ^0)\), we recursively compute \((\psi ^{k+1}, \varphi ^{k+1})\) by:

$$\begin{aligned}\psi ^{k+1}={{\,\textrm{argmin}\,}}_{\psi \in {\mathbb {R}}^{X\times N}} \Phi (\varphi ^k, \psi )\end{aligned}$$

which, using (5.3), gives totally explicit Sinkhorn-like updates:

$$\begin{aligned} \psi _i^{k+1} (x)=\varepsilon \log (\alpha _i(x))-\varepsilon \log \Big ( \sum _ {y\in X} e^{\frac{\varphi _i^k(y)-c_i(x,y)}{\varepsilon }} \Big ) \end{aligned}$$
(5.8)

for \(i=1, \ldots , N\) and every \(x\in X\). Then, we update the prices by:

$$\begin{aligned}\varphi _1^{k+1}={{\,\textrm{argmin}\,}}_{\varphi _1 \in {\mathbb {R}}^{X}} \Phi (\varphi _1, \varphi _2^k, \ldots , \varphi _N^k, \psi ^{k+1})\end{aligned}$$

which is the same as solving a one-dimensional strictly convex minimization problem for each \(y\in X\):

$$\begin{aligned} \varphi _1^{k+1}(y)={{\,\textrm{argmin}\,}}_{t} V(y, t, \varphi _2^k(y), \ldots , \varphi _N^k(y))+ \varepsilon e^{\frac{t}{\varepsilon }} \sum _{x\in X} e^{\frac{\psi _1^{k+1}(x)-c_1(x,y)}{\varepsilon }}. \end{aligned}$$
(5.9)

This boils down to a strictly monotone equation (if V is differentiable, see the Cobb–Douglas case below) or inclusion (if V is nonsmooth) in t, which can be solved efficiently by dichotomy for instance. The next price updates are performed similarly by iteratively solving, for \(i=2, \ldots , N-1\) and every \(y\in X\):

$$\begin{aligned} \begin{aligned} \varphi _i^{k+1}(y)&={{\,\textrm{argmin}\,}}_{t} \Big \{ V(y, \varphi _1^{k+1}(y), \ldots , \varphi _{i-1}^{k+1}(y), t, \varphi _{i+1}^k(y), \ldots , \varphi _N^k(y))\\&\quad + \varepsilon e^{\frac{t}{\varepsilon }} \sum _{x\in X} e^{\frac{\psi _i^{k+1}(x)-c_i(x,y)}{\varepsilon }} \Big \} \end{aligned} \end{aligned}$$
(5.10)

and for \(i=N\)

$$\begin{aligned} \begin{aligned} \varphi _N^{k+1}(y)&={{\,\textrm{argmin}\,}}_{t} \Big \{ V(y, \varphi _1^{k+1}(y), \ldots , \varphi _{N-1}^{k+1}(y), t)\\&\quad + \varepsilon e^{\frac{t}{\varepsilon }} \sum _{x\in X} e^{\frac{\psi _N^{k+1}(x)-c_N(x,y)}{\varepsilon }} \Big \}. \end{aligned} \end{aligned}$$
(5.11)

The Cobb–Douglas case. If we further specify the utility to be of Cobb–Douglas form:

$$\begin{aligned}U(\beta )=\prod _{i=1}^N \beta _i^{a_i}, \; \forall \beta \in {\mathbb {R}}_+^N, \; a_i>0, a:=\sum _{i=1}^N a_i <1,\end{aligned}$$

then a direct computation gives

$$\begin{aligned} V(\varphi )= (1-a) \frac{ \prod _{i=1}^N a_i^{\frac{a_i}{1-a}} }{\prod _{i=1}^N \varphi _i^{\frac{a_i}{1-a} }}. \end{aligned}$$
(5.12)
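As a sanity check, the closed form (5.12) can be compared with a direct numerical evaluation of the sup in (3.12); a sketch with illustrative exponents and prices:

```python
import numpy as np
from scipy.optimize import minimize

a = np.array([0.2, 0.3]); asum = a.sum()

def V_closed(phi):                       # formula (5.12)
    return (1 - asum) * np.prod((a / phi) ** (a / (1 - asum)))

def V_numeric(phi):                      # direct sup in (3.12)
    obj = lambda b: -(np.prod(b ** a) - phi @ b)
    return -minimize(obj, x0=np.ones(2), bounds=[(1e-9, None)] * 2).fun

phi = np.array([0.7, 1.3])
print(V_closed(phi), V_numeric(phi))     # agree up to solver tolerance
```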

Hence, each minimization step such as (5.10) amounts, for some parameter \(A>0\) and exponent b, to finding the root t of the strictly monotone equation

$$\begin{aligned} e^t t^{b} =A\end{aligned}$$

which can be solved efficiently with a few steps of a dichotomy method. We can also notice that since V is smooth and locally strongly convex on its domain, the linear convergence of the coordinate descent scheme described above is guaranteed by the results of Beck and Tetruashvili [15].
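Putting the pieces together, here is a compact sketch of the full scheme (5.8)-(5.11) in the Cobb–Douglas case: the \(\psi \)-update is the explicit formula (5.8), and each price update solves the monotone optimality equation by dichotomy. Array shapes, parameter names and the number of iterations are assumptions for illustration, not the authors' reference implementation.

```python
import numpy as np
from scipy.special import logsumexp

def solve_pareto(alpha, c, a, w, eps, iters=200):
    """Coordinate descent for (5.7) with Cobb-Douglas utility.
    alpha: (N, n) endowments, c: (N, n, n) costs, a: (N,) exponents (sum < 1),
    w: (n,) utility weights."""
    N, n = alpha.shape
    asum = a.sum()
    phi = np.ones((N, n))                 # initial prices
    for _ in range(iters):
        # psi-update (5.8), in closed form
        psi = eps * np.log(alpha) - eps * logsumexp((phi[:, None, :] - c) / eps, axis=2)
        # price updates (5.9)-(5.11): dichotomy on the increasing derivative
        for i in range(N):
            S = np.exp(logsumexp((psi[i][:, None] - c[i]) / eps, axis=0))  # (n,)
            for y in range(n):
                def deriv(t):
                    prices = phi[:, y].copy(); prices[i] = t
                    V = (1 - asum) * w[y] ** (1 / (1 - asum)) \
                        * np.prod((a / prices) ** (a / (1 - asum)))
                    # d/dt of V(y, prices) + eps * exp(t / eps) * S[y]
                    return np.exp(t / eps) * S[y] - (a[i] / (1 - asum)) * V / t
                lo, hi = 1e-8, 1e2
                for _ in range(80):       # bisection: deriv is increasing in t
                    mid = 0.5 * (lo + hi)
                    if deriv(mid) > 0: hi = mid
                    else: lo = mid
                phi[i, y] = 0.5 * (lo + hi)
    return phi, psi
```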

5.2 Numerical Results

We now present some simulations using the entropic approximation algorithm described in the previous paragraph.

5.2.1 One-Dimensional Experiments

We first consider the one-dimensional case \(X=[0,1]\) (suitably discretized), assume there are two goods and that the utility is a Cobb–Douglas function with parameters \(a_1\) and \(a_2\). As for the transport cost, we take a power of the distance:

$$\begin{aligned} C_{\lambda , p}(x,y) = \lambda \frac{|x-y|^p}{p}, \forall (x,y) \in [0,1]^2.\end{aligned}$$

When \(\lambda \) is large, transport is costly and \(\beta _1\), \(\beta _2\) tend to be close to the initial distributions \(\alpha _1\) and \(\alpha _2\). On the contrary, when \(\lambda \) is small, the utility term dominates and \(\beta _1\) and \(\beta _2\) tend to be constant.
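For the record, the discretized cost used in these experiments can be assembled in two lines (grid size and parameter values below are illustrative):

```python
import numpy as np

n, lam, p = 200, 1.0, 0.3                            # grid size, cost parameters
x = np.linspace(0.0, 1.0, n)                         # discretized X = [0, 1]
C = lam * np.abs(x[:, None] - x[None, :]) ** p / p   # C_{lambda, p}(x, y)
```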

In our first three simulations, \(\alpha _1\) and \(\alpha _2\) are Gaussians with means \(\mu _1 = 0.25\) and \(\mu _2 = 0.75\) respectively, standard deviations \(\sigma _1 = \sigma _2 = 0.2\), \((a_1,a_2) = (0.49,0.49)\) and \(\varepsilon = 0.01\). The three simulations correspond to different exponents p for the cost: a concave cost with \(p=0.3\) (Fig. 1), the linear cost \(p=1\) (Fig. 2) and the quadratic cost \(p=2\) (Fig. 3). In the first row, \(\alpha \) appears in light grey and \(\beta \) in dark grey; the prices \((\varphi _1,\varphi _2)\) are plotted in the second row. The third row represents the convergence of the algorithm: we plot the decay of the functional \(\Phi \) and the evolution of the error \(||\partial V(\varphi (\cdot )) + \beta (\cdot )||_2\) for the first-order necessary and sufficient condition (5.4).

The most striking phenomenon shown by these simulations is the emergence of extra modes in the final distributions, whereas the initial distributions are single-peaked. This phenomenon seems to be related to the concavity of the transport cost. In the concave case of Fig. 1, \(\beta \) exhibits three local maxima, the linear case of Fig. 2 exhibits two modes and in the quadratic case of Fig. 3, initial and final distributions roughly have the same unimodal shape.

In order to further explore numerically the impact of the various parameters on the structure of the Pareto optimum, we present in Figs. 4 and 5 the case of a mixture of three Gaussians for \(\alpha _1\) and of two Gaussians for \(\alpha _2\), for different values of \(a_1\) and \(a_2\) (in Fig. 4, \(a_1+a_2\) is close to 1 and in Fig. 5, \(a_1+a_2=0.5\)).

Fig. 1

Gaussian distributions with concave cost

Fig. 2

Gaussian distributions with homogeneous cost

Fig. 3

Gaussian distributions with convex cost

Fig. 4

Another example with \(a_1 + a_2\) close to 1

Fig. 5

Another example with \(a_1 + a_2\) far from 1

Fig. 6

French network with \(\varepsilon = 0.1\)

Fig. 7

French network with \(\varepsilon = 0.1\)

Fig. 8

French network with \(\varepsilon = 0.01\)

Fig. 9

French network with \(\varepsilon = 0.01\)

5.2.2 A Network Model

Our next application concerns a connected network of French cities. The nodes of this network (see Fig. 6) are the nine French cities: Brest, Lille, Lyon, Marseille, Nantes, Paris, Rennes, Saint-Malo and Strasbourg. The cost between two neighbouring cities is given by the distance as the crow flies, and more generally the cost between two cities is given by the shortest-path distance between them (a sketch of this construction is given at the end of the section). To take into account population differences between these cities, we have weighted the Cobb–Douglas utility by a weight w(x) for city x:

City | Brest | Lyon | Marseille | Paris | Strasbourg | Rennes | Lille | Saint-Malo | Nantes
w (thousands) | 139 | 518 | 868 | 2175 | 285 | 217 | 233 | 46 | 314

This leads to the weighted utility

$$\begin{aligned} U(x,\beta _1,\beta _2,\beta _3) = w(x)\, \beta _1^{a_1} \beta _2^{a_2} \beta _3^{a_3}, \end{aligned}$$

and the corresponding conjugate term (see (3.12)):

$$\begin{aligned}V(x,\varphi ) = (1-a)\, w(x)^{\frac{1}{1-a}} \frac{ \prod _{i=1}^3 a_i^{\frac{a_i}{1-a}} }{\prod _{i=1}^3 \varphi _i^{\frac{a_i}{1-a} }} \end{aligned}$$

with \(a=a_1+a_2+a_3<1\). In our examples, we have taken \( (a_1,a_2,a_3)=(0.4,0.1,0.2)\). In Figs. 6, 7, 8 and 9, we have represented the initial and final distributions as well as the optimal transport plans between the cities. Note that the larger \(\varepsilon \), the stronger the diffusion effect in the transport plan, which explains why we see more connections in Figs. 6 and 7 (large diffusion with \(\varepsilon =0.1\)) than in Figs. 8 and 9 (\(\varepsilon =0.01\)). We also give the prices of the goods in the different cities and analyze the convergence of the algorithm both in terms of the decay of the objective function \(\Phi \) and in the residuals for the optimality condition.
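Finally, a sketch of the shortest-path cost construction mentioned above, using SciPy's graph routines; the edge list and distances below are placeholders for illustration, not the actual crow-flies values used in the experiments:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

cities = ["Brest", "Lille", "Lyon", "Marseille", "Nantes",
          "Paris", "Rennes", "Saint-Malo", "Strasbourg"]
idx = {name: k for k, name in enumerate(cities)}

D = np.zeros((9, 9))                      # dense adjacency; 0 = no direct edge
def edge(u, v, d):                        # d: placeholder crow-flies distance
    D[idx[u], idx[v]] = D[idx[v], idx[u]] = d

edge("Paris", "Lille", 1.0); edge("Paris", "Lyon", 2.0)
edge("Paris", "Rennes", 1.5); edge("Paris", "Strasbourg", 2.0)
edge("Lyon", "Marseille", 1.3); edge("Rennes", "Brest", 1.1)
edge("Rennes", "Saint-Malo", 0.3); edge("Rennes", "Nantes", 0.5)

cost = shortest_path(D, directed=False)   # all-pairs shortest-path cost matrix
```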