1 Introduction

Many practical problems are of a discrete nature. For example, in inventory management retailers need to place orders in discrete quantities or even in large batches. In scheduling, transportation planning, and production planning one needs to use 0-1 assignment variables to model whether a job or a truck is assigned to a machine or a route. In combinatorial optimization theory, many brilliant problem-specific algorithms have been developed. However, it remains an important question whether there exists a framework for a general class of problems, analogous to convex optimization theory in continuous optimization. For this purpose, one needs to extend the Separation Lemma, which implies strong duality and global optimality. Luckily, the Separation Lemma holds for submodular set functions, and the so-called L♮-convex functions [23].

In economic theory and operations management applications, an important question arises from practice: whether two decisions conflict with each other. This question belongs to the area of comparative statics, and is often related to the sign of the second order partial derivatives with respect to different dimensions of the decision variable. Also, in revenue and inventory management, we are often interested in whether the decisions for two different products influence each other. For example, different brands of smartphones are called “substitutable goods,” since one can replace the function of the other. On the other hand, extra consumption of smartphones would boost sales of accessories, which we call “complementary goods.” These properties can often be characterized by submodularity and supermodularity of the customer utility function.

The objective of this chapter is to introduce some basic concepts, algorithms, and applications of discrete convex optimization. Discrete convex analysis is a deep research direction, and we aim at providing a quick survey of the classical results related to optimization problems applicable in operations management. Moreover, we emphasize the motivations and intuitions behind the concepts and proofs, and we omit certain details of proofs due to the page limit. Readers may refer to Topkis’s book [29] for more detailed examples, discussions, and classical applications in supply chain management, as well as game theory related topics. Murota [23] and his long list of research works provide a thorough survey of the theoretical foundation of discrete convex analysis, including the duality theory in the discrete domain. And Vondrak’s Ph.D. thesis [30] provides a survey of many crucial ideas in designing combinatorial algorithms by utilizing submodularity.

Section 2 introduces the basic concepts, e.g., lattices, submodular functions, and comparative statics. Fundamental properties of submodular functions and lattice sets are introduced. Examples arising from applications are given to illustrate how to model problems with submodularity and other discrete convex properties.

Section 3 focuses on classical results of submodular set function optimization. Separation Lemma and convex extensions are introduced, and the minimization algorithm over submodular functions is established based on convex extensions. Moreover, greedy approaches and multi-linear extension based smooth-greedy algorithms are introduced for the maximization problems.

Section 4 discusses online and dynamic algorithms utilizing submodularity. L♮-convexity plays a key role in dynamic inventory control problems, while the diminishing return property guarantees a \(1-\frac {1}{e}\) approximation ratio for the greedy algorithm in online bipartite matching.

2 Basic Definitions and Properties

This section introduces the basics of submodularity and lattice structure. Section 2.1 illustrates the intuition behind developing such concepts. Section 2.2 defines the basic concepts and establishes the basic properties. Section 2.3 discusses a special application where submodularity only holds locally near the optimum solution path.

2.1 Motivation

We start with the following observations:

  1. When a competitor lowers the price of its product, one often needs to also lower his/her own price.

  2. When the inventory level of a product is low, retailers often raise the price.

  3. In public spaces, one would naturally lower his/her voice, if the others are doing so.

To explain these observations and to further study the related problems, one needs to provide reasonable mathematical models:

Example 1

Suppose the sales quantity Q i of retailer i is a function Q i(p i, p j) of the price p i of retailer i, and its major competitor’s price p j, and the corresponding profit is R i(p i, p j) = (p i − c i)Q i(p i, p j).

The simplest assumption is Q i(p i, p j) = (a i + b ii p i + b ij p j)+ with b ii < 0, b ij > 0. The optimum price \(p_i^*=\frac {a_i+b_{ij}p_j-b_{ii}c_i}{-2b_{ii}}\) for \(\max \{R_i(p_i,p_j)\mid p_i\ge 0\}\) is indeed increasing with respect to p j. Note that \(b_{ij}=\frac {\partial ^2}{\partial p_i\partial p_j}R_i(p_i,p_j)>0\) is the crucial assumption, and can be generalized for other types of demand functions.
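A minimal numerical sketch of this best-response formula (the parameter values a i, b ii, b ij, c i below are illustrative assumptions, not from the text) confirms that the optimum price increases in the competitor's price:

```python
# Best response of retailer i under the linear demand model of Example 1.
# Parameter values below are illustrative assumptions, not from the text.
a_i, b_ii, b_ij, c_i = 10.0, -2.0, 1.0, 1.0  # b_ii < 0, b_ij > 0

def best_price(p_j):
    """p_i^* = (a_i + b_ij * p_j - b_ii * c_i) / (-2 * b_ii)."""
    return (a_i + b_ij * p_j - b_ii * c_i) / (-2.0 * b_ii)

for p_j in [1.0, 2.0, 3.0, 4.0]:
    print(p_j, best_price(p_j))   # the best response increases with p_j
```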

Naturally, one would like to extend the question to the following general comparative statics question:

Problem 1

Given function \(f(x,y): \Re ^2\rightarrow \Re \), where x is the decision variable, and y is the input parameter (maybe the decision of another player). We consider the minimization problem \(\min \{f(x,y): (x,y)\in S\}\) within domain S. When would the optimum decision x(y) be monotonically increasing with respect to input parameter y?

We analyze quadratic functions first:

Theorem 1

If

$$\displaystyle \begin{aligned}f(x,y)=\frac{1}{2}\left(\begin{array}{cc}x& y \\\end{array}\right)A\left(\begin{array}{c}x \\ y\\\end{array}\right)+b^T\left(\begin{array}{c}x \\ y \\\end{array}\right)+c\end{aligned}$$

is a strongly convex function (A ≻ 0), and the optimum solution of \(\min \{f(x,y)\mid x\in \Re \}\) is defined as x^*(y). Then x^*(y) is monotonically increasing with respect to y when A 12 < 0.

Proof

Due to strong convexity, A 11 > 0. By the first order condition A 11 x^*(y) + A 12 y + b 1 = 0, the optimum solution is

$$\displaystyle \begin{aligned}x^*(y)=-\frac{A_{12}}{A_{11}}y-\frac{b_1}{A_{11}}.\end{aligned}$$

Therefore, x^*(y) is monotonically increasing with respect to y when A 12 < 0.

Next, we establish a more general result by dropping the quadratic assumption, via a proof with the potential to be generalized to the discrete domain:

Theorem 2

If \(f(x,y): \Re ^2\rightarrow \Re \) is a strongly convex C 2 function, and the optimum solution of \(\min \{f(x,y)\mid x\in \Re \}\) is defined as x^*(y), then x^*(y) is increasing with respect to y if \(\frac {\partial ^2}{\partial x\partial y}f(x,y)\le 0\) for all \((x,y)\in \Re ^2\).

Proof

Firstly, we note that for any x ≤ x′, y ≤ y′,

$$\displaystyle \begin{aligned}f(x,y)+f(x',y')-f(x,y')-f(x',y) =\int_{s=x}^{x'}\int_{t=y}^{y'}\frac{\partial^2}{\partial s\partial t}f(s,t)dsdt\le 0.\end{aligned}$$

This condition is illustrated in Fig. 1.

Fig. 1: Idea of proof for general problem

Secondly, we prove the theorem by contradiction. Due to strong convexity, x^*(y) is uniquely defined for each \(y\in \Re \). If for y < y′ we have x^*(y) > x^*(y′), let us denote x = x^*(y′) and x′ = x^*(y) > x. Then \(f(x,y')=\min \{f(s,y')\mid s\in \Re \}\le f(x',y')\) and \(f(x',y)=\min \{f(s,y)\mid s\in \Re \}\le f(x,y)\). Therefore,

$$\displaystyle \begin{aligned}0\ge \int_{s=x}^{x'}\int_{t=y}^{y'}\frac{\partial^2}{\partial s\partial t}f(s,t)dsdt= [f(x,y)-f(x',y)]+[f(x',y')-f(x,y')]\ge 0.\end{aligned}$$

Consequently, f(x, y′) = f(x′, y′) and f(x′, y) = f(x, y), which contradicts the uniqueness of x^*(y) and x^*(y′).

There are two crucial conditions in the above proof:

  1. For any (x, y′) and (x′, y) in the domain S, if x ≤ x′, y ≤ y′, then (x, y), (x′, y′) ∈ S.

  2. For any x ≤ x′, y ≤ y′, f(x, y) + f(x′, y′) − f(x, y′) − f(x′, y) ≤ 0.

In the following subsection, we generalize the first condition to the so-called Lattice structure, and the second condition to submodular property of functions.

2.2 Definition

In high dimensional discrete domain, the first condition in the above subsection is generalized as:

Definition 1 (Lattice)

  1. Partial Order: x ≤ y if and only if x i ≤ y i for all indices i.

  2. Maximization (or) Operation: x ∨ y defined as \((x\vee y)_i=\max \{x_i,y_i\}\).

  3. Minimization (and) Operation: x ∧ y defined as \((x\wedge y)_i=\min \{x_i,y_i\}\).

  4. Lattice: \(L\subseteq \Re ^n\) is a lattice if and only if x ∨ y, x ∧ y ∈ L for any x, y ∈ L.

  5. Sublattice: If L′ is a subset of lattice L and x ∨ y, x ∧ y ∈ L′ for any x, y ∈ L′, we call L′ a sublattice of L.

  6. For a set of points \(\{x^j\in \Re ^n: j\in S\}\), we can define \(\vee _{j\in S} x^j\) and \(\wedge _{j\in S} x^j\) by \(\left (\vee _{j\in S} x^j\right )_i=\sup \{x_i^j\mid j\in S\}\) and \(\left (\wedge _{j\in S} x^j\right )_i=\inf \{x_i^j\mid j\in S\}\). These are well defined when S is a finite set, or when \(\{x^j : j\in S\}\) is within a bounded region.

Some important classes of lattices are listed as follows:

  1. Any totally ordered set (e.g., a single dimensional set) is a lattice.

  2. The finite Cartesian product L = ∏_{j∈S} L_j of lattices L_j, j ∈ S, is still a lattice when |S| is finite.

  3. The intersection L = ⋂_{j∈S} L_j of lattices L_j, j ∈ S, is still a lattice, regardless of the size of S.

  4. Orthogonal projections and orthogonal slices of lattices are still lattices.

  5. The linearly constrained set {(x, y) : ax − by ≥ c} with a, b ≥ 0 (see the numerical check after this list).
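As a quick numerical check of the last class above, the following sketch (with illustrative values of a, b, c) samples points from {(x, y) : ax − by ≥ c} with a, b ≥ 0 and verifies closure under the componentwise ∨ and ∧ operations:

```python
import random

a, b, c = 2.0, 3.0, -1.0   # illustrative values with a, b >= 0

def in_set(p):
    x, y = p
    return a * x - b * y >= c

def join(p, q):   # componentwise maximum, the "or" operation x v y
    return (max(p[0], q[0]), max(p[1], q[1]))

def meet(p, q):   # componentwise minimum, the "and" operation x ^ y
    return (min(p[0], q[0]), min(p[1], q[1]))

random.seed(0)
points = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(2000)]
points = [p for p in points if in_set(p)]
assert all(in_set(join(p, q)) and in_set(meet(p, q))
           for p in points[:50] for q in points[:50])
print("join and meet of sampled feasible points stay feasible")
```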

Theorem 3

Suppose \(L\subseteq \Re ^N\) is a compact sublattice. Then there is a minimum element \( \underline x\) and a maximum element \(\overline x\) in L.

Proof

Because L is compact, its projection L i = {y∣∃x ∈ L, x i = y} on i-th dimension is also compact. Define \( \underline x_i=\inf \{x_i\mid x\in L\}\), which is well defined because L i is compact, and will be reached by a certain point, which we denote as y i, i.e., \(y_i^i= \underline x_i\) and y i ∈ L. Now we consider the point \(\wedge _{i=1}^N y^i\in L\). It follows from definition that \( \underline x \le \wedge _{i=1}^N y^i\). Furthermore, \(\left (\wedge _{i=1}^N y^i\right )_i\le y_i^i= \underline x_i\), therefore \( \wedge _{i=1}^N y^i \le \underline x\). Consequently \( \underline x =\wedge _{i=1}^N y^i\in L\). Similarly, we have \(\overline x\in L\).

The second condition in the above subsection is extended to the concept of submodularity:

Definition 2 (Submodular Function)

  1. A function \(f(x): L\rightarrow \Re \) defined on a lattice L is called a submodular function if f(x) + f(y) ≥ f(x ∨ y) + f(x ∧ y) for any x, y ∈ L.

  2. Equivalent Definition (Decreasing Increments): f is submodular if and only if, whenever d, u ≥ 0 and d T u = 0, f(x + d) − f(x) ≥ f(x + u + d) − f(x + u).

  3. Equivalent Definition (Local Condition): if f is defined on Z n, then f is submodular if and only if f(x + 1 i) − f(x) ≥ f(x + 1 i + 1 j) − f(x + 1 j) for all indices i ≠ j.

  4. Supermodular Function: A function f is supermodular if and only if − f is submodular.

Example 2 (Examples of Submodular Functions)

  1. Quadratic functions \(\frac {1}{2}x^TAx+b^Tx+c\) with A ij ≤ 0 for all i ≠ j.

  2. A C 2 function \(f(x):\Re ^n\rightarrow \Re \) with \(\frac {\partial ^2}{\partial x_i\partial x_j}f(x)\le 0\) for all i ≠ j.

  3. g(x − y) with a convex function \(g(z):\Re \rightarrow \Re \).

  4. \(g(\sum _{i=1}^n x_i)\) with a concave function \(g(z):\Re \rightarrow \Re \).

  5. The negative of the Cobb–Douglas function \(f(x)=\prod _i x_i^{\alpha _i}\) defined on \(\Re _+^n\), with \(\alpha \in \Re _+^n\) (the Cobb–Douglas function itself is supermodular).

  6. \(\|x-y\|{ }_2^2=\sum _i (x_i-y_i)^2\) and ∥x − y∥1 =∑i|x i − y i|, as functions of (x, y).

  7. Nonnegative linear combinations, expectations, and limits of submodular functions are still submodular.

  8. g(f(x)) is submodular, if \(f:\Re ^n\rightarrow \Re \) is submodular and monotone, and \(g:\Re \rightarrow \Re \) is concave and monotonically increasing.
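As a quick numerical check of the examples above, the following sketch verifies the local condition of Definition 2 on the integer lattice for a quadratic function with nonpositive off-diagonal entries (the matrix A and vector b are illustrative assumptions):

```python
import itertools
import numpy as np

# Illustrative quadratic f(x) = 0.5 x^T A x + b^T x with A_ij <= 0 for i != j.
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 3.0, -0.5],
              [0.0, -0.5, 1.0]])
b = np.array([1.0, -2.0, 0.5])

def f(x):
    x = np.asarray(x, dtype=float)
    return 0.5 * x @ A @ x + b @ x

# Local condition on Z^n: f(x + e_i) - f(x) >= f(x + e_i + e_j) - f(x + e_j) for i != j.
n = 3
e = np.eye(n)
for x in itertools.product(range(-2, 3), repeat=n):
    x = np.array(x, dtype=float)
    for i in range(n):
        for j in range(n):
            if i != j:
                assert f(x + e[i]) - f(x) >= f(x + e[i] + e[j]) - f(x + e[j]) - 1e-9
print("local submodularity condition holds on the sampled grid")
```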

A set function \(f(S): 2^N\rightarrow \Re \) is defined on the set \(2^N\) of all subsets of N.

Definition 3 (Submodular Set Function)

  1. A set function is called a submodular set function if f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) for any A, B ⊆ N.

  2. Equivalent Definition (Local Condition): For any set A ⊆ N and two distinct elements i, j ∈ N ∖ A, f(A ∪{i}) + f(A ∪{j}) ≥ f(A) + f(A ∪{i, j}).

  3. Connection with Submodular Functions: Define \(F:\{0,1\}^N\rightarrow \Re \) as F(1 S) = f(S); then f is a submodular set function if and only if F is a submodular function.

There is a special class of submodular functions generalizing the concept of rank in linear algebra:

Definition 4 (Rank Function)

A set function \(F:2^N\rightarrow \Re \) which satisfies F(∅) = 0 (normalized), F(A) ≤ F(B) for all A ⊆ B (monotonicity) and submodularity, is called a rank function.

One example is the rank function R(S) defined on subsets S of a set of vectors \(\{v_i\in \Re ^m: i\in K\}\), where R(S) is the rank (dimension) of the span of S.
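A short sketch of this example (with an arbitrary illustrative set of vectors), checking that the rank of the span is normalized, monotone, and submodular over all subsets:

```python
import itertools
import numpy as np

# Illustrative ground set of vectors; R(S) = dimension of the span of the vectors in S.
V = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
N = range(len(V))

def R(S):
    S = list(S)
    return 0 if not S else np.linalg.matrix_rank(V[S])

subsets = [frozenset(T) for k in range(len(V) + 1)
           for T in itertools.combinations(N, k)]
assert R(frozenset()) == 0                                           # normalized
assert all(R(A) <= R(B) for A in subsets for B in subsets if A <= B) # monotone
assert all(R(A) + R(B) >= R(A | B) + R(A & B)                        # submodular
           for A in subsets for B in subsets)
print("R is a rank function on this example")
```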

Now we extend the monotonicity result in Theorem 2 to discrete scenario:

Theorem 4 (Theorem 2.7.1 in Topkis [29])

If \(f(x):L\rightarrow \Re \) is a submodular function defined on a lattice domain L, then the optimum solution set argmin_{x∈L} f(x) is a sublattice.

Proof

We prove this by definition. Suppose both u, v ∈ argmin_{x∈L} f(x); then f(u) = f(v) = min_{x∈L} f(x). Therefore f(u ∨ v) ≥ min_{x∈L} f(x) = f(u) and f(u ∧ v) ≥ min_{x∈L} f(x) = f(v). It follows that f(u ∨ v) + f(u ∧ v) ≥ f(u) + f(v). But by submodularity, f(u ∨ v) + f(u ∧ v) ≤ f(u) + f(v). Combining the two inequalities above, f(u ∨ v) = f(u ∧ v) = f(u) = f(v) = min_{x∈L} f(x), and both u ∨ v, u ∧ v ∈ argmin_{x∈L} f(x).

Next, we establish the monotonicity of the optimum decision set, with respect to input parameters. For this purpose, we need to first define the set monotonicity, which basically is the monotonicity of both the largest and smallest elements of the sets, if they do exist.

Definition 5 (Set Monotonicity)

Set S t is called monotonically increasing with respect to t, if for any t < s, x ∈ S t, and y ∈ S s, there exist u ∈ S s and v ∈ S t such that u ≥ x and v ≤ y. This implies that \(\overline x(t)=\vee _{x\in S_t} x\) and \( \underline x(t)=\wedge _{x\in S_t} x\) are both increasing in t (when they exist).

An important fact is that slices of a sublattice are monotone in the slicing parameter, as stated in the following theorem. The proof of this theorem follows directly from the definition and is omitted here.

Theorem 5 (Monotonicity of Lattice Slices)

If S ⊆ X × T is a sublattice of X × T for lattices X and T, then S t = {x∣(x, t) ∈ S} is increasing in t, when it is nonempty.

Theorem 6 (Topkis, Theorem 2.8.2)

Suppose \(f(x,t):S\rightarrow \Re \) is a submodular function defined on a sublattice S ⊆ X × T, where both X and T are lattices. Then X^*(t) = argmin{f(x, t) : (x, t) ∈ S} is increasing with respect to t when it is nonempty, and the set {(u, t)∣u ∈ X^*(t)} is a sublattice.

Proof

We first prove that the set L = {(u, t)∣u ∈ X^*(t)} is a sublattice by definition. For any (u, t), (v, s) ∈ L, without loss of generality we assume t ≤ s. By definition, we have \(\min \{f(x,s): (x,s)\in S\}=f(v,s)\) and \(\min \{f(x,t): (x,t)\in S\}=f(u,t)\). And it follows from the lattice structure that both (u ∨ v, s) = (u, t) ∨ (v, s) and (u ∧ v, t) = (u, t) ∧ (v, s) are in the set S. Therefore, \(f(u\vee v,s)\ge \min \{f(x,s): (x,s)\in S\}=f(v,s)\) and \(f(u\wedge v,t)\ge \min \{f(x,t): (x,t)\in S\}=f(u,t)\). However, by submodularity of f we have

$$\displaystyle \begin{aligned}f(u\vee v,s)+f(u\wedge v,t)=f((u,t)\vee (v,s))+f((u,t)\wedge (v,s))\le f(u,t)+f(v,s).\end{aligned}$$

This could only hold when f(u ∨ v, s) = f(v, s) and f(u ∧ v, t) = f(u, t). Therefore, u ∨ v ∈ X^*(s) and u ∧ v ∈ X^*(t), and by definition we have (u, t) ∨ (v, s) = (u ∨ v, s) ∈ L and (u, t) ∧ (v, s) = (u ∧ v, t) ∈ L.

By Theorem 5, the set X^*(t) = {x∣(x, t) ∈ L} increases with respect to t.

Corollary 1 (Topkis, Corollary 2.8.1)

If f(x) is a submodular function defined on a lattice domain \(X\subseteq \Re ^n\) , then f(x) − y T x is submodular on the domain \(X\times \Re ^n\) , and argmin_{x∈X} f(x) − y T x increases with respect to y.

Proof

The function − y T x is submodular in (x, y), and so is f(x) − y T x on the domain \(X\times \Re ^n\); applying Theorem 6 we obtain the monotonicity.

For submodular functions, another important characteristic which mimics the convexity in continuous domain is the classical preservation under minimization property:

Theorem 7 (Preservation of Submodularity)

Suppose both S and T are lattices and X ⊆ S × T is a sublattice. The function \(f:X\rightarrow \Re \) is a submodular function. Then the function \(g(y)=\min \{f(x,y)\mid (x,y)\in X\}\) is a submodular function defined on the sublattice domain Y = {y∣∃(x, y) ∈ X}.

Proof

We first prove the lattice structure of Y by definition. For any y, y′∈ Y , there exist x, x′∈ S such that (x, y) ∈ X and (x′, y′) ∈ X. Since X is a lattice, (x ∨ x′, y ∨ y′) = (x, y) ∨ (x′, y′) ∈ X and (x ∧ x′, y ∧ y′) = (x, y) ∧ (x′, y′) ∈ X. Therefore, y ∨ y′ and y ∧ y′ are both in Y .

Secondly, we establish the submodularity of g by a constructive proof, which is very useful in establishing properties of discrete convexity. For y, y′∈ Y , there exist z, z′∈ S such that both (z, y), (z′, y′) ∈ X, f(z, y) = g(y), and f(z′, y′) = g(y′). Therefore,

$$\displaystyle \begin{aligned}\begin{array}{rl} g(y\vee y')+g(y\wedge y')&\le f(z\vee z', y\vee y')+f(z\wedge z', y\wedge y')\\ &=f[(z,y)\vee (z',y')]+f[(z,y)\wedge (z',y')]\\ &\le f(z,y)+f(z',y')=g(y)+g(y'), \end{array} \end{aligned}$$

where the first inequality is due to definition of g, the second inequality is due to submodularity of f, and the last equality is due to definition of z and z′.

2.3 Local Submodularity

In practice, it is often difficult to guarantee the submodularity of a function over the whole domain. However, for the monotonicity of optimum solution we only need the submodularity in a small region, i.e., a neighborhood of the optimum solution set path. In the following example, we use local supermodularity to explain why one retailer’s price should decrease, if its competitors’ prices are dropping.

Example 3 (Discrete Choice Model)

A popular model that captures customer choice between substitutable goods is the so-called random utility (discrete choice) model. In this model, customers have random utility ξ i(p i) for goods i with price p i, where u i(p i) = E[ξ i(p i)] is the expected utility. A random customer would choose the goods which give him/her the best (realized) utility. When the random noises ξ i(p i) − u i(p i) follow independent Gumbel distributions, the probability that a customer would choose goods i from a set S of goods is \(P_i=\frac {u_i(p_i)}{1+\sum _{j\in S} u_j(p_j)}\), while the probability of not choosing anything is \(P_0=\frac {1}{1+\sum _{j\in S} u_j(p_j)}\). One thing to note is that a popular choice in practice is to use the logistic model: \(u_i(p_i)=e^{\alpha _i p_i+\beta _i}\). Retailer i’s expected profit from a random customer is therefore R i = (p i − c i)P i if the cost per unit is c i. We adopt the classical notation that all prices other than p i are denoted as p −i, and the optimum solution is \(p_i^*(p_{-i})=\mathrm {argmax} \{R_i(p_i,p_{-i})\mid p_i\in \Re _+\}\). We assume \(u^{\prime }_j(p_j)<0\) for each retailer j, which is intuitive as the customer’s utility would decrease with respect to the price of the goods.
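A small numerical sketch of this discrete choice model (the parameters α, β and costs below are illustrative assumptions) computes the choice probabilities and retailer i's best response on a price grid; raising a competitor's price shifts the best response upward, as established below:

```python
import numpy as np

# Illustrative logistic attraction parameters for 3 goods; alpha_i < 0 so u_i decreases in p_i.
alpha = np.array([-0.8, -0.7, -0.9])
beta = np.array([2.0, 1.5, 1.8])
cost = np.array([1.0, 1.0, 1.0])

def u(p):
    return np.exp(alpha * p + beta)

def profit_i(i, p):
    """R_i = (p_i - c_i) * u_i / (1 + sum_j u_j)."""
    w = u(p)
    return (p[i] - cost[i]) * w[i] / (1.0 + w.sum())

def best_response(i, p_other, grid=np.linspace(0.01, 20, 4000)):
    """Maximize R_i over a price grid, holding the other prices fixed."""
    best, best_val = None, -np.inf
    for pi in grid:
        p = p_other.copy()
        p[i] = pi
        val = profit_i(i, p)
        if val > best_val:
            best, best_val = pi, val
    return best

p = np.array([2.0, 2.0, 2.0])
for pj in [2.0, 3.0, 4.0]:
    q = p.copy()
    q[1] = pj                      # raise competitor 1's price
    print(pj, best_response(0, q)) # retailer 0's best response is nondecreasing
```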

Lemma 1

If each u i is a C 2 function, then in an open neighborhood of the optimum solution path \(\{(p_i^*(p_{-i}), p_{-i})\mid p_{-i}\in \Re _+^{n-1}\}\) , we have \(\frac {\partial ^2}{\partial p_i\partial p_j}R_i(p)>0\).

Proof

The profit is negative when p i = 0, and tends to 0 when p i →∞; by continuity of R i the optimum solution exists. Since the function R i is also a C 2 function, we only need to verify the condition \(\frac {\partial ^2}{\partial p_i\partial p_j}R_i(p)>0\) for \(p_i=p_i^*(p_{-i})\). By the optimality condition at \(p_i=p_i^*(p_{-i})\),

$$\displaystyle \begin{aligned}0=\frac{\partial}{\partial p_i}R_i(p)=\frac{1}{(1+\sum_{j\in S} u_j(p_j))^2}\left[u_i(p_i)(1+\sum_{j\in S} u_j(p_j))+(p_i-c_i)u^{\prime}_i(p_i)(1+\sum_{j\in S} u_j(p_j)-u_i(p_i))\right],\end{aligned}$$

and \(u_i(p_i)(1+\sum _{j\in S} u_j(p_j))+(p_i-c_i)u^{\prime }_i(p_i)(1+\sum _{j\in S} u_j(p_j)-u_i(p_i))=0\).

$$\displaystyle \begin{aligned}\begin{array}{rl}&\frac{\partial^2}{\partial p_i\partial p_j}R_i(p)\\ =&\frac{-u^{\prime}_j(p_j)}{(1+\sum_{j\in S} u_j(p_j))^3}\left[u_i(p_i)(1+\sum_{j\in S} u_j(p_j))+(p_i-c_i)u^{\prime}_i(p_i)(1+\sum_{j\in S} u_j(p_j)-2u_i(p_i))\right]\\ =&\frac{-u^{\prime}_j(p_j)}{(1+\sum_{j\in S} u_j(p_j))^3}(p_i-c_i)u^{\prime}_i(p_i)(-u_i(p_i))>0. \end{array}\end{aligned}$$

Theorem 8

When \(u_i(p_i)=e^{\alpha _i p_i+\beta _i}\) , \(p_i^*(p_{-i})\) is continuous, and it is monotonically increasing with respect to p −i.

Proof

We first prove the strong concavity of \(\ln R_i(p_i, p_{-i})\) in p i. Notice that

$$\displaystyle \begin{aligned}\frac{\partial}{\partial p_i}\ln R_i=\frac{1}{p_i-c_i}+\frac{u^{\prime}_i(p_i)(1+\sum_{j\in S\backslash\{i\}}u_j)}{u_i(p_i)(1+\sum_{j\in S}u_j)}=\frac{1}{p_i-c_i}+\alpha_i\frac{1+\sum_{j\in S\backslash\{i\}}u_j}{1+\sum_{j\in S}u_j}.\end{aligned}$$

Notice that α i < 0 and u i(p i) is decreasing with respect to p i, \(\frac {\partial }{\partial p_i}\ln R_i\) is a decreasing function with respect to p i, and \(\ln R_i\) is a strongly concave function with respect to p i. Since R i is C 2, and strongly concave in p i, \(p_i^*(p_{-i})\) is continuous.

By the local supermodularity, there exists a small neighborhood \(N_{\epsilon }=\{p\in \Re _+^n\mid \|p-(p_i^*(p_{-i}), p_{-i})\|{ }_{\infty }\le \epsilon \}\) of any point \((p_i^*(p_{-i}), p_{-i})\) on the optimum solution path, inside which \(\frac {\partial ^2}{\partial p_i\partial p_j}R_i(p)>0\). Therefore, by applying Theorem 6 on the box N 𝜖, for any q −i ∈ [p −i, p −i + 𝜖e] we have

$$\displaystyle \begin{aligned}x=\mathrm{argmax}\{R_i(p_i,q_{-i})\mid p_i\in [p_i^*(p_{-i})-\epsilon, p_i^*(p_{-i})+\epsilon]\}\ge p_i^*(p_{-i}).\end{aligned}$$

By the log-concavity of the function R i, the local optimum x within the region \([p_i^*(p_{-i})-\epsilon , p_i^*(p_{-i})+\epsilon ]\) is on the same side of the point \(p_i^*(p_{-i})\in [p_i^*(p_{-i})-\epsilon , p_i^*(p_{-i})+\epsilon ]\) as the global optimum point \( p_i^*(q_{-i})\), therefore

$$\displaystyle \begin{aligned}p_i^*(q_{-i})=\mathrm{argmax}\{R_i(p_i,q_{-i})\mid p_i\in \Re\}\ge p_i^*(p_{-i}).\end{aligned}$$

3 Optimization with Submodular Set Functions

In this section, we introduce the classical results for optimization over submodular set functions. Section 3.1 introduces the Lovasz extension for submodular set function. Section 3.2 discusses the polymatroid optimization. In Section 3.3, Lovasz extension is utilized for minimization of submodular set functions, with a fast gradient projection based algorithm. In Section 3.4, we analyze greedy and double greedy approaches for monotone and nonmonotone submodular set functions maximization problem. In Section 3.5, the smooth-greedy approaches based on multi-linear extension are analyzed for submodular set functions maximization problem with matroid constraint.

3.1 Extensions of Submodular Set Function

We first recall two important definitions in convex optimization theory. The convex hull of a set X is defined as Conv(X) = Cl{∑i s i x i :∑i s i = 1, s i ≥ 0, x i ∈ X}, where Cl defines the closure of a set. Epigraph of a function f is defined as epigraph(f) = {(x, t) : f(x) ≤ t}. A classical fact in convex optimization theory is that: A function is convex if and only if its epigraph is a convex set.

For each given set function \(f:2^N\rightarrow \Re \), we can define \(\widetilde {f}: \{0,1\}^N\rightarrow \Re \) as \(\widetilde {f}({\mathbf {1}}_S)=f(S)\), where 1 S is the characteristic vector defined as \(x\in \Re ^N\) with x i = 1 if i ∈ S and x i = 0 if i ∉ S. We treat the extreme points {0, 1}N of the box as the set of subsets \(2^N\), where 1 S is equivalent to the set S.

Definition 6 (Convex Extension and Lovasz Extension)

  1. Given a function \(f:X\rightarrow \Re \), we define its convex hull \(f^-:\mathbf {Conv}(X)\rightarrow \Re \) as

    $$\displaystyle \begin{aligned}f^-(x)=\inf\{\sum_i s_i^jf(x_i^j): \lim_{j\rightarrow \infty}x^j=x, \sum_i s_i^jx_i^j=x^j, \sum_i s_i^j=1, s_i^j\ge 0, x_i^j\in X\},\end{aligned}$$

    which is the largest convex function below f.

  2. Given a set function \(f:2^N\rightarrow \Re \), the Lovasz extension \(f^L(x):[0,1]^N\rightarrow \Re \) is defined as \(f^L(x)=\sum _{j=1}^m s_jf(S_j)\), where {S j} is the unique decreasing series of sets N = S 1 ⊃ S 2 ⊃ S 3 ⊃⋯ ⊃ S m = ∅ such that \(x=\sum _j s_j{\mathbf {1}}_{S_j}\) for ∑j s j = 1, s j ≥ 0.

  3. Equivalent Definition of the Lovasz Extension: Take ξ uniformly distributed in [0, 1]; then f L(x) = E ξ f({i : x i ≥ ξ}).

  4. For any S ⊆ N, f L(1 S) = f −(1 S) = f(S). So both are extensions of set functions.
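A minimal sketch computing the Lovasz extension through the chain decomposition in the definition above, for a set function given as a Python callable:

```python
def lovasz_extension(f, x):
    """Evaluate f^L(x) for x in [0,1]^N using the chain decomposition
    x = sum_j s_j * 1_{S_j} with N = S_1 ⊇ S_2 ⊇ ... ⊇ S_m = ∅ and sum_j s_j = 1."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])       # increasing order of x_i
    thresholds = [0.0] + [x[i] for i in order] + [1.0]
    value = 0.0
    for j in range(n + 1):
        s_j = thresholds[j + 1] - thresholds[j]        # weight of the j-th level set
        S_j = frozenset(order[j:])                     # indices whose x_i lies above the threshold
        value += s_j * f(S_j)
    return value

# Example: a coverage-style submodular set function.
sets = [{0, 1}, {1, 2}, {2, 3}]
f = lambda S: len(set().union(*[sets[i] for i in S])) if S else 0
print(lovasz_extension(f, [1.0, 0.0, 0.0]))   # equals f({0}) = 2
print(lovasz_extension(f, [0.5, 0.5, 0.5]))   # 0.5 * f({0,1,2}) + 0.5 * f(∅) = 2.0
```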

Theorem 9

The convex hull f − of any function f is convex, and it is the largest convex function below f.

Proof

We analyze the epigraph of f −:

$$\displaystyle \begin{aligned}\begin{array}{rl}\mathbf{epigraph}(f^-)&={\mathbf Cl}\{(x,t): \exists \sum_i s_i=1, s_i\ge 0, x_i\in X, \sum_i s_ix_i=x, \sum_i s_if(x_i)\le t\}\\ &={\mathbf Cl}\{(x,t): \exists \sum_i s_i=1, s_i\ge 0, x_i\in X, \sum_i s_ix_i=x, \sum_i s_it_i= t, f(x_i)\le t_i\}\\ &=\mathbf{Conv}(\bigcup_i\{(x_i,t_i): f(x_i)\le t_i\}), \end{array}\end{aligned}$$

which is a convex set. Therefore, by convex optimization theory f −(x) is convex.

Because f − is defined via an infimum over convex combinations, it is below f. Next we prove that any convex function g below f is also below f −. For any \(s\in \Re _+^n\) and x i ∈ X, i = 1, 2, ⋯ , N with ∑i s i = 1, s i ≥ 0, ∑i s i x i = x, by convexity we have

$$\displaystyle \begin{aligned}g(x)\le \sum_i s_ig(x_i)\le \sum_i s_if(x_i).\end{aligned}$$

Therefore, it follows from definition that

$$\displaystyle \begin{aligned}f^-(x)=\inf\{\sum_i s_i^jf(x_i^j): \lim_{j\rightarrow \infty}x^j=x, \sum_i s_i^jx_i^j=x^j, \sum_i s_i^j=1, s_i^j\ge 0, x_i^j\in X\}\ge \inf\{g(x^j)\}\ge g(x).\end{aligned}$$

Theorem 10

If f is a submodular set function, then f −(x) = f L(x) and f L is convex. Conversely, if the Lovasz extension f L of a set function f is convex, f has to be submodular.

Proof

We can formulate the convex extension as a linear programming problem:

$$\displaystyle \begin{aligned}\begin{array}{rll} f^-(x)&=\min &\sum_{S\subseteq N} \lambda_Sf(S)\\ &{\mathbf s.t.} &\sum_{S:i\in S} \lambda_S=x_i \forall i\in N\\ && \sum_S \lambda_S=1\\ &&\lambda\ge 0, \end{array}\end{aligned}$$

whose dual is

$$\displaystyle \begin{aligned}\begin{array}{rll} f^-(x)&=\max& t+\sum_{i\in N} y_ix_i\\ &{\mathbf s.t.} & \sum_{i\in S} y_i\le f(S)-t \forall S\subseteq N.\\ \end{array}\end{aligned}$$

For any given x ∈ [0, 1]N, there exists an order π of indices such that \(x_{\pi _1}\le x_{\pi _2}\le \cdots \le x_{\pi _N}\). Define S j = {π j, ⋯ , π N} for j = 1, 2, ⋯ , N and S N+1 = ∅. Define \(\lambda _{S_j}=x_{\pi _j}-x_{\pi _{j-1}}\) with \(x_{\pi _0}=0\) and \(x_{\pi _{N+1}}=1\), and λ S = 0 otherwise. Then λ ≥ 0 and \(\sum _{j=1}^{N+1}\lambda _{S_j}=1\). Furthermore,

$$\displaystyle \begin{aligned}\sum_{S:i\in S} \lambda_S=\sum_{j:j\le \pi^{-1}(i)}\lambda_{S_j}=\sum_{j:j\le \pi^{-1}(i)}x_{\pi_j}-x_{\pi_{j-1}}=x_i.\end{aligned}$$

Therefore, λ is a primal feasible solution with the given x.

For the dual problem, define t = f(∅) and \(y_i=f(S_{\pi ^{-1}(i)})-f(S_{1+\pi ^{-1}(i)})=f(S_{\pi ^{-1}(i)})-f(S_{\pi ^{-1}(i)}\backslash \{i\})\). For any set \(S=\{\pi _{j_1}, \pi _{j_2}, \cdots , \pi _{j_m}\}\) with j 1 < j 2 < ⋯ < j m, denote \(S^k=\{\pi _{j_k}, \pi _{j_{k+1}}, \cdots , \pi _{j_m}\}\) for k = 1, 2, ⋯ , m and \(S^{m+1}=\emptyset \). Then, by submodularity,

$$\displaystyle \begin{aligned}\sum_{i\in S} y_i=\sum_{i\in S}f(S_{\pi^{-1}(i)})-f(S_{\pi^{-1}(i)}\backslash \{i\})\le \sum_{k=1}^m f(S^k)-f(S^{k+1})=f(S)-f(\emptyset)=f(S)-t.\end{aligned}$$

Therefore, (t, y) is a dual feasible solution.

Next we establish the strong duality, that is,

$$\displaystyle \begin{aligned} \sum_{S\subseteq N} \lambda_Sf(S) =\sum_{j=1}^{N+1}(x_{\pi_j}-x_{\pi_{j-1}})f(S_j) =\sum_{j=1}^{N}x_{\pi_j}[f(S_j)-f(S_{j+1})]+x_{\pi_{N+1}}f(S_{N+1})-x_{\pi_0}f(S_1) =\sum_{i\in N}x_iy_i+t. \end{aligned}$$

Take all j such that \(\lambda _{S_j}>0\); these \((\lambda _{S_j}, S_j)\) define the Lovasz extension f L(x). Therefore \(f^-(x)=\sum _{j:\lambda _{S_j}>0} \lambda _{S_j}f(S_j)=f^L(x)\). Because f − is always convex, so is f L.

If f L is convex, then for any S, T ⊆ N, consider the point \(x=\frac {{\mathbf {1}}_S+{\mathbf {1}}_T}{2}=\frac {{\mathbf {1}}_{S\cap T}+{\mathbf {1}}_{S\cup T}}{2}\). By definition, \(f^L(x)=\frac {f(S\cap T)+f(S\cup T)}{2}\). By convexity, \(f^L(x)\le \frac {f(S)+f(T)}{2}\). Therefore

$$\displaystyle \begin{aligned}f(S)+f(T)\ge 2f^L(x)=f(S\cap T)+f(S\cup T).\end{aligned}$$

In convex optimization theory, the Separation Lemma guarantees the existence of “dual certificate” of an optimum primal solution for a convex optimization problem, which is a big step towards strong duality. For submodular set functions, we have the following:

Theorem 11 (Frank’s Discrete Separation Theorem)

If f(S), g(S) are submodular and supermodular set functions defined on a sublattice domain D ⊆ 2^N , respectively, and f(S) ≥ g(S) for all S ∈ D, then there exists a modular (linear) function L(S) = c +∑i∈S l i such that f(S) ≥ L(S) ≥ g(S) for all S ∈ D.

Proof

We prove for D = 2^N first. Since both f and − g are submodular set functions, their Lovasz extensions f L and (−g)L are convex. Note that f(S) + (−g)(S) ≥ 0 for all S ⊆ N; it follows from the definition that f L(x) + (−g)L(x) = (f + (−g))L(x) ≥ 0 for all x ∈ [0, 1]N. Due to the Separation Lemma in convex optimization theory, there exists a linear function \(L^-(x):[0,1]^N\rightarrow \Re \) such that f L(x) ≥ L −(x) ≥−(−g)L(x) for all x ∈ [0, 1]N. Restricting this linear function to {0, 1}N we obtain the modular function

$$\displaystyle \begin{aligned}L(S)=L^-({\mathbf 1_S})=L^-(0)+\sum_{i\in S}[L^-(e_i)-L^-(0)],\end{aligned}$$

which satisfies f(S) ≥ L(S) ≥ g(S).

For D ≠ 2^N, we can extend the function f to the domain 2^N by defining f(S) = +∞ for all S∉D. Similarly, we extend g by defining g(S) = −∞ for all S∉D. The extended functions are still submodular and supermodular, and we can apply the proof for the full domain 2^N directly.

Optimum solution of a convex function can be verified by a tangent hyperplane which touches the epigraph of the convex function. Similarly, we have the following existence result for the certificate of optimum solution of submodular set function minimization problem:

Corollary 2

If f(S) is a submodular set function defined on the domain 2^N , and L ⊆ 2^N is a sublattice, then S ∗ is the optimum solution for \(\min \{f(S)\mid S\in L\}\) if and only if there exists a modular set function \(l:2^N\rightarrow \Re \) such that f(S ∗) = l(S ∗), f(S) ≥ l(S) for all S ⊆ N, and L ⊆{S : l(S) ≥ l(S ∗)}.

This is a direct application of Theorem 11, and the fact that f(S) ≥ f(S ∗) ≥ 2f(S ∗) − f(S) for all S ∈ L.

3.2 Polymatroid Optimization

In the proof of Theorem 10, the dual formulation of f − has been discussed:

$$\displaystyle \begin{aligned}\begin{array}{rll} f^-(x)&=\max& t+\sum_{i\in N} y_ix_i\\ &{\mathbf s.t.} & \sum_{i\in S} y_i\le f(S)-t \forall S\subseteq N. \end{array}\end{aligned}$$

The optimum solution for the dual problem is \(y_i=f(S_{\pi ^{-1}(i)})-f(S_{1+\pi ^{-1}(i)})=f(S_{\pi ^{-1}(i)})-f(S_{\pi ^{-1}(i)}\backslash \{i\})\), where the order π corresponds to the increasing order of x i: \(x_{\pi _1}\le x_{\pi _2}\le \cdots \le x_{\pi _N}\). For the chain sets S k = {π k, π k+1, ⋯ , π N},

$$\displaystyle \begin{aligned}\sum_{i\in S_k} y_i=\sum_{i\in S_k}\left[f(S_{\pi^{-1}(i)})-f(S_{\pi^{-1}(i)}\backslash \{i\})\right]= \sum_{j=k}^{N} \left[f(S_j)-f(S_{j+1})\right]=f(S_k)-f(\emptyset)=f(S_k)-t.\end{aligned}$$

Therefore, the chain sets S k correspond to tight dual constraints, and the optimum solution can be obtained by the greedy process: rank the coefficients in the objective from highest (π N) to lowest (π 1), and find the maximum possible value of y j one by one.

We summarize this observation in the so-called polymatroid optimization framework:

Definition 7 (Polymatroid Optimization)

Given a nonnegative set function \(r : 2^N \rightarrow \Re_+\), it induces a polytope (with exponentially many linear constraints)

$$\displaystyle \begin{aligned}\mathbf{P}(r,N)=\{x\in \Re_+^N\mid \sum_{j\in S}x_j\le r(S)\ \forall S\subseteq N\}.\end{aligned}$$

This polytope is called a polymatroid if r is a rank function.

Problem 2

How to maximize a linear objective function with a polymatroid constraint

$$\displaystyle \begin{aligned}\max\left\{\sum_{j\in N}c_jx_j\mid x\in \mathbf{P}(r,N)\right\}.\end{aligned}$$

Algorithm 1: Greedy optimum
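The pseudocode of Algorithm 1 is not reproduced here; the following is a hedged sketch of the greedy procedure described above (process the indices in decreasing order of their coefficients and raise each variable to the largest value the rank constraints allow), on an illustrative uniform-matroid rank function:

```python
def greedy_polymatroid(c, r, N):
    """Sketch: maximize sum_j c_j x_j over the polymatroid P(r, N), where r is a rank
    function on subsets of N. Process indices in decreasing order of c_j and set
    x_j = r(S_k) - r(S_{k-1}), with S_k the set of the first k processed indices."""
    order = sorted(N, key=lambda j: c[j], reverse=True)
    x = {j: 0.0 for j in N}
    S = []
    prev_rank = r(frozenset())
    for j in order:
        if c[j] <= 0:          # indices with nonpositive coefficients stay at zero
            break
        S.append(j)
        cur_rank = r(frozenset(S))
        x[j] = cur_rank - prev_rank
        prev_rank = cur_rank
    return x

# Illustrative rank function: r(S) = min(|S|, 2), the rank function of a uniform matroid.
r = lambda S: min(len(S), 2)
c = {0: 3.0, 1: 5.0, 2: 1.0, 3: 4.0}
print(greedy_polymatroid(c, r, list(c)))   # puts weight on the two largest coefficients
```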

Theorem 12

The greedy Algorithm 1 is optimum for Problem 2.

This theorem has been established in [8]. We can prove the theorem by constructing a primal-dual pair of solutions with no duality gap, where the primal solution x is the one constructed by the greedy algorithm, and the dual solution is constructed in exactly the same way as the primal solution in the proof of Theorem 10.

Furthermore, in [15], He et al. established the following structural result of polymatroid optimization:

Theorem 13 (Preservation of Submodularity)

If \(r: 2^N\rightarrow \Re \) is a rank function, the function

$$\displaystyle \begin{aligned}F(c)=\max\left\{\sum_{j\in N}c_jx_j\mid x\in \mathbf{P}(r,N)\right\}\end{aligned}$$

is a submodular function, and the function

$$\displaystyle \begin{aligned}\hat F(S)=\max\left\{\sum_{j\in S}c_jx_j\mid x\in \mathbf{P}(r,S)\right\}\end{aligned}$$

is a rank function for given \(c\in \Re _+^N\) . Furthermore,

$$\displaystyle \begin{aligned}\hat F(S)=\max\left\{\sum_{j\in S}f_j(x_j)\mid x\in \mathbf{P}(r,S)\right\}\end{aligned}$$

is a rank function if the objective function is separable concave and f j(0) = 0.

Proof

Due to the space limitation, we provide an abstract proof with the main ideas here. Firstly, because the objective function is continuous and the domain is compact, the optimum value F(c) is also continuous. Secondly, a negative coefficient c i would yield x i = 0, so we only need to focus on the nonnegative domain \(c\in \Re _+^N\). Lastly, we only need to prove that for any given \(C\in \Re _+^N\) and two different indices i, j ∈ N, if u i and v j are nonnegative vectors with positive values only in indices i and j, respectively, then F(C + u i) + F(C + v j) ≥ F(C) + F(C + u i + v j).

Now we can fix all but the two dimensions i, j. We then segment the two dimensional space \((c_i,c_j)\in \Re _+^2\) into small grids by the values of the other coordinates C k, k ≠ i, j. We only need to prove the inequality inside each grid, since local submodularity implies global submodularity. Inside each small grid, the line c i = c j cuts the grid into two pieces, and by Theorem 12 there is a uniform optimum solution in each piece, as illustrated in Figure 2. We denote the optimum solution in the left piece (c i ≤ c j) as x L, so that \(F(c)=x_L^T(c-C)+F(C)\) when c i ≤ c j; the optimum solution in the right piece (c i ≥ c j) is denoted as x R, so \(F(c)=x_R^T(c-C)+F(C)\) when c i ≥ c j, inside this small grid.

Fig. 2: Idea of proof for preservation of submodularity

Without loss of generality, we set F(C) = 0, and assume C j ≥ C i. Denote b = C + u i and a = C + v j; then C = a ∧ b and C + u i + v j = a ∨ b. If C i + |u i|≤ C j, then a, b are not separated by the line, and F(c) is the same linear function for a, b, a ∨ b, a ∧ b, so the submodularity directly follows. If C i + |u i| > C j, then a and b are in different pieces, with \(F(b)=x_R^T(b-C)\ge x_L^T(b-C)\) and \(F(a)=x_L^T(a-C)\ge x_R^T(a-C)\). The line c i = c j intersects the segment from C = a ∧ b to b at z = (C j, C j) as in Figure 2. Note that a ∧ b = a ∧ z, so we have

$$\displaystyle \begin{aligned}F(a)+F(z)=x_L^T(a-C)+x_L^T(z-C)=x_L^T(a\wedge z-C+a\vee z-C)=F(a\wedge b)+F(a\vee z).\end{aligned}$$

Because z = (a ∨ z) ∧ b and a ∨ b = (a ∨ z) ∨ b, we have

$$\displaystyle \begin{aligned}F(a\vee z)+F(b)\ge x_R^T(a\vee z-C)+x_R^T(b-C)=x_R^T\left[(a\vee z)\wedge b+(a\vee z)\vee b-2C\right]=F((a\vee z)\wedge b)+F((a\vee z)\vee b)=F(z)+F(a\vee b).\end{aligned}$$

Adding these two inequalities up, we obtain

$$\displaystyle \begin{aligned}F(a)+F(b)\ge F(a\wedge b)+F(a\vee b).\end{aligned}$$

Therefore F is submodular on \(\Re ^N\).

For the set function \(\hat F(S)\), note that \(\hat F(S)=F(c\mid S)\), where (c∣S)i = c i if i ∈ S and 0 otherwise. The submodularity of \(\hat F\) then directly follows from the submodularity of F. For the proof for separable objective functions, please refer to Theorem 3 in [15].

3.3 Minimization of Submodular Set Function

In this subsection, we discuss how to solve submodular set function minimization problems. It relies on the fact that minimizer of the Lovasz extension can be reached at the extreme points of the polytope, which is a counter-intuitive result since this property holds mostly for concave functions instead of convex functions.

Theorem 14 (Minimization of Submodular Set Function)

If \(f:2^N\rightarrow \Re \) is a submodular set function, then the minimizer of its Lovasz extension in domain [0, 1]N can be obtained at vertex points: \(\min _{x\in [0,1]^N} f^L(x)=\min _{S\subseteq N} f(S)\).

Proof

By submodularity and Theorem 10, f L = f −.

$$\displaystyle \begin{aligned}\begin{array}{rrlrl} \min_{x\in [0,1]^N} f^L(x)&=\min_{x\in [0,1]^N}\min &\sum_{S\subseteq N} \lambda_Sf(S)&=\min &\sum_{S\subseteq N} \lambda_Sf(S)\\ &{\mathbf s.t.} &\sum_{S:i\in S} \lambda_S=x_i, \forall i &{\mathbf s.t.} &0\le \sum_{S:i\in S} \lambda_S\le 1, \forall i\\ && \sum_S \lambda_S=1&& \sum_S \lambda_S=1\\ &&\lambda_S\ge 0 \forall S &&\lambda_S\ge 0 \forall S. \end{array}\end{aligned}$$

Notice that 0 ≤∑S:iS λ S ≤ 1 follows from the fact that ∑S λ S = 1 and λ S ≥ 0,

$$\displaystyle \begin{aligned}\min_{x\in [0,1]^N} f^L(x)=\min\{\sum_{S\subseteq N} \lambda_Sf(S)\mid \sum_S \lambda_S=1, \lambda\ge 0 \}=\min_{S\subseteq N} f(S).\end{aligned}$$

By Theorem 14, if we can find an optimum solution for \(\min \{f^L(x)\mid x\in [0,1]^N\}\), it corresponds to an optimum solution of the discrete problem \(\min \{f(S)\mid S\subseteq N\}\). For the convex optimization problem \(\min \{f^L(x)\mid x\in [0,1]^N\}\), we can evaluate the value and a subgradient of f L(x) at x by the linear programming formulation and its dual in the proof of Theorem 10. The literature on exact algorithms for submodular function minimization is quite extensive; interested readers can refer to Section 10.2 of [23], or the research papers [5, 14, 17, 18, 25]. In particular, Schrijver’s algorithm [25] achieves O(n 5) iterations, with O(n 7) function evaluations and O(n 8) arithmetic operations (see Vygen [31]), and the improved Iwata–Fleischer–Fujishige algorithm [17] can solve the problem within \(O(n^7\ln n)\) function evaluations and arithmetic operations.

In practice, the speed of the algorithm is often an important factor, and some precision can be sacrificed for speed. Next, we introduce a fast algorithm based on the subgradient method to optimize f L(x) to high precision. After obtaining a high quality solution x ∈ [0, 1]N for f L(x), by the definition of the Lovasz extension, we can identify at most N + 1 sets S j such that f L(x) is a convex combination of the f(S j). Therefore, minj f(S j) ≤ f L(x). We introduce the classical result on the gradient projection method in the following theorem:

Theorem 15 (Gradient Projection Method)

Suppose \(g:X\rightarrow \Re \) is a convex function defined on a closed convex set X with diameter R. If we apply the gradient projection method: x t+1 = (x t − α t d t)∣X , where d t is a subgradient of g at x t whose length is uniformly upper bounded by G, α t ≥ 0 is the step length, and y∣X is the projection of y onto the convex set X defined as y∣X = argmin{∥z − y∥∣z ∈ X}. Then

$$\displaystyle \begin{aligned}\min_{t\le T}[g(x_t)-g(x^*)]\le \frac{G^2(\sum_{t=1}^{T}\alpha_t^2)+R^2}{2\sum_{t=1}^{T}\alpha_t}.\end{aligned}$$

In this error bound estimation, taking \(\alpha _t=\frac {R}{G}\frac {1}{\sqrt {T}}\) for fixed horizon T would yield upper bound \(\frac {RG}{\sqrt {T}}\) , and taking horizon independent step length \(\alpha _t=\frac {R}{G}\frac {1}{\sqrt {t}}\) would yield upper bound \(\frac {RG(1+\ln T)}{2\sqrt {T}}\).

Proof

Suppose x ∈ X is the optimum solution, then

$$\displaystyle \begin{aligned}\begin{array}{rll} &\|x_{t+1}-x^*\|{}^2&\\ \le &\|x_t-\alpha_t d_t-x^*\|{}^2&\longleftarrow (x_t-\alpha_t d_t-x_{t+1})^T(y-x_{t+1})\le 0 \forall y\in X\\ \le&\|x_t-x^*\|{}^2-2\alpha_td_t^T(x_t-x^*)+\alpha_t^2G^2& \longleftarrow \|d_t\|\le G\\ \le&\|x_t-x^*\|{}^2-2\alpha_t[g(x_t)-g(x^*)]+\alpha_t^2G^2&\longleftarrow \mbox{ by convexity }g(x^*)\ge g(x_t)+d_t^T(x^*-x_t).\\ \end{array}\end{aligned}$$

Therefore,

$$\displaystyle \begin{aligned}2\alpha_t[g(x_t)-g(x^*)]\le \alpha_t^2G^2+\|x_t-x^*\|{}^2-\|x_{t+1}-x^*\|{}^2.\end{aligned}$$

Sum these inequalities up, we have

$$\displaystyle \begin{aligned}\left(2\sum_{t=1}^{T}\alpha_t\right)\min_{t\le T}[g(x_t)-g(x^*)]\le \sum_{t=1}^{T}2\alpha_t[g(x_t)-g(x^*)]\le G^2\left(\sum_{t=1}^{T}\alpha_t^2\right)+R^2.\end{aligned}$$
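A hedged sketch of the projected subgradient approach described above for unconstrained submodular set function minimization: a subgradient of f L at x is obtained from the greedy/dual construction in the proof of Theorem 10, the iterate is projected back onto the box [0, 1]N, and the best level set seen is returned. The instance at the bottom is an illustrative cut-plus-modular function.

```python
import numpy as np

def lovasz_subgradient(f, x):
    """Value and a subgradient of the Lovasz extension f^L at x, via the greedy/dual
    construction from the proof of Theorem 10: sort x increasingly and set
    y_i = f(S_j) - f(S_{j+1}) along the suffix chain S_j = {pi_j, ..., pi_N}."""
    n = len(x)
    order = np.argsort(x)              # increasing order of x_i
    t = f(frozenset())
    y = np.zeros(n)
    suffix, prev = [], t
    for idx in order[::-1]:            # build the chain from the largest x_i down
        suffix.append(int(idx))
        cur = f(frozenset(suffix))
        y[idx] = cur - prev
        prev = cur
    return t + float(y @ x), y

def minimize_submodular(f, n, iters=300, step=0.5, seed=0):
    """Projected subgradient descent on f^L over [0,1]^n, then return the best level set."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    best_S, best_val = frozenset(), f(frozenset())
    for k in range(1, iters + 1):
        _, g = lovasz_subgradient(f, x)
        x = np.clip(x - (step / np.sqrt(k)) * g, 0.0, 1.0)   # projection onto the box
        for theta in np.unique(x):                           # candidate level sets {i : x_i >= theta}
            S = frozenset(int(i) for i in np.where(x >= theta)[0])
            if f(S) < best_val:
                best_S, best_val = S, f(S)
    return best_S, best_val

# Illustrative submodular function: a cut function plus a modular term.
w = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.5, (0, 3): 0.5}
f = lambda S: sum(v for (i, j), v in w.items() if (i in S) != (j in S)) - 2.0 * len(S) + 3.0
print(minimize_submodular(f, 4))      # the minimizer here is the full set {0, 1, 2, 3}
```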

More general cases have been studied by Hochbaum et al. [16].

Problem 3

$$\displaystyle \begin{aligned}\begin{array}{rl} \min &f(S)\\ {\mathbf{s.t.}} &a_{ij}x_i + b_{ij}x_j \ge c_{ij} \quad \text{for all }(i, j)\in A \\ & S\subseteq N, \end{array}\end{aligned}$$

where x = 1 S is the characteristic vector of S, A is a set of pairs (allowing multiple copies of the same pair), and \(f:2^N\rightarrow \Re \) is a submodular set function.

The following result has been established by Hochbaum et al. [16]:

Theorem 16

If f is submodular and constraints are monotone (feasible set is a lattice), then it’s (strongly) polynomial-time solvable. If f is nonnegative submodular, and the constraints satisfy round up property or f is monotone, then it’s 2-approximable in polynomial time.

Proof

Firstly, we preprocess all the constraints. Note that all variables are {0, 1} variables. We first remove all redundant constraints. If a constraint a ij x i + b ij x j ≥ c ij implies that x i or x j equals a certain value, then we can replace this constraint by two single-dimensional constraints, which are either redundant or can be removed by fixing the variable. Repeatedly simplifying all such constraints, the remaining constraints with two variables would all be of the form x i ≥ x j, x i + x j ≤ 1, or x i + x j ≥ 1. Furthermore, constraints of the type a ij x i − b ij x j ≥ c ij with a ij, b ij ≥ 0 would be reduced to simple single-dimensional constraints or to constraints of the type x i ≥ x j (or x i ≤ x j), and constraints of the type a ij x i − b ij x j ≥ c ij with a ij ≥ 0, b ij ≤ 0 would be reduced to simple single-dimensional constraints or to constraints of the type x i + x j ≤ 1 or x i + x j ≥ 1. If there is a group of cyclic constraints \(x_{i_1}\ge x_{i_2}\ge \cdots \ge x_{i_n}\ge x_{i_1}\), we further simplify it by replacing all \(x_{i_j}\) with a single variable.

When all the constraints are monotone, the constraints after simplification are all of the form x i ≥ x j for directed pairs (i, j) ∈ E. The problem now reduces to submodular minimization over a ring family, which is solvable in (strongly) polynomial time in the size of the underlying graph. A simple explanation is that we can reformulate the problem as the minimization of another submodular set function over the set 2E∪B defined as follows. For a constraint x i ≥ x j we define the variable y ij = 1 if x i = 1 and x j = 0, and y ij = 0 otherwise. Define the base set B of indices as those indices that never appear on the left side of the ≥ constraints, and define y i = x i for all i ∈ B. Now, each x i can be expressed as \(\vee _{k\in S_i}y_k\) for a certain set S i ⊆ E ∪ B (basically, in the ordered graph, S i is the set of children edges of i, together with the leaves growing from node i). It can easily be proved that for a monotone submodular set function f and sets T ⊆ E ∪ B, defining S(T) = {i∣S i ∩ T ≠ ∅}, the function G(T) = f(S(T)) is also submodular. And the constraints on the original variables are embedded in the transformation S(T), so the constraint for the function G becomes T ⊆ E ∪ B.

For general cases, suppose the problem after simplification is of form:

$$\displaystyle \begin{aligned}\begin{array}{rl} \min &f(x)\\ {\mathbf{s.t.}} &x_i\ge x_j, \forall (i,j)\in E, \\ &x_i+x_j\ge 1, \forall (i,j)\in U, \\ &x_i+x_j\le 1, \forall (i,j)\in V. \end{array}\end{aligned}$$

We introduce two copies of original variables x + = x ∈{0, 1}n, x  = −x ∈{0, −1}n. Then the original problem can be reformulated by

$$\displaystyle \begin{aligned}\begin{array}{rl} \min &\frac{f(x^+)+f(-x^-)}{2}\\ {\mathbf{s.t.}} &x_i^+\ge x_j^+, x_i^-\le x_j^-\forall (i,j)\in E, \\ &x_i^+-x_j^-\ge 1, -x_i^-+x_j^+\ge 1, \forall (i,j)\in U, \\ &x_i^+-x_j^-\le 1, -x_i^-+x_j^+\le 1, \forall (i,j)\in V,\\ &x_i^++x_i^-=0, \forall i\\ &x_i^+\in \{0,1\}, x_i^-\in \{0,-1\}, \forall i. \end{array}\end{aligned}$$

Dropping the only nonmonotone constraints \(x_i^++x_i^-=0\), we obtain a relaxed problem with only monotone constraints, which can be solved exactly. Suppose the optimum solution is (x +, x −) with objective value \(V^*\le {\mathcal OPT}\). Notice that \(y=\lceil \frac {x^+-x^-}{2}\rceil \) and \(z=\lfloor \frac {x^+-x^-}{2}\rfloor \) are both feasible for the original problem. Moreover, y = x + ∨ (−x −) and z = x + ∧ (−x −). By submodularity, f(y) + f(z) ≤ f(x +) + f(−x −). Because f is nonnegative, \(f(y)\le f(x^+)+f(-x^-)=2V^*\le 2{\mathcal OPT}\).

3.4 Maximization of Submodular Set Function

There are many scenarios where one needs to maximize a submodular set function. For example, consider a social network where people’s decisions are influenced by their friends. When a company needs to place a number of individual advertisements, e.g., via phone calls, a crucial problem is which group of people it should reach to maximize the total effect within a given budget constraint. A simplified model is the so-called Max-k-Cover problem:

Problem 4 (Max-k-Cover)

Given a set of sets {S j ⊆ N∣j ∈ A}, find k sets which cover the largest number of elements.

We can also assume that each element (customer) covered has a different value:

Problem 5 (The Maximum Coverage Problem)

Given sets S 1, S 2, ⋯ , S m ⊆ N. Each element i ∈ N has a value v i ≥ 0, and for each set S ⊆ N the value function is defined as V (S) =∑i∈S v i. We need to select K sets {S j∣j ∈ A}, with |A| = K, to maximize the total value V (⋃j∈A S j).

One of the most important characteristics of these problems is the submodularity of the objective function with respect to the selected collection of sets. The proof is straightforward and is omitted here.

Proposition 1

We define U(A) = V (⋃jA S j), then U is a submodular set function:

$$\displaystyle \begin{aligned}V(\cup_{j\in A} S_j)+V(\cup_{j\in B} S_j)\ge V(\cup_{j\in A\cap B} S_j)+V(\cup_{j\in A\cup B} S_j) \mathit{\mbox{ for any }}A,B\subseteq \{1,2,\cdots, m\}.\end{aligned}$$

A related problem arising from applications is assortment optimization, where one needs to place advertisements for goods on the front page of a website for maximum sales effect.

Problem 6 (Assortment Optimization)

There are K advertisement slots on a webpage, which we need to fill from a set N of goods from a certain category. The goods are substitutable with each other, that is, increasing sales of one product would hurt (or have no effect on) sales of the other products, so the more goods are placed on the webpage, the smaller the contribution from the advertisement of the next goods added. In some classical literature, e.g., [6], the total sales revenue V (S) from displaying the set S of goods on the webpage is assumed to be increasing and submodular. And we aim to solve the cardinality constrained maximization problem:

$$\displaystyle \begin{aligned}\max\{V(S): |S|\le K, S\subseteq N\}.\end{aligned}$$

Because the Max-Cut problem is well-known to be NP-hard, and the cut weight V (S) =∑i∈S,j∉S w ij is submodular in S, submodular set function maximization with a cardinality constraint is also NP-hard. The hardness-of-approximation result has been established by Feige [11]:

Theorem 17 (Max-Hardness)

Consider cardinality constrained submodular maximization problem \(\max \{f(S), |S|\le K, S\subseteq N\}\) for rank function (submodular, normalized, and increasing) \(f:2^N\rightarrow \Re \) . Unless P = NP, there is no polynomial-time algorithm which achieves approximation ratio strictly better than \(1-\frac {1}{e}\) in general (for general setting of K).

There is a simple greedy algorithm which can achieve the best possible approximation ratio \(1-\frac {1}{e}\) for cardinality constrained maximization problem of rank functions.

Algorithm 2: Greedy algorithm: rank function maximization
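The pseudocode of Algorithm 2 is not reproduced here; the following is a hedged sketch of the standard greedy step it refers to (repeatedly add the element with the largest marginal gain until K elements are chosen), on an illustrative Max-k-Cover instance:

```python
def greedy_max(f, ground, K):
    """Greedy maximization of a rank (normalized, monotone, submodular) set function
    under a cardinality constraint |S| <= K: add the element with the largest marginal gain."""
    S = set()
    for _ in range(K):
        gains = {i: f(S | {i}) - f(S) for i in ground if i not in S}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        S.add(best)
    return S, f(S)

# Illustrative Max-k-Cover instance: f(A) = number of elements covered by the chosen sets.
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {1, 6}}
f = lambda A: len(set().union(*(sets[i] for i in A))) if A else 0
print(greedy_max(f, sets.keys(), K=2))   # picks sets 0 and 2, covering 6 elements
```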

Theorem 18 (Greedy for Rank Function)

The greedy algorithm above achieves approximation ratio \(1-(1-\frac {1}{K})^K\ge 1-\frac {1}{e}\) for \(\max \{f(S), |S|\le K, S\subseteq N\}\) , if the function f is a rank function.

Proof

Define the optimum solution as S ∗, and the optimum value \({\mathcal OPT}=f(S^*)\). For any set S ⊆ N, denote the elements of S ∗∖ S as {j 1, j 2, ⋯ , j m}, and S k = S ∪{j 1, j 2, ⋯ , j k}. Then

$$\displaystyle \begin{aligned}\sum_{i\in S^*} [f(S\cup\{i\})-f(S)]=\sum_{k=1}^m [f(S\cup\{j_k\})-f(S)]\ge \sum_{k=1}^m [f(S^k)-f(S^{k-1})]= f(S\cup S^*)-f(S).\end{aligned}$$

Due to monotonicity of f, we have \(f(S\cup S^*)\ge f(S^*)={\mathcal OPT}\). Consequently,

$$\displaystyle \begin{aligned}\max\{f(S+\{i\})-f(S): i\in S^*\}\ge \frac{1}{K}({\mathcal OPT}-f(S)).\end{aligned}$$

Therefore, for any t and set S t, the greedy algorithm outputs set S t+1:

$$\displaystyle \begin{aligned}{\mathcal OPT}-f(S_{t+1})\le {\mathcal OPT}-f(S_{t})-\left[f(S_{t+1})-f(S_{t})\right]\le {\mathcal OPT}-f(S_{t}) - \frac{1}{K}\left[{\mathcal OPT}-f(S_{t})\right]\le \left(1-\frac{1}{K}\right)\left[{\mathcal OPT}-f(S_{t})\right].\end{aligned}$$

This implies that

$$\displaystyle \begin{aligned}{\mathcal OPT}-f(S_{K})\le \left(1-\frac{1}{K}\right)^K\left[{\mathcal OPT}-f(S_{0})\right]\le \left(1-\frac{1}{K}\right)^K{\mathcal OPT},\end{aligned}$$

and

$$\displaystyle \begin{aligned}f(S_{K})\ge \left[1-\left(1-\frac{1}{K}\right)^K\right]{\mathcal OPT}\ge \left(1-\frac{1}{e}\right){\mathcal OPT}.\end{aligned}$$

The \(1-\frac {1}{e}\)-approximation is tight for rank function due to Theorems 17 and 18. In the remainder of this subsection, we discuss a more general case by relaxing the monotonicity assumption of objective function.

Problem 7 (Nonmonotone Submodular Function Maximization)

Given a nonnegative submodular set function \(f : 2^N\rightarrow \Re_+\), suppose we can evaluate f(S) for any S ⊆ N. How should we solve the problem:

$$\displaystyle \begin{aligned}\max\{f(S): S\subseteq N\}.\end{aligned}$$

The hardness to approximate result is established in [12]:

Theorem 19 (Hardness for Nonmonotone Submodular Function Maximization)

Suppose \(f : 2^N\rightarrow \Re_+\) is a submodular set function whose value we can evaluate on each S ⊆ N. Then for any 𝜖 > 0, an algorithm which approximates the general maximization problem within approximation ratio \(\frac {1}{2}+\epsilon \) needs to call the value oracle exponentially many times. This is true even if f is known to be symmetric, i.e., f(S) = f(N ∖ S).

Buchbinder et al. [3] recently established the tight approximation algorithm, based on the idea of forward–backward greedy search:

Algorithm 3: 1∕2-Randomized approximation algorithm
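The pseudocode of Algorithm 3 is not reproduced here; the following is a hedged sketch of the randomized double greedy of Buchbinder et al. [3], with a t and b t taken as the positive parts of the two marginal gains, matching the notation used in the analysis below:

```python
import random

def double_greedy(f, ground, seed=0):
    """Sketch of the randomized double greedy for maximizing a nonnegative submodular
    set function without constraints. A_t grows from the empty set, B_t shrinks from the
    ground set; element u_t is added to A with probability a_t/(a_t+b_t) and otherwise
    removed from B, where a_t and b_t are the positive parts of the marginal gains."""
    rng = random.Random(seed)
    A, B = set(), set(ground)
    for u in ground:
        a = max(f(A | {u}) - f(A), 0.0)          # gain from adding u to A
        b = max(f(B - {u}) - f(B), 0.0)          # gain from removing u from B
        p = 1.0 if a + b == 0 else a / (a + b)   # add with probability p (p = 1 when both gains are 0)
        if rng.random() < p:
            A.add(u)
        else:
            B.remove(u)
    assert A == B
    return A, f(A)

# Illustrative nonnegative, nonmonotone submodular function: a cut function.
w = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 1.5, (2, 3): 1.0}
f = lambda S: sum(v for (i, j), v in w.items() if (i in S) != (j in S))
print(double_greedy(f, range(4)))
```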

This algorithm maintains an increasing random sequence of sets {A t} and a decreasing sequence of sets {B t}, by gradually deciding whether an element should be added to A t or removed from B t, based on its potential for improving the function value. It stops at A N = B N. Next, we define a series of sets S t to assist our analysis of the algorithm. Suppose the optimum solution of \(\max \{f(S): S\subseteq N\}\) is S ∗, with the optimum value noted as \({\mathcal OPT}=f(S^*)\). We define the random set S t = (S ∗∪ A t) ∩ B t and the value V t = E[f(S t)]. Then we have, for all t, A t ⊆ S t ⊆ B t, S 0 = S ∗, \(f(S_0)={\mathcal OPT}\), and S N = A N = B N.

To prove the approximation result, we quantify the potential loss of function value from S t−1 to S t by the following technical lemma from [30]:

Lemma 2

For any t, the algorithm outputs

$$\displaystyle \begin{aligned}E[f(S_{t-1})-f(S_t)]\le \frac{1}{2}E\left[f(A_t)-f(A_{t-1})+f(B_t)-f(B_{t-1})\right].\end{aligned}$$

Proof

By definition we have B t ∖ A t = {u t+1, u t+2, ⋯ , u N}, and u t ∈ B t−1∖A t−1. If a t = b t = 0, then by definition f(A t−1 ∪{u t}) − f(A t−1) ≤ 0 and f(B t−1 ∖{u t}) − f(B t−1) ≤ 0. Since A t−1 ⊆ B t−1 ∖{u t}, it follows from submodularity that

$$\displaystyle \begin{aligned}0\ge f(A_{t-1}\cup \{u_t\})-f(A_{t-1})\ge f(B_{t-1})-f(B_{t-1}\setminus \{u_t\})\ge 0.\end{aligned}$$

Notice the algorithm outputs A t = A t−1 ∪{u t}, B t = B t−1, so f(A t) − f(A t−1) = f(B t−1) − f(B t) = 0. If u t ∈ S t−1, we have S t = S t−1 and f(S t−1) − f(S t) = 0. If u tS t−1, then the algorithm outputs S t = S t−1 ∪{u t}, consequently A t−1 ⊆ S t−1 ⊆ B t−1 ∖{u t}. By submodularity we have

$$\displaystyle \begin{aligned}0\ge f(A_{t-1}\cup \{u_t\})-f(A_{t-1})\ge f(S_{t-1}\cup \{u_t\})-f(S_{t-1})\ge f(B_{t-1})-f(B_{t-1}\setminus \{u_t\})\ge 0,\end{aligned}$$

which implies that f(S t−1) − f(S t) = f(A t) − f(A t−1) = f(B t−1) − f(B t) = 0.

Now we consider the case a t + b t > 0. If u t ∈ S , then u t ∈ S t−1, S t = S t−1 with probability \(p_t=\frac {a_t}{a_t+b_t}\), and S t = S t−1∖{u t} with probability 1 − p t. Note that A t−1 ⊆ S t−1∖{u t}, by submodularity f(S t−1) − f(S t−1∖{u t}) ≤ f(A t−1 ∪{u t}) − f(A t−1) = a t. Therefore

$$\displaystyle \begin{aligned}E[f(S_{t-1})-f(S_t)]\le (1-p_t)a_t=\frac{a_tb_t}{a_t+b_t}.\end{aligned}$$

If u tS , then u tS t−1, S t = S t−1 ∪{u t} with probability \(p_t=\frac {a_t}{a_t+b_t}\), and S t = S t−1 with probability 1 − p t. Because S t−1 ⊆ B t−1∖{u t}, it follows from submodularity that f(S t−1) − f(S t−1 ∪{u t}) ≤ f(B t−1∖{u t}) − f(B t−1) = b t. Therefore we also have

$$\displaystyle \begin{aligned}E[f(S_{t-1})-f(S_t)]\le p_tb_t=\frac{a_tb_t}{a_t+b_t}.\end{aligned}$$

Note that p t = 0 if a t = 0, b t > 0, p t = 1 if a t > 0, b t = 0, so whenever a t + b t > 0,

$$\displaystyle \begin{aligned}\begin{array}{rl}&E\left[f(A_t)-f(A_{t-1})+f(B_t)-f(B_{t-1})\right]\\=&p_t\left[f(A_{t-1}\cup\{u_t\})-f(A_{t-1})\right]+(1-p_t)\left[f(B_{t-1}\setminus \{u_t\})-f(B_{t-1})\right]\\ =&p_t\left[f(A_{t-1}\cup\{u_t\})-f(A_{t-1})\right]_++(1-p_t)\left[f(B_{t-1}\setminus \{u_t\})-f(B_{t-1})\right]_+\\ =&p_ta_t+(1-p_t)b_t.\end{array}\end{aligned}$$

Therefore,

$$\displaystyle \begin{aligned} \frac{a_tb_t}{a_t+b_t}&\le \frac{a_t^2+b_t^2}{2(a_t+b_t)}=\frac{p_ta_t+(1-p_t)b_t}{2}\\ &=\frac{1}{2}E\left[f(A_t)-f(A_{t-1})+f(B_t)-f(B_{t-1})\right]. \end{aligned} $$

Theorem 20 (\(\frac {1}{2}\)-Approximation)

If the function f : 2N → ℜ + is a nonnegative submodular set function, then Algorithm 3 achieves \(\frac {1}{2}\) -approximation ratio, i.e.,

$$\displaystyle \begin{aligned}E[f(A_N)]\ge \frac{1}{2}{\mathcal OPT}.\end{aligned}$$

Proof

Adding the inequalities in Lemma 2 for all t = 1, 2, ⋯ , N, we obtain

$$\displaystyle \begin{aligned} E[f(S_0)]-E[f(S_N)]&=\sum_{t=1}^NE[f(S_{t-1})-f(S_t)]\\ &\le \frac{1}{2}E\left[f(A_N)-f(A_0)+f(B_N)-f(B_0)\right]. \end{aligned} $$

It then follows from S N = A N = B N, and f(A 0), f(B 0) ≥ 0 that

$$\displaystyle \begin{aligned}E[f(S_0)]-E[f(S_N)]\le \frac{1}{2}E\left[f(A_N)+f(B_N)\right]=E[f(S_N)].\end{aligned}$$

Because S 0 = S ∗, \(E[f(A_N)]=E[f(S_N)]\ge \frac {1}{2}f(S_0)=\frac {1}{2}{\mathcal OPT}\).

3.5 Multi-Linear Relaxation and Submodular Function Maximization

In this section, we introduce another line of approach to deal with submodular function maximization problems, which utilize the so-called multi-linear relaxation.

Definition 8 (Multi-Linear Relaxation)

Given a set function \(f:2^N\rightarrow \Re \), we define its multi-linear relaxation by rounding a continuous point x ∈ [0, 1]N to {0, 1}N: F(x) = E[f(ξ(x))], where \(\xi (x)\in \Re ^N\) takes value ξ(x)i = 1 with probability x i, and ξ(x)i = 0 with probability 1 − x i independently.
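A minimal sketch estimating the multi-linear relaxation F(x) by Monte Carlo sampling of the independent rounding ξ(x) (the coverage function below is an illustrative example):

```python
import random

def multilinear_estimate(f, x, samples=20000, seed=0):
    """Estimate F(x) = E[f(xi(x))], where xi(x)_i = 1 independently with probability x_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        S = frozenset(i for i, xi in enumerate(x) if rng.random() < xi)
        total += f(S)
    return total / samples

# Illustrative coverage function and a fractional point.
sets = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}}
f = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
print(multilinear_estimate(f, [0.5, 0.5, 0.5]))   # close to the exact expectation 2.5
```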

Because the multi-linear relaxation is defined via expectation, it is straightforward to see:

Theorem 21

$$\displaystyle \begin{aligned}\max\{F(x)\mid x\in [0,1]^N\}=\max\{f(x)\mid x\in \{0,1\}^N\}.\end{aligned}$$

In the remainder of the section, we introduce variations of submodular maximization problem, and how to utilize the multi-linear relaxation for solving these problems. We start with the general matroid constrained problem:

Definition 9 (Matroid)

A matroid M = (X, I) consists of a ground set X and a family I ⊆ 2X of subsets of X, called independent sets, with ∅ ∈I, satisfying the following:

  1. For any A ⊆ B and B ∈I, it must hold that A ∈I.

  2. For any A, B ∈I with |A| < |B|, there exists x ∈ B ∖ A such that A ∪{x}∈I.

Matroids are discrete structures whose convex hulls are in fact polymatroids. In the following, we first illustrate how a matroid induces a rank function, and how this rank function defines a polymatroid which is the convex hull of the indicator vectors of the independent sets.

Theorem 22 (Matroid Rank)

Define \(r(S)=\max \{|A|\mid A\subseteq S, A\in \mathbf {I}\}\). If M is a matroid, then r(S) is a rank function.

Proof

It follows from the definition that r(S) is monotonically increasing and r(∅) = 0, so we only need to verify submodularity. For any i, j ∉ S, if r(S ∪{i, j}) = r(S ∪{i}) or r(S ∪{i, j}) = r(S ∪{j}), it follows from monotonicity that r(S) + r(S ∪{i, j}) ≤ r(S ∪{i}) + r(S ∪{j}). Otherwise, r(S ∪{i, j}) > r(S ∪{i}) and r(S ∪{i, j}) > r(S ∪{j}). Define A as a maximum independent subset of S ∪{i, j}, so that A ∈I and r(S ∪{i, j}) = |A|. Since A ∖{j}⊆ S ∪{i}, by downward closure of independent sets A ∖{j}∈I, so r(S ∪{i}) ≥|A|− 1 = r(S ∪{i, j}) − 1. Because r(S ∪{i}) < r(S ∪{i, j}), we have r(S ∪{i}) = r(S ∪{i, j}) − 1. Similarly, r(S ∪{j}) = r(S ∪{i, j}) − 1.

Define B as a maximum independent subset of S, so that B ∈I and |B| = r(S). Because |B| = r(S) ≤ r(S ∪{i}) < r(S ∪{i, j}) = |A|, it follows from the exchange property of independent sets that there exists x ∈ A ∖ B with B ∪{x}∈I. By the definition of B, x ∉ S, because otherwise B ∪{x}⊆ S would be a larger independent set in S. Therefore, x ∈ (S ∪{i, j}) ∖ S = {i, j}, so x = i or x = j. If x = i, then r(S ∪{i}) ≥|B ∪{i}| = |B| + 1 = r(S) + 1. Similarly, if x = j, we have r(S ∪{j}) ≥ r(S) + 1. In either case,

$$\displaystyle \begin{aligned}r(S\cup\{i\})+r(S\cup\{j\})\ge r(S)+1+r(S\cup\{i,j\})-1=r(S)+r(S\cup\{i,j\}).\end{aligned}$$

In linear algebra, given a finite set of vectors in \(\Re ^N\) as the ground set, the linearly independent subsets form the independent sets of a matroid. The rank function induced by this independence system is exactly the dimension of the subspace spanned by a set of vectors.
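As a small illustration of this remark, the rank function of such a linear matroid can be evaluated by a numerical rank computation, and its submodularity checked directly on small instances; the matrix below is our own toy example.

```python
import numpy as np

# Ground set: the columns of V; a subset is independent iff its columns
# are linearly independent.
V = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, 0.0]])

def rank_fn(S):
    """Matroid rank of a subset S of column indices: dimension of their span."""
    if not S:
        return 0
    return int(np.linalg.matrix_rank(V[:, sorted(S)]))

# Spot-check submodularity: r(S+i) + r(S+j) >= r(S) + r(S+i+j).
S = {0}
print(rank_fn(S | {1}) + rank_fn(S | {2}) >= rank_fn(S) + rank_fn(S | {1, 2}))
```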

Theorem 23 (Matroid to Polymatroid)

For a matroid M with induced rank function r, define the polytope P(M) = Conv{1 S∣S ∈I}, then

$$\displaystyle \begin{aligned} \mathbf{P}(\mathbf{M})=\mathbf{P}(r,X)=\left\{x\in \Re_+^X\mid \sum_{j\in S}x_j\le r(S)\ \forall S\subseteq X\right\}\end{aligned}$$

and it is a polymatroid.

Proof

For any independent set A ∈I and S ⊆ X, it follows from A ∩ S ∈I that r(S) ≥|A ∩ S| =∑ i∈S(1 A) i. Therefore every indicator vector 1 A with A ∈I lies in P(r, X), and hence P(M) = Conv{1 S∣S ∈I}⊆P(r, X).

Conversely, in the proof of Theorem 22 we showed that r(S ∪{i}) − r(S) = 0 or 1. By the optimal solution structure in Theorem 12, all the vertices of the polytope P(r, X) are 0∕1 vectors. Suppose one vertex is v = 1 A, which corresponds to the set A. If A is not an independent set, then by definition r(A) ≤|A|− 1. However, ∑ i∈A v i =∑ i∈A1 = |A| > r(A), which contradicts the constraint in the definition of P(r, X). Therefore, every vertex of the polytope is the indicator vector of an independent set, so P(r, X) ⊆P(M).

For the matroid constrained rank function maximization problem:

$$\displaystyle \begin{aligned}\max\{f(S): S\in \mathbf{I}\},\end{aligned}$$

where f : 2N → ℜ + is a rank function and M = (N, I) is a matroid, we introduce the algorithm in [30]. Firstly, the smooth-greedy algorithm is used to obtain a fractional solution x such that \(F(x)\ge \left (1-\frac {1}{e}-o(1)\right ){\mathcal OPT}\); then pipage rounding is applied to gradually round each coordinate to 0 or 1. Since the multi-linear extension is defined as an expectation, the rounding can be carried out so that the resulting integer solution has value at least as large (in expectation) as the fractional one. To start with, we consider the smooth process:

Algorithm 4: Smooth differential equation
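The listing of Algorithm 4 is only named here; in the notation of the proof of Theorem 24 below, one consistent way to state the smooth process is

$$\displaystyle \begin{aligned} y(0)=0,\qquad \frac{d}{dt}y_j(t)=\begin{cases} 1, & j\in I(y(t)),\\ 0, & \mbox{otherwise}, \end{cases} \qquad I(y)=\mathop{\mbox{argmax}}_{S\in \mathbf{I}}\sum_{j\in S}\frac{\partial}{\partial y_j}F(y), \end{aligned}$$

run for t ∈ [0, 1], with y(1) returned as the fractional solution.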

Step 2 of the algorithm, solving I(y), is doable because it is equivalent to the polymatroid optimization problem in Section 3.2, which can be solved by a simple greedy procedure.

Theorem 24 (Smooth Process)

For the problem \(\max \{f(S): S\in \mathbf {I}\}\) with rank function f and matroid M = (N, I), the smooth process outputs

$$\displaystyle \begin{aligned}F(y(1))\ge \left(1-\frac{1}{e}\right){\mathcal OPT}.\end{aligned}$$

Proof

Firstly, because \(0\le \frac {\partial }{\partial t}y_j(t)\le 1\) for any index j ∈ N, y(t) remains feasible for all t ∈ [0, 1]. Define the optimum solution as S^* = argmax{f(S) : S ∈I}, so \(f(S^*)={\mathcal OPT}\). For any y ∈ [0, 1]N, define the random set R y by independently selecting each index i ∈ N with probability y i, and not selecting i with probability 1 − y i.

For any two sets S, T ⊆ N, we define f S(T) = f(S ∪ T) − f(S), and f S(j) = f S({j}). By submodularity and monotonicity, for any set S ⊆ N we have:

$$\displaystyle \begin{aligned}OPT=f(S^*)\le f(S\cup S^*)\le f(S)+\sum_{j\in S^*}f_S(j).\end{aligned}$$

Define \(f_{R_y}(j)=E_{S\sim R_y}f_S(j)\) and notice that \(F(y)=E_{S\sim R_y}f(S)\), then

$$\displaystyle \begin{aligned} OPT&\le E_{S\sim R_y}\left[f(S)+\sum_{j\in S^*}f_{S}(j)\right]\\ &=F(y)+\sum_{j\in S^*}f_{R_y}(j)\le F(y)+\max_{S\in \mathbf{I}}\sum_{j\in S}f_{R_y}(j), \end{aligned} $$

where the last inequality follows from the fact that S^* ∈I. Note that

$$\displaystyle \begin{aligned}F(y)=\sum_{S\subseteq N}f(S)\prod_{i\in S}y_i\prod_{i\notin S}(1-y_i),\end{aligned}$$

which implies that

$$\displaystyle \begin{aligned} \frac{\partial}{\partial y_j}F(y)&=F(y\mid y_j=1)-F(y\mid y_j=0)\\ &= E\left[f(R_y\cup\{j\})-f(R_y\setminus\{j\})\right]\ge f_{R_y}(j). \end{aligned} $$

Therefore, the differential equation process satisfies

$$\displaystyle \begin{aligned} \frac{d}{dt}F(y(t))&=\sum_{j\in I(y)}\frac{\partial}{\partial y_j}F(y(t))\\ &=\max_{S\in \mathbf{I}}\sum_{j\in S}\frac{\partial}{\partial y_j}F(y(t))\ge \max_{S\in \mathbf{I}}\sum_{j\in S}f_{R_y}(j)\ge OPT-F(y(t)). \end{aligned} $$

Combining this with the fact that \(F(y(0))\ge 0=\left (1-e^{-0}\right ){\mathcal OPT}\) and solving the resulting differential inequality, we have for any t ∈ [0, 1],

$$\displaystyle \begin{aligned}F(y(t))\ge \left(1-e^{-t}\right){\mathcal OPT}.\end{aligned}$$

Since the smooth solution cannot be obtained exactly, one can apply the following algorithm, which achieves a \(1-\frac {1}{e}-o(1)\) approximation ratio:

Algorithm 5: Smooth greedy
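Since the listing of Algorithm 5 is only named here, the following Python fragment sketches one plausible discretization of the smooth process: the gradient of F is estimated by sampling R y, a maximum-weight independent set for the estimated marginals is built by the matroid greedy rule of Section 3.2, and y moves a small step in that direction. Step size, sample count, and the independence-oracle interface are illustrative choices, not prescribed by [30].

```python
import random

def smooth_greedy(f, n, independence_oracle, steps=100, samples=50, rng=random.Random(0)):
    """Sketch of a discretized smooth-greedy (continuous greedy) process.

    f: set function on frozensets of {0, ..., n-1}
    independence_oracle(S): True iff S is independent in the matroid
    Returns a fractional point y in the matroid polytope.
    """
    y = [0.0] * n
    delta = 1.0 / steps
    for _ in range(steps):
        # Estimate the marginal value E[f(R_y + j) - f(R_y)] for each j by sampling R_y.
        marginals = [0.0] * n
        for _ in range(samples):
            R = frozenset(i for i in range(n) if rng.random() < y[i])
            base = f(R)
            for j in range(n):
                marginals[j] += (f(R | {j}) - base) / samples
        # Greedily build a maximum-weight independent set w.r.t. the estimated marginals.
        S = set()
        for j in sorted(range(n), key=lambda j: -marginals[j]):
            if marginals[j] > 0 and independence_oracle(S | {j}):
                S.add(j)
        # Move a small step toward the indicator vector of the chosen independent set.
        for j in S:
            y[j] = min(1.0, y[j] + delta)
    return y
```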

Many well-known combinatorial problems can be reformulated into matroid constrained rank function maximization:

Problem 8 (The Submodular Social Welfare Problem)

There is a set P of m players and a set N of resources. Player i receives utility w i(S i) from a set S i of resources, where each w i is assumed to be a rank function. How should we distribute resources among the players to maximize the social utility \(\sum _{i=1}^m w_i(S_i)\)? Without loss of generality, we assume that each resource is a single indivisible unit; this assumption can be relaxed to multiple units without altering the following reduction or its analysis.

Make m copies (i, j) of each item j, so that an allocation {S 1, S 2, ⋯ , S m} uniquely corresponds to the set \(S=\bigcup _{i=1}^m \{(i,j)\mid j\in S_i\}\). We obtain a matroid M defined by the ground set X = P × N and the independent sets

$$\displaystyle \begin{aligned}{\mathbf I}=\{S\subseteq {\mathbf X}\mid |S\cap (P\times \{j\})|\le 1 \mbox{ for all } j\in N\}.\end{aligned}$$

The objective \(f(S)=\sum _{i=1}^m w_i(\{j\mid (i,j)\in S\})\) is again a rank function, so the problem is reduced to classical matroid-constrained rank function maximization.

When each player also faces the bin packing problem, the problem becomes the general assignment problem.

Problem 9 (General Assignment Problem)

There is a set P of players and a set N of items. Each player i has only 1 unit of capacity, which cannot be exceeded. Receiving item j yields utility v ij but also consumes capacity c ij of player i.

Note that each player i has a feasible family \({\mathcal F}_i\subseteq 2^N\) of possible choices. We can construct the matroid M = (X, I) with ground set \({\mathbf X}=\{(i,S_i)\mid S_i\in {\mathcal F}_i, i\in P\}\) and

$$\displaystyle \begin{aligned}{\mathbf I}=\{{\mathbf S}\subseteq {\mathbf X}\mid \mbox{ At most one set }S_i\mbox{ assigned to each player } i\}.\end{aligned}$$

To avoid assigning one item to multiple players, the objective function is changed to

$$\displaystyle \begin{aligned}f({\mathbf S})=\sum_{j\in N}\max\{v_{ij}: j\in S_i, (i,S_i)\in {\mathbf S}\}.\end{aligned}$$

The GAP can also be solved via the so-called configuration LP approach as in [13], which plays a significant role in combinatorial optimization.

Algorithm 6: Configuration LP+greedy rounding
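The configuration LP itself is not written out above; a plausible formulation for the general assignment problem, using the feasible families \({\mathcal F}_i\) from Problem 9 and one variable x i,S per player/configuration pair (a hedged reconstruction, not necessarily the exact form in [13]), is

$$\displaystyle \begin{aligned}\begin{array}{rl} \max&\sum_{i\in P}\sum_{S\in {\mathcal F}_i}\left(\sum_{j\in S}v_{ij}\right)x_{i,S}\\ {\mathbf s.t.}&\sum_{S\in {\mathcal F}_i}x_{i,S}\le 1\ \forall i\in P\\ &\sum_{i\in P}\sum_{S\in {\mathcal F}_i: j\in S}x_{i,S}\le 1\ \forall j\in N\\ &x_{i,S}\ge 0\ \forall i\in P,\ S\in {\mathcal F}_i. \end{array} \end{aligned}$$

One natural rounding consistent with the algorithm's name draws, for each player i independently, a configuration according to the distribution (x i,S) S and gives each item claimed by several players to only one of them; Theorem 25 below states the guarantee proved in [13] for their greedy rounding.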

Note that for the general assignment problem, step 2 of the above algorithm can be solved by reformulating it as a linear program with assignment variables x ij representing the (fractional) amount of item j assigned to player i. Fleischer et al. [13] showed that this greedy rounding algorithm yields a \(1-\frac {1}{e}\) approximation ratio:

Theorem 25

The configuration LP can be solved exactly, and the greedy rounding yields \(1-\frac {1}{e}\) approximation ratio with respect to the fractional solution.

Problem 10 (Budget Constrained Maximization)

Given a monotone submodular function f : 2N → ℜ +, suppose we can evaluate f(S) for any S ⊆ N, and each item i ∈ N consumes a nonnegative budget c i. How should we solve the budget constrained problem:

$$\displaystyle \begin{aligned}\max\left\{f(S): \sum_{i\in S}c_i\le B, S\subseteq N\right\}.\end{aligned}$$

The first \(1-\frac {1}{e}-o(1)\) approximation algorithm for the budget constrained maximization problem was achieved by Sviridenko [28], and later improved by Badanidiyuru and Vondrák [2] and Ene and Nguyen [9]. The detailed algorithms are quite involved and lengthy; readers may refer to the listed research papers for details. All of these algorithms split the items into two groups, those with large value and those with small value. The large valued items are few in number, so they can be guessed, or decided with the help of the multi-linear extension. For the small valued items, missing one of them due to the capacity constraint loses at most an 𝜖 fraction of the objective, so one can simply apply a cost-efficiency greedy approach (adding the item with the largest marginal value per unit cost) to fill the capacity.
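The following Python fragment sketches only the cost-efficiency greedy subroutine for the small valued items described above; interfaces are illustrative, and this is not the full \(1-\frac {1}{e}-o(1)\) algorithm of the cited papers.

```python
def cost_efficiency_greedy(f, costs, budget):
    """Greedy for max f(S) subject to sum of costs <= budget.

    Repeatedly adds the item with the largest marginal value per unit cost
    that still fits the remaining budget.
    """
    n = len(costs)
    S, spent = set(), 0.0
    while True:
        best, best_ratio = None, 0.0
        base = f(frozenset(S))
        for i in range(n):
            if i in S or spent + costs[i] > budget:
                continue
            gain = f(frozenset(S | {i})) - base
            ratio = gain / costs[i] if costs[i] > 0 else float("inf")
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:          # no remaining item improves the value and fits
            return S
        S.add(best)
        spent += costs[best]
```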

4 Discrete Convexity in Dynamic Programming

Submodularity and other discrete convex properties are also very useful in dynamic and online decision problems. In Section 4.1, we present the concept of L -convexity. Applications in dynamic inventory control problems are discussed in Section 4.2. In Section 4.3, online matching problems are introduced.

4.1 L -Convexity

All the discussions of submodular function optimization in Section 3 focused on set functions. In most practical problems, one needs to deal with decision variables in broader domains. Since submodular function minimization relies heavily on the Lovasz extension, a natural question is: for functions on more general discrete domains, when does the Lovasz extension coincide with the convex hull?

Definition 10 (L -Convex Set)

A set D ⊆Z N is called L -convex, if {(x, t)∣x − te ∈ D, t ∈Z +} is a sublattice, i.e.,

$$\displaystyle \begin{aligned}(x+te)\wedge y\in D \mbox{ and } x\vee (y-te)\in D \mbox{ for all } x,y\in D, t\in {\mathbf Z}_+,\end{aligned}$$

where \(e\in \Re ^N\) is the all one vector.

Definition 11 (L -Convex Function)

For an L -convex domain D, we call a function \(f:D\rightarrow \Re \) L -convex if the function g(x, t) = f(x − te) is a submodular function on the sublattice domain {(x, t)∣x − te ∈ D, t ∈Z +}.

Theorem 26

The condition of L -convexity is equivalent to: (Condition A) f(x) + f(y) ≥ f((x + te) ∧ y) + f(x ∨ (y − te)) for any x, y ∈ D, t ∈Z + . When D = Z N , the next two conditions are also equivalent to L -convexity:

  1. (Condition B) \(f(x)+f(y)\ge f(\lfloor \frac {x+y}{2}\rfloor )+f(\lceil \frac {x+y}{2}\rceil )\) for any x, y ∈ D.

  2. (Condition C) If we define the Lovasz extension f L(x) within each integer grid and merge these pieces together, the result is well defined and coincides with the convex hull: f L = f −.

Note: when the function \(f:\Re ^N\rightarrow \Re \) is a C 2 function defined on a continuous domain, the condition is equivalent to the Hessian M = ∇2 f(x) being a diagonally dominant M-matrix for every x, i.e., M ij ≤ 0 for all i ≠ j and \(\sum _{j}M_{ij}\ge 0\) for all i.

Proof

Firstly, for any x, y ∈ D, t ∈Z +, we denote z = x + te. Then (z, t) ∨ (y, 0) = (z ∨ y, t), (z, t) ∧ (y, 0) = (z ∧ y, 0), z ∨ y − te = x ∨ (y − te). Notice that

$$\displaystyle \begin{aligned} & f(x)+f(y)-f((x+te)\wedge y)-f(x\vee (y-te))\\ &=g(z, t)+g(y,0)-g((z, t)\wedge (y,0))-g((z, t)\vee (y,0)). \end{aligned} $$

So the condition (A) is equivalent to the submodularity of g.

Secondly, we show that condition (B) implies submodularity of g when the domain D = Z N, and vice versa. Note that we only need to verify the submodularity locally, i.e.:

  1. g(x, t) + g(x + e i + e j, t) ≤ g(x + e i, t) + g(x + e j, t) for all i ≠ j, where e i is the unit vector taking value 1 at index i and 0 elsewhere;

  2. g(x, t) + g(x + e i, t + 1) ≤ g(x + e i, t) + g(x, t + 1) for all x ∈ D, i ∈ N, and t ∈Z +.

The first inequality follows from

$$\displaystyle \begin{aligned} f(x{+}e_i{-}te){+}f(x+e_j-te)&\ge f(x-te+\lfloor\frac{e_i+e_j}{2}\rfloor)+f(x-te+\lceil\frac{e_i+e_j}{2}\rceil)\\ & =f(x-te)+f(x-te+e_i+e_j). \end{aligned} $$

The second inequality follows from

$$\displaystyle \begin{aligned} f(x{+}e_i{-}te){+}f(x-(t+1)e)& \ge f(x-te+\lfloor\frac{e_i-e}{2}\rfloor)+f(x-te+\lceil\frac{e_i-e}{2}\rceil)\\ &=f(x+e_i-(t+1)e)+f(x-te). \end{aligned} $$

Conversely, when g is submodular, we start with the case |x i − y i|≤ 1 for all i ∈ N; in this case \(\lfloor \frac {x+y}{2}\rfloor =x\wedge y\) and \(\lceil \frac {x+y}{2}\rceil =x\vee y\). Therefore

$$\displaystyle \begin{aligned} f\left(\lfloor\frac{x+y}{2}\rfloor\right)+f\left(\lceil\frac{x+y}{2}\rceil\right)&=g(x\wedge y,0)+g(x\vee y,0)\\ &\le g(x,0)+g(y,0)= f(x)+f(y).\end{aligned} $$

Now we prove that condition (A) implies condition (B). If condition (B) is violated by some pair (x, y), we define (x^*, y^*) as a minimal violating pair, i.e., a solution of

$$\displaystyle \begin{aligned}\min \{\|x-y\|{}_1\mid f(x)+f(y)< f\left(\lfloor\frac{x+y}{2}\rfloor\right)+f\left(\lceil\frac{x+y}{2}\rceil\right), x,y\in D'\},\end{aligned}$$

where we may restrict attention to a finite box neighborhood D′ of a violating pair. Since the case |x i − y i|≤ 1 for all i has already been handled, there exists at least one index k such that \(|x_k^*-y_k^*|\ge 2\). Without loss of generality, we assume \(x_k^*\le y_k^*-2\). Note that for any x i, y i ∈Z, if x i ≤ y i − 1, then \(\min \{x_i+1,y_i\}=x_i+1\) and \(\max \{x_i,y_i-1\}=y_i-1\), while if x i ≥ y i − 1, then \(\min \{x_i+1,y_i\}=y_i\) and \(\max \{x_i,y_i-1\}=x_i\). Therefore (x^* + e) ∧ y^* + x^*∨ (y^*− e) = x^* + y^*, and \(|((x^*+e)\wedge y^*)_i-(x^*\vee (y^*-e))_i|\le |x_i^*-y_i^*|\) for every index i. Furthermore,

$$\displaystyle \begin{aligned}|((x^*+e)\wedge y^*)_k-(x^*\vee (y^*-e))_k|\le |y_k^*-x_k^*-2|=y_k^*-x_k^*-2<y_k^*-x_k^*,\end{aligned}$$

which implies that ∥(x^* + e) ∧ y^*− x^*∨ (y^*− e)∥1 < ∥x^*− y^*∥1. Because (x^*, y^*) is a minimal pair violating the condition, and (x^* + e) ∧ y^* + x^*∨ (y^*− e) = x^* + y^*, we have

$$\displaystyle \begin{aligned}f((x^*+e)\wedge y^*)+f(x^*\vee (y^*-e))\ge f(\lfloor\frac{x^*+y^*}{2}\rfloor)+f(\lceil\frac{x^*+y^*}{2}\rceil).\end{aligned}$$

However, it follows from condition (A) that

$$\displaystyle \begin{aligned}f(x^*)+f(y^*)\ge f((x^*+e)\wedge y^*)+f(x^*\vee (y^*-e)).\end{aligned}$$

Combining these two inequalities gives f(x^*) + f(y^*) ≥ f(⌊(x^* + y^*)∕2⌋) + f(⌈(x^* + y^*)∕2⌉), which contradicts the definition of (x^*, y^*). So we have proved that no pair x, y ∈Z N can violate condition (B).

Thirdly, we establish the equivalence of L -convexity with the convexity of the Lovasz extension. When f is L -convex, it has been established that f is submodular in each small grid; therefore, we can define f L in each grid. Next we prove that this definition coincides with the convex extension, by showing that for convex combinations x =∑ z∈D α z z, the minimum combination of function values can be achieved within the smallest grid around x, for any x with no integer coordinate. For those x with some integer coordinate, i.e., lying in the intersection of multiple small grids, we can apply a continuity argument.

For each given finite convex combination x =∑ z∈D α z z with value V =∑ z∈D α z f(z), the support {z∣α z > 0} is a finite set and can be assumed to be contained in a finite box B = [−M, M]N. Consider all convex combinations representing x in B with value no worse than V, i.e., Λ = {λ∣∑ z∈B λ z z = x, ∑ z∈B λ z f(z) ≤ V, ∑ z∈B λ z = 1, λ ≥ 0}, which is nonempty because α ∈ Λ. Define the potential function \(P(\lambda )=\sum _{z\in B} \lambda _z \|z\|{ }_2^2\) for convex combinations λ supported on B, and define β as a solution of \(\min \{P(\lambda )\mid \lambda \in \varLambda \}\), with support Supp β = {z∣β z > 0}. If Supp β is not contained in the smallest grid around x, then there exist u, v ∈Supp β and an index i such that v i − u i ≥ 2. It follows from condition (A) that w = (u + e) ∧ v and y = u ∨ (v − e) belong to D and satisfy f(u) + f(v) ≥ f(w) + f(y), u + v = w + y, and w, y ∈ B. Furthermore, for each index j, note that if u j ≥ v j − 1, then w j = v j and y j = u j, while if u j ≤ v j − 2, then w j = u j + 1 and y j = v j − 1; therefore \(u_j^2+v_j^2\ge w_j^2+y_j^2\) for all j ∈ N, and \(u_i^2+v_i^2-2\ge w_i^2+y_i^2\). Therefore, denoting \(\delta =\min \{\beta _u,\beta _v\}>0\), the convex combination \(\hat \beta \) defined as

$$\displaystyle \begin{aligned}\hat \beta_z=\left\{ \begin{array}{ll} \beta_z, & \mbox{ if } z\in B\backslash \{u,v,w,y\} \\ \beta_z-\delta, & \mbox{ if } z\in \{u,v\}\\ \beta_z+\delta, & \mbox{ if } z\in \{w,y\} \end{array} \right. \end{aligned}$$

satisfies

$$\displaystyle \begin{aligned} \sum_{z\in B} \hat\beta_z z&=\sum_{z\in B} \beta_z z+\delta(w+y-u-v)=\sum_{z\in B} \beta_z z=x,\\ \sum_{z\in B} \hat\beta_zf(z)&=\sum_{z\in B} \beta_zf(z)+\delta(f(w)+f(y)-f(u)-f(v))\le \sum_{z\in B} \beta_zf(z)=V,\\ P(\hat\beta)&=\sum_{z\in B} \hat\beta_z \|z\|{}^2=\sum_{z\in B} \beta_z\|z\|{}^2+\delta(\|w\|{}^2+\|y\|{}^2-\|u\|{}^2-\|v\|{}^2)\le P(\beta)-2\delta, \end{aligned} $$

which contradicts the minimality of the potential function in the definition of β. Therefore, we have shown that for the minimizing convex combination β in the definition of f −(x), every z ∈Supp β satisfies z j = ⌊x j⌋ or ⌈x j⌉ for each index j ∈ N. Therefore f L coincides with f −, which is hence well defined and convex.

Conversely, if f L = f −, then for any x, y ∈ D we have

$$\displaystyle \begin{aligned}f(x)+f(y)\ge 2f^-(\frac{x+y}{2})=2f^L(\frac{x+y}{2})=f(\lfloor\frac{x+y}{2}\rfloor)+f(\lceil\frac{x+y}{2}\rceil).\end{aligned}$$

For a C 2 function f defined on a continuous domain, note that for any \(x\in \Re ^N\) and t ∈ ℜ +,

$$\displaystyle \begin{aligned} &f(x+te_i)+f(x-te)-f(x+te_i-te)-f(x)\\ &=\int_{s=0}^t\int_{r=0}^t\sum_{j=1}^N\frac{\partial^2}{\partial x_i\partial x_j}f(x+se_i+re-te)dsdr, \end{aligned} $$

the submodularity of g across x i and t is equivalent to diagonal dominance of ∇2 f(x) on index i. Also, for any i ≠ j ∈ N,

$$\displaystyle \begin{aligned} & f(x+te_i)+f(x+te_j)-f(x)-f(x+te_i+te_j)\\ &=-\int_{s=0}^t\int_{r=0}^t\frac{\partial^2}{\partial x_i\partial x_j}f(x+se_i+re_j)dsdr, \end{aligned} $$

so the submodularity of g across x i and x j is equivalent to the off-diagonal (i, j)-th element of the symmetric matrix ∇2 f(x) being non-positive.
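As a quick illustration of the continuous-domain criterion in the note to Theorem 26 (the functions below are our own examples), consider

$$\displaystyle \begin{aligned} f(x_1,x_2)=(x_1-x_2)^2,\qquad \nabla^2 f(x)=\begin{pmatrix} 2 & -2\\ -2 & 2 \end{pmatrix}; \end{aligned}$$

the off-diagonal entries are non-positive and each row is diagonally dominant, so f satisfies the criterion (consistent with the fact noted later that f(x − y) is L -convex for convex f). By contrast, (x 1 + x 2)2 has a positive off-diagonal entry and fails the criterion, reflecting that it is supermodular across the two coordinates.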

Theorem 27

For an L -convex function \(f:D\rightarrow \Re \) defined on the L -convex domain D = Z N , any local minimum solution x, i.e., f(x) ≤ f(y) for all y ∈ D such that ∥y − x∥ ∞≤ 1, is also a global minimum solution.

Proof

This result follows directly from the fact that x is then a local, and hence global, minimum of the convex function f L. Alternatively, we can establish the result via a combinatorial approach:

Define the set S = {z ∈ D∣f(z) < f(x)}. If S is nonempty, there exists y ∈ S (not necessarily unique) which is closest to x in L 1 distance. Because x is a local optimum and f(x) > f(y), we have ∥y − x∥ ∞≥ 2. It follows from \(2f(x)>f(x)+f(y)\ge f(\lfloor \frac {x+y}{2}\rfloor )+f(\lceil \frac {x+y}{2}\rceil )\) that \(\min \{f(\lfloor \frac {x+y}{2}\rfloor ), f(\lceil \frac {x+y}{2}\rceil )\}<f(x)\). But when ∥y − x∥ ∞≥ 2, both \(\lfloor \frac {x+y}{2}\rfloor \) and \(\lceil \frac {x+y}{2}\rceil \) are strictly closer to x than y in L 1 distance, so by the choice of y neither of them is in S, which contradicts the fact that \(\min \{f(\lfloor \frac {x+y}{2}\rfloor ), f(\lceil \frac {x+y}{2}\rceil )\}<f(x)\).

Because a local minimum is a global minimum, minimization of an L -convex function can be achieved by local search over improving directions; readers may refer to [26] for details. When precision can be traded for speed, the gradient projection approach in Theorem 15 can be applied, provided the effective domain is compact.
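As an illustration of Theorem 27 and the local search just mentioned, the following Python sketch minimizes a function on Z N by repeatedly moving to a better point within the ∞-norm unit box around the current point; the brute-force neighborhood scan is meant only for small N, whereas in practice each step is itself organized as a minimization over improving directions as in [26].

```python
from itertools import product

def l_convex_local_search(f, x0, max_iter=1000):
    """Local search for minimizing an L-convex function f on Z^N.

    At each step we look for an improving point in {y : ||y - x||_inf <= 1};
    by Theorem 27, if no such point exists, x is a global minimizer.
    """
    x = list(x0)
    n = len(x)
    for _ in range(max_iter):
        best_y, best_val = x, f(tuple(x))
        for d in product((-1, 0, 1), repeat=n):
            y = [xi + di for xi, di in zip(x, d)]
            val = f(tuple(y))
            if val < best_val:
                best_y, best_val = y, val
        if best_y == x:      # local minimum, hence global by Theorem 27
            return x
        x = best_y
    return x
```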

4.2 L -Convexity in Dynamic Inventory System

In this subsection, we introduce some important applications of submodular function optimization. In particular, many inventory related problems arise in supply chain management, where one needs to manage complex inventory systems dynamically. Because inventories of different goods may substitute for each other, and inventories of perishable goods (e.g., fresh fruit, fresh milk, etc.) with different expiration dates are also substitutable for each other, we often model the problem by submodularity or other related properties, and utilize these properties to obtain better inventory strategies.

There are many applications of L -convex functions in dynamic inventory management. We start with a simple example. Suppose a retailer carries n classes of goods, where class i − 1 goods can be upgraded to class i (i = 2, 3, ⋯ , n) at upgrading cost c i per unit overnight. The retailer can also purchase class 1 goods from the supplier at cost c 1 per unit overnight. There is random demand \(D^t_i\) for class i goods on day t, and unsatisfied demand is backlogged (booked for future sales) with penalty cost b i per unit per day. Unsold class i goods incur holding cost h i per unit per day. The retailer needs to decide the amount \(q_1^t\) of class 1 goods to purchase from the supplier, as well as the amount \(q_i^t\) upgraded to class i from class i − 1, i = 2, ⋯ , n. Denote the inventory of class i goods at the beginning of day t by \(I_i^t\); then \(I_i^{t+1}=I_i^t+q_{i}^t-D_i^t-q_{i+1}^t\) (with \(q_{n+1}^t\equiv 0\)). Therefore the decision corresponds to the dynamic programming:

$$\displaystyle \begin{aligned}V_t(I^t)=E_{D_t}\min\left\{\sum_{i=1}^nc_iq_i^t+\sum_{i=1}^n f_i(I_i^t-D_i^t)+V_{t+1}(I^{t+1}): 0\le q_i^t\right\},\end{aligned}$$

where f i(x) = −b i x if x ≤ 0 and f i(x) = h i x if x ≥ 0, V T+1 ≡ 0, and the overnight decision q t depends on the realized demand D t during the day. If we take the transformation \(S_i^t=\sum _{j\ge i}I_j^t\), then \(S_i^{t+1}=S_i^t+q_{i}^t-\sum _{j\ge i}D_j^t\), and \(I_i^t=S_i^t-S_{i+1}^t\). We define C t(S t) = V t(I t), which satisfies the dynamic programming:

$$\displaystyle \begin{aligned} C_t(S^t)&=E_{D_t}\min\left\{\sum_{i=1}^nc_i\left(S_i^{t+1}-S_i^t+\sum_{j\ge i}D_j^t\right)+\sum_{i=1}^n f_i(S_i^{t+1}-S_{i+1}^{t+1})\right.\\ &\qquad \qquad \quad \left.+\,C_{t+1}(S^{t+1}): S_i^{t+1}\ge S_i^t-\sum_{j\ge i}D_j^t\right\}. \end{aligned} $$

Next, we show that the function C t is always L -convex, and the function V t is a so-called multimodular function. For this purpose, we need to establish that L -convexity is preserved under minimization, similar to the preservation of submodularity in Theorem 7. We refer to a theorem in [33]:

Theorem 28 (Preservation of L -Convexity)

Suppose S ⊆ X × Y  is an L -convex set, and \(f:S\rightarrow \Re \) is an L -convex function. Then the function \(g(y)=\min \{f(x,y)\mid (x,y)\in S\}\) is an L -convex function defined on the L -convex set T = {y∣∃x, (x, y) ∈ S}.

Proof

We first prove the L -convexity of T by definition. For any given y − te, y′− t′e ∈ T with t, t′∈Z +, by definition there exist x, x′∈ X such that (x, y − te), (x′, y′− t′e) ∈ S. Define z = x + te and z′ = x′ + t′e; then (z, y) − te = (x, y − te) and (z′, y′) − t′e = (x′, y′− t′e). By the L -convexity of S,

$$\displaystyle \begin{aligned}(z\vee z'-(t\vee t') e, y\vee y'-(t\vee t') e)=(z,y)\vee (z',y')-(t\vee t') e\in S\end{aligned}$$

and

$$\displaystyle \begin{aligned}(z\wedge z'-(t\wedge t') e, y\wedge y'-(t\wedge t') e)=(z,y)\wedge (z',y')-(t\wedge t') e\in S.\end{aligned}$$

It follows that y ∨ y′− (t ∨ t′)e ∈ T and y ∧ y′− (t ∧ t′)e ∈ T.

Next, we establish the L -convexity of the function g by a constructive approach. Define h(x, y, t) = f((x, y) − te) and \(\hat h(y,t)=g(y-te)\). For any given y − te, y′− t′e ∈ T with t, t′∈Z +, by definition there exist x, x′∈ X such that f(x, y − te) = g(y − te) and f(x′, y′− t′e) = g(y′− t′e). Define z = x + te and z′ = x′ + t′e; then g(y − te) = f((z, y) − te) = h(z, y, t) and g(y′− t′e) = f((z′, y′) − t′e) = h(z′, y′, t′). By the definition of L -convexity, the function h is submodular, therefore

$$\displaystyle \begin{aligned}h(z,y,t)+h(z',y',t')\ge h(z\vee z', y\vee y',t\vee t')+h(z\wedge z', y\wedge y',t\wedge t').\end{aligned}$$

It follows from (z, y) − te = (x, y − te) ∈ S and (z′, y′) − t′e = (x′, y′− t′e) ∈ S that \(\left (z\vee z'-(t\vee t')e, y\vee y'-(t\vee t')e\right )\in S\), therefore

$$\displaystyle \begin{aligned}h(z\vee z', y\vee y',t\vee t')=f(z\vee z'-(t\vee t')e, y\vee y'-(t\vee t')e)\ge g(y\vee y'-(t\vee t')e).\end{aligned}$$

Similarly,

$$\displaystyle \begin{aligned}h(z\wedge z', y\wedge y',t\wedge t')=f(z\wedge z'-(t\wedge t')e, y\wedge y'-(t\wedge t')e)\ge g(y\wedge y'-(t\wedge t')e).\end{aligned}$$

Combining these inequalities, we obtain

$$\displaystyle \begin{aligned}g(y-te)+g(y'-t'e)\ge g(y\vee y'-(t\vee t')e)+g(y\wedge y'-(t\wedge t')e),\end{aligned}$$

which implies the L -convexity of function g.

Furthermore, for the dynamic inventory control problem we notice the following two facts:

  1. The set \(\{(x,y)\in \Re ^2\mid x-y\le c\}\) is L -convex.

  2. For a convex function \(f:\Re \rightarrow \Re \), f(x − y) is always L -convex.

Therefore, if C t+1 is L -convex, then for each given D t, so is the function \(F_{D^t}(S^t)=\min \left \{\sum _{i=1}^nc_i(S_i^{t+1}-S_i^t+\sum _{j\ge i}D_j^t)+\sum _{i=1}^n f_i(S_i^{t+1}-S_{i+1}^{t+1})+C_{t+1}(S^{t+1}):S_i^{t+1}\right .\) \(\left .\ge S_i^t-\sum _{j\ge i}D_j^t\right \}\). Because L -convexity is preserved under expectations (nonnegative linear combinations), \(C_{t}(S^t)=E_{D^t}F_{D^t}(S^t)\) is also L -convex. So we can inductively establish the L -convexity of C t:

Theorem 29

The function C t is L -convex, and the original function V t is multimodular.

Definition 12 (Multimodular Function)

A multimodular function is a function \(f:X\rightarrow \Re \) defined on a domain of the form \(X=\{x\in \Re ^N: a_j^Tx\le b_j, j=1,2,\cdots , m\}\), where each a j is of the form \(\sum _{i=K}^L e_i\) or \(-\sum _{i=K}^L e_i\) for some 1 ≤ K ≤ L ≤ N, i.e., a vector with value 1 (respectively, −1) on consecutive indices and 0 elsewhere, such that Φ(x, y) = f(x 1 − y, x 2 − x 1, ⋯ , x N − x N−1) is submodular on the domain \(S=\{(x,y)\in \Re ^{N+1}\mid (x_1-y,x_2-x_1, \cdots , x_N-x_{N-1})\in X\}\).

A multimodular function is essentially an L -convex function under a linear transformation, as established in [24]:

Theorem 30

Suppose we define the set \(Z=\{z\in \Re ^N\mid (z_1,z_2-z_1, z_3-z_2,\cdots , z_N-z_{N-1})\in X\}=\{z\mid (z,0)\in S\}\) and the function g(z) = f(z 1, z 2 − z 1, z 3 − z 2, ⋯ , z N − z N−1) = Φ(z, 0). Then \(f: X\rightarrow \Re \) is a multimodular function if and only if \(g: Z\rightarrow \Re \) is an L -convex function.

Multimodularity, or equivalently (under the above transformation) L -convexity, has been used to characterize dynamic decision systems and the corresponding optimal policies for inventory management of perishable goods [4, 21], as well as for queueing systems [1].

For optimizing L -convex functions in a dynamic system, one cannot simply apply the greedy local search algorithm, because the state space is exponentially large, which makes it impossible to recursively compute the function values at all states as the classical dynamic programming approach does. Therefore, one can only hope to solve the dynamic problem approximately by an adaptive approximation approach, which uses a class of simple functions to approximate each V t and recursively finds the best approximation of each value function V t. In particular, [22] uses linear functions for the inventory problem of perishable goods. In Sun et al. [27], a class of quadratic functions is used, according to the following lemma they established based on Murota's characterization of quadratic L functions:

Lemma 3

A quadratic function \(f:\Re ^N\rightarrow \Re \) is multimodular if and only if it can be expressed as \(f(x)=\sum _{i=1}^N\sum _{j=1}^iQ^{ij}(\sum _{l=j}^i x_l)\) , where Q ij(x) = a ij x 2 + b ij x + c ij with a ij ≥ 0.
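For instance, writing out the lemma for N = 2 gives

$$\displaystyle \begin{aligned} f(x_1,x_2)=Q^{11}(x_1)+Q^{22}(x_2)+Q^{21}(x_1+x_2), \end{aligned}$$

so a quadratic function of two variables is multimodular exactly when it is a nonnegative combination of \(x_1^2\), \(x_2^2\), and \((x_1+x_2)^2\) plus a linear function.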

Recently, Chen et al. [4] developed a basis function approach which approximates the original function by a linear combination \(\hat F(x)=\sum _{i=1}^N\sum _{j=1}^iB^{ij}(\sum _{l=j}^i x_l)\) of basis functions B ij, which can be constructed recursively by solving single dimensional optimization problems. This approach is much more flexible, since it allows a much broader class of functions to be used to approximate the original function, and it achieves significant improvement in practice.

4.3 Online/Dynamic Matching

Submodularity also has important applications in dynamic matching. To begin with, we analyze static matching. Consider the bipartite matching problem with two sets (A and B) of nodes, and edges (i, j) ∈ E ⊆ A × B. Each edge (i, j) ∈ E is associated with a weight w ij. The objective is to match the nodes to maximize the total matching weight, subject to the constraint that each node can be matched to at most one node on the other side. This problem can be modeled by:

$$\displaystyle \begin{aligned}\begin{array}{rl} W(A,B)=\max&\sum_{i\in A, j\in B}w_{ij}x_{ij}\\ {\mathbf s.t.}&\sum_{j\in B}x_{ij}\le 1 \forall i\in A\\ &\sum_{i\in A}x_{ij}\le 1 \forall j\in B\\ &x_{ij}=0\mbox{ or } 1\forall i\in A, j\in B. \end{array} \end{aligned}$$

Since the constraint matrix is totally unimodular, there is no integrality gap, and we can replace the integer constraints by x ij ≥ 0.

We can even consider a more general formulation:

$$\displaystyle \begin{aligned}\begin{array}{rl} U(a,b)=\max&\sum_{i\in A, j\in B}w_{ij}x_{ij}\\ {\mathbf s.t.}&\sum_{j\in B}x_{ij}\le a_i \forall i\in A\\ &\sum_{i\in A}x_{ij}\le b_j \forall j\in B\\ &x_{ij}\ge 0, \end{array} \end{aligned}$$

which satisfies W(A, B) = U(1 A, 1 B).

Intuitively, we can view a and b as resources; resources within a (or within b) are substitutes for each other, so adding more and more resources in a (or b) has diminishing return. However, resources in a are complementary to those in b, so adding resources in a boosts the value of resources in b, and vice versa. Next, we show the function U(a, b) is submodular within a and within b, but supermodular across them:

Theorem 31

The function U(a, −b) is submodular. By setting a = 1 A and b = −1 B , we see that W(A, B) is submodular in A for fixed B.

Proof

Taking the dual, we have

$$\displaystyle \begin{aligned} &U(a,-b)\\ &=\min\left\{\sum_{i\in A}a_iy_i-\sum_{j\in B}b_jz_j\mid y\in \Re_+^A, z\in \Re_+^B, y_i+z_j\ge w_{ij}, \forall i\in A, j\in B\right\}. \end{aligned} $$

By defining the variable v = −y, it becomes

$$\displaystyle \begin{aligned} U(a,-b)=\min\left\{-a^Tv-b^Tz\mid -v\in \Re_+^A,\ z\in \Re_+^B,\ z_j-v_i\ge w_{ij}, \forall i\in A, j\in B\right\}. \end{aligned} $$

Note that the objective function − a T v − b T z is submodular in (a, b, v, z), and the feasible domain is a lattice, so by Theorem 7 the function U(a, −b) is submodular in (a, b).

For online matching problems, submodularity, or equivalently the diminishing return property, plays a crucial role. We present the more general online matching setting studied in [19].

Problem 11

There is a fixed group A of players, and items arrive stochastically over time. The items are of different types j ∈ T; at each time t, the type j t of the arriving item follows an i.i.d. distribution. At the end (time N), player i has received a set S i of items, with submodular utility V i(S i). We need to match each item at the time it arrives, and we aim at maximizing the total matching score at the end. One thing to note is that, by setting \(V_i(S_i)=\max _{j\in S_i}w_{ij}\), we recover the online bipartite matching problem as a special case.

Consider the following greedy allocation rule:

Algorithm 7: Greedy online matching
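The greedy rule analyzed in Theorem 32 assigns each arriving item to the player with the largest marginal utility; the following Python fragment is a minimal illustrative sketch (for simplicity, utilities are evaluated on the set of distinct item types a player has received).

```python
def greedy_online_matching(players, utilities, item_stream):
    """Greedy online allocation: each arriving item of type j goes to the player
    whose utility increases the most, i.e., argmax_i V_i(S_i + j) - V_i(S_i).

    utilities[i] is a set function (callable on frozensets of item types);
    item_stream yields the types j_1, j_2, ..., j_N as they arrive.
    """
    assignments = {i: [] for i in players}   # S_i for each player i
    for j in item_stream:
        best_i, best_gain = None, float("-inf")
        for i in players:
            S = frozenset(assignments[i])
            gain = utilities[i](S | {j}) - utilities[i](S)
            if gain > best_gain:
                best_i, best_gain = i, gain
        assignments[best_i].append(j)        # give the item to the best player
    return assignments
```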

Theorem 32

The greedy algorithm achieves \(1-\frac {1}{e}\) approximation ratio.

Proof

By submodularity of V i, for any two (multi)sets X and Y  of items we have V i(X ∪ Y ) − V i(X) ≤∑ j c j(Y )[V i(X ∪{j}) − V i(X)], where c j(Y ) denotes the number of type j items in Y . Denote by p j the arrival probability of type j items; then the optimal offline matching value with the expected number of arrivals is bounded by:

$$\displaystyle \begin{aligned}\begin{array}{rl} {\mathcal OPT}\le \max&\sum_{i\in A,S}V_i(S)x_{i,S}\\ {\mathbf s.t.}&\sum_{i\in A}\sum_{S:j\in S}x_{i,S}c_j(S)\le p_jN, \forall j\in T\\ &\sum_{S}x_{i,S}\le 1, \forall i\in A\\ &x_{i,S}\ge 0, \forall i\in A \mbox{ and set } S. \end{array} \end{aligned}$$

Denote \(y_{ij}=\sum _{S:j\in S} x_{i,S} c_j(S)\) and \(z_{ij}=\frac {y_{ij}}{ p_j N}\); then ∑ i∈A z ij ≤ 1 and z ≥ 0. When an item of type j arrives at time t, consider the random allocation which assigns this item to player i with probability z ij. Then the expected gain is

$$\displaystyle \begin{aligned}\sum_{i,j}p_jz_{ij}[V_i(S_i^{t-1}\cup \{j\})-V_i(S_i^{t-1})]=\frac{1}{N}\sum_{i,j}y_{ij}[V_i(S_i^{t-1}\cup \{j\})-V_i(S_i^{t-1})].\end{aligned}$$

The greedy allocation gains at least this expected amount. Denote the actual total matching value after stage t by V t. Therefore,

$$\displaystyle \begin{aligned} E[V^t]-V^{t-1}&\ge \frac{1}{N}\sum_{i,S}x_{i,S}\left[\sum_{j\in S}c_j(S)[V_i(S_i^{t-1}\cup \{j\})-V_i(S_i^{t-1})]\right]\\ &\ge \frac{1}{N}\sum_{i,S}x_{i,S}[V_i(S_i^{t-1}\cup S)-V_i(S_i^{t-1})]. \end{aligned} $$

Therefore

$$\displaystyle \begin{aligned} E[V^t]-V^{t-1}&\ge \frac{1}{N}{\mathcal OPT}-\frac{1}{N}\sum_{i,S}x_{i,S}V_i\left(S_i^{t-1}\right)\\ &\ge \frac{1}{N}\left[{\mathcal OPT}-\sum_i V_i(S_i^{t-1})\right]=\frac{1}{N}\left[{\mathcal OPT}-V^{t-1}\right]. \end{aligned} $$

Rearranging gives \(E[{\mathcal OPT}-V^t]\le \left (1-\frac {1}{N}\right )E[{\mathcal OPT}-V^{t-1}]\), so \(E[V^N]\ge \left (1-\left (1-\frac {1}{N}\right )^N\right ){\mathcal OPT}\ge \left (1-\frac {1}{e}\right ){\mathcal OPT}\). This approximation ratio is tight in the online stochastic matching scenario, and similar algorithms and analyses have been established for other online matching and online allocation problems [10, 20, 32], etc.

The diminishing return effect from submodularity can also be applied to quantify matching efficiency when matching stages are combined with other types of operations in algorithm design. In a recent work, He et al. [7] studied the matching problem in kidney exchange, which matches donated kidneys from non-directed donors, as well as kidneys from relatives of patients who do not match their own targeted relative, to other patients. The matching process is divided into two stages: in the first stage, a random walk mechanism is applied to form efficient chains among difficult patients, and in the second stage, bipartite matching algorithms are applied to further reduce the number of unmatched patients. Submodularity of the second stage matching score, with respect to the set of available (unmatched) difficult patients at the beginning of stage two, is utilized to transfer the analysis of stage one to an analysis of the full mechanism. By this approach, the first non-asymptotic bound on matching efficiency is established for medium-sized random graphs.