Abstract
A notion of generalized n-semimodularity is introduced, which extends that of (sub/super)modularity in four ways at once. The main result of this paper, stating that every generalized \((2,n)\)-semimodular function on the nth Cartesian power of a distributive lattice is generalized n-semimodular, may be considered a multi/infinite-dimensional analogue of the well-known Muirhead lemma in the theory of Schur majorization. This result is also similar to a discretized version of the well-known theorem due to Lorentz, which, however, was stated only for additive-type functions. Illustrations of our main result are presented for counts of combinations of faces of a polytope; one-sided potentials; multiadditive forms, including multilinear ones, in particular, permanents of rectangular matrices and elementary symmetric functions; and association inequalities for order statistics. Based on an extension of the FKG inequality due to Rinott & Saks and Aharoni & Keich, applications to correlation inequalities for order statistics are given as well.
Keywords
- Semimodularity
- Submodularity
- Supermodularity
- FKG-type inequalities
- Association inequalities
- Correlation inequalities
2010 Mathematics Subject Classification
8.1 Summary and Discussion
As pointed out e.g. in [3, 4], the notion of submodularity has proved useful in various areas: combinatorial optimization, with many applications in operations research; machine learning; computer vision; electrical networks; signal processing; several areas of theoretical computer science, such as matroid theory; and economics. One may also note the use of this notion in potential theory [6]: capacities alternating of order 2, such as the classical Newtonian capacity, are submodular functions.
Let L be any distributive lattice; for definitions and facts pertaining to lattices, see e.g. [10].
A function \(\lambda \colon L\to \mathbb {R}\) is called submodular if
\[\lambda(f\wedge g)+\lambda(f\vee g)\le\lambda(f)+\lambda(g) \tag{8.1}\]
for all f and g in L. A function λ is called supermodular if the function − λ is submodular, and λ is called modular if it is both submodular and supermodular. See e.g. [4, 9, 18, 19, 24, 25]. Let us say that a function μ is log-submodular if \(\ln \mu \) is submodular, and log-supermodular if \(\ln\mu\) is supermodular. The log-supermodularity condition and the corresponding log-submodularity condition were referred to in Karlin and Rinott [13, 14] as the multivariate total positivity of order 2 (MTP2) and the multivariate reverse rule of order 2 (MRR2), respectively. As noted by Choquet [6, §14.3], a nondecreasing function λ is alternating of order 2 iff it satisfies inequality (8.1), that is, λ is submodular; it was also shown in [6] that the classical Newtonian capacity is such a function.
The log-supermodularity condition is the condition under which the famous Fortuin–Kasteleyn–Ginibre (FKG) correlation inequality [8] holds. Therefore, using inequality (8.17) together with the FKG inequality and its generalizations, we will be able to obtain the corresponding applications, in Corollaries 8.2.11 and 8.2.12.
More generally, let \(\mathcal {R}\) be any set, endowed with a transitive relation ⋊⋉, so that for any a, b, c in \(\mathcal {R}\) one has the implication a ⋊⋉ b & b ⋊⋉ c ⇒ a ⋊⋉ c. For any natural n, let us say that a function \(\Lambda \colon L^n\to \mathcal {R}\) is generalized n-semimodular if
\[\Lambda(f_1,\dots,f_n)\bowtie\Lambda(f_{n:1},\dots,f_{n:n})\]
for all \(f=(f_1,\dots,f_n)\in L^n\), where \(f_{n:1},\dots,f_{n:n}\) are the “order statistics” for f defined by the formula
\[f_{n:j}:=\bigwedge_{J\in\binom{[n]}{j}}\ \bigvee_{i\in J}f_i \tag{8.2}\]
for \(j\in [n]:=\overline {1,n}\), with \(\binom {[n]}j\) denoting the set of all subsets J of the set [n] such that the cardinality of J is j. Here and in the sequel we use the notation \(\overline {\alpha ,\beta }:=\{j\in \mathbb {Z}\colon \alpha \le j\le \beta \}\). In particular, f n:1 = f 1 ∧⋯ ∧ f n and f n:n = f 1 ∨⋯ ∨ f n.
For any function \(\lambda \colon L\to \mathbb {R}\), let the function \(\Lambda _\lambda \colon L^2\to \mathbb {R}\) be given by the formula \(\Lambda_\lambda(f,g):=\lambda(f)+\lambda(g)\) for f and g in L. Then, since \((f_{2:1},f_{2:2})=(f\wedge g,f\vee g)\) for the pair (f, g), λ is submodular or supermodular or modular if and only if \(\Lambda_\lambda\) is generalized 2-semimodular with the relation “⋊⋉” being “≥” or “≤” or “=”, respectively.
Thus, the notion of generalized n-semimodularity extends that of (sub/super)modularity in four ways at once: (1) the function Λ may be a function of any natural number n of arguments, whereas λ is a function of only one argument; (2) in contrast with a general form of dependence of \(\Lambda(f_1,\dots,f_n)\) on \(f_1,\dots,f_n\), the function \(\Lambda_\lambda\) of two arguments is of the special form, linear in λ(f) and λ(g); (3) whereas the values of λ are real numbers, those of Λ may be in any set \(\mathcal {R}\); and (4) we now have an arbitrary transitive relation ⋊⋉ over \(\mathcal {R}\) instead of one of the three particular relations “≥” or “≤” or “=” over \(\mathbb {R}\).
For any k ∈ [n], let us say that a function \(\Lambda \colon L^n\to \mathcal {R}\) is generalized \((k,n)\)-semimodular if for each \(j\in \overline {0,{n-k}}\) and each (n − k)-tuple \((f_i\colon i\in [n]\,\setminus \,\overline {{j+1},{j+k}})\in L^{n-k}\) the function \(L^k\ni(f_{j+1},\dots,f_{j+k})\mapsto\Lambda(f_1,\dots,f_n)\) is generalized k-semimodular. In particular, Λ is generalized \((n,n)\)-semimodular if and only if it is generalized n-semimodular.
Whenever the relation “⋊⋉” is “≥” or “≤” or “=”, let us replace “semi” in the above definitions by “sub”, “super”, or the empty string, respectively. For instance, “generalized n-modular” will stand for “generalized n-semimodular” with the relation “⋊⋉” being “=”.
The main result of this note is
Theorem 8.1.1
Again, let L be any distributive lattice. If a function \(\Lambda \colon L^n\to \mathcal {R}\) is generalized \((2,n)\)-semimodular, then it is generalized n-semimodular.
The necessary proofs will be given in Sect. 8.3.
As will be seen from the proof of Theorem 8.1.1, the condition that the function Λ be generalized \((2,n)\)-semimodular can be relaxed to the following: for each \(j\in \overline {1,{n-1}}\) and each \(f=(f_1,\dots,f_n)\in L^n\) such that \(f_1\le\dots\le f_j\), one has \(\Lambda(f_1,\dots,f_n)\bowtie\Lambda(f_1,\dots,f_{j-1},f_j\wedge f_{j+1},f_j\vee f_{j+1},f_{j+2},\dots,f_n)\).
Remark 8.1.2
Theorem 8.1.1 will not hold in general if the lattice L is not assumed to be distributive. For instance, let L be the set [5] = {1, 2, 3, 4, 5} with the partial order obtained from the natural order ≤ on [5] by declaring the elements 2, 3, 4 pairwise non-comparable, so that the resulting order relation is the set {(f, f): f ∈ [5]} ∪ {(1, 2), (1, 3), (1, 4), (2, 5), (3, 5), (4, 5), (1, 5)}; then, in particular, 2 ∧ 3 = 1 and 2 ∨ 3 = 5. This lattice is one of the simplest examples of non-distributive lattices; it is isomorphic to the diamond lattice \(M_3\) (see e.g. [10, p. 110]). Let n = 3, \(\mathcal {R}=\mathbb {R}\), and define the function \(\Lambda \colon L^3\to \mathbb {R}\) by the formula \(\Lambda(f_1,f_2,f_3):=12f_1f_2+3f_2f_3+5f_1f_3\) for all \(f=(f_1,f_2,f_3)\in L^3\). Then one can verify directly, by a straightforward but tedious calculation consisting in checking \(2\times5^3=250\) inequalities (two inequalities for each \(f\in[5]^3\)), that this function Λ is generalized \((2,3)\)-submodular. However, Λ is not generalized 3-submodular, because for f = (2, 3, 4) one has \((f_{3:1},f_{3:2},f_{3:3})=(1,5,5)\) and \(\Lambda(f_1,f_2,f_3)=\Lambda(2,3,4)=148\not\ge160=\Lambda(1,5,5)=\Lambda(f_{3:1},f_{3:2},f_{3:3})\). □
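The finite check described in Remark 8.1.2 is easy to automate. The following Python sketch is our own illustration, not the author's; it encodes \(M_3\) on {1, …, 5} with 1 as the bottom and 5 as the top, computes the order statistics directly from formula (8.2), and counts how many of the 250 inequalities defining generalized \((2,3)\)-submodularity fail. All function names are hypothetical.

```python
from functools import reduce
from itertools import combinations, product

# Diamond lattice M3 on {1,...,5}: 1 is the bottom, 5 is the top,
# and 2, 3, 4 are pairwise incomparable.
ELEMENTS = range(1, 6)


def leq(a, b):
    return a == b or a == 1 or b == 5


def meet(a, b):
    if leq(a, b):
        return a
    if leq(b, a):
        return b
    return 1  # two distinct middle elements


def join(a, b):
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return 5  # two distinct middle elements


def order_stats(f):
    """f_{n:j} = meet over j-subsets J of the join of f_i, i in J (formula (8.2))."""
    n = len(f)
    return tuple(
        reduce(meet, [reduce(join, [f[i] for i in J]) for J in combinations(range(n), j)])
        for j in range(1, n + 1)
    )


def lam(f1, f2, f3):
    return 12 * f1 * f2 + 3 * f2 * f3 + 5 * f1 * f3


def count_2_3_violations():
    """Compare lam(f) with lam(f after replacing an adjacent pair by its meet and join)."""
    checked = violations = 0
    for f in product(ELEMENTS, repeat=3):
        for j in (0, 1):  # 0-based position of the adjacent pair
            g = list(f)
            g[j], g[j + 1] = meet(f[j], f[j + 1]), join(f[j], f[j + 1])
            checked += 1
            if lam(*f) < lam(*g):
                violations += 1
    return checked, violations


print(count_2_3_violations())                      # (250, 0)
print(order_stats((2, 3, 4)))                      # (1, 5, 5)
print(lam(2, 3, 4), lam(*order_stats((2, 3, 4))))  # 148 160
```

The brute force confirms both halves of the remark: no adjacent-pair inequality fails, yet the full rearrangement into order statistics increases Λ.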
Remark 8.1.3
A well-known fact, which will be crucial in the proof of Theorem 8.1.1, is the representation theorem due to Birkhoff and Stone stating that any distributive lattice L is isomorphic to a lattice of subsets of (and hence to a lattice of nonnegative real-valued functions on) a certain set S, depending on L (see e.g. [10, Theorem 119]). For such a lattice of functions, the “order statistics” \(f_{n:1},\dots,f_{n:n}\) are uniquely determined by the condition that
\[f_{n:1}(s)\le\dots\le f_{n:n}(s)\quad\text{and}\quad\{\!\{f_{n:1}(s),\dots,f_{n:n}(s)\}\!\}=\{\!\{f_1(s),\dots,f_n(s)\}\!\} \tag{8.3}\]
for each s ∈ S, where the double braces are used to denote multisets, with appropriate multiplicities. To quickly see why this is true, one may reason as follows. Let us now use condition (8.3) to define \(f_{n:1},\dots,f_{n:n}\). Note that the value of the right-hand side (rhs) of (8.2) at any point s ∈ S is invariant with respect to all permutations of the values \(f_1(s),\dots,f_n(s)\). So, the value of the rhs of (8.2) at s will not change if one replaces there \(f_1,\dots,f_n\) by \(f_{n:1},\dots,f_{n:n}\), and then this value will equal \(f_{n:j}(s)\), because for nondecreasingly ordered real numbers the minimum over all j-element sets J of the maxima over J is attained at J = [j]. Thus, the definition of \(f_{n:1},\dots,f_{n:n}\) by means of formula (8.3) is equivalent to the one given by (8.2), if the lattice L is already a lattice of real-valued functions on S. Moreover, it is clear now that, if the lattice L is distributive, then definition (8.2) can be rewritten in the dual form, as
\[f_{n:j}=\bigvee_{J\in\binom{[n]}{n+1-j}}\ \bigwedge_{i\in J}f_i \tag{8.4}\]
for all j ∈ [n].
On the other hand, it can be seen that, if L is not distributive, then this duality can be lost and each of the definitions (8.2) and (8.4) of f n:j can be rather unnaturally skewed up or down. For instance, in the counterexample given in Remark 8.1.2, for f = (2, 3, 4) we had (f 3:1, f 3:2, f 3:3) = (1, 5, 5) according to definition (8.2), but we would have (f 3:1, f 3:2, f 3:3) = (1, 1, 5) according to (8.4).
However, one may note that the right-hand side of (8.4) is always ≤ that of (8.2). This follows because for any \(J\in \binom {[n]}{n+1-j}\) and any \(K\in \binom {[n]}j\) there is some k ∈ J ∩ K (since |J| + |K| = n + 1 > n), and then \(\bigwedge_{i\in J}f_i\le f_k\le\bigvee_{i\in K}f_i\). □
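For real-valued functions on a finite set S, which form a distributive lattice under pointwise minimum and maximum, the three descriptions of the order statistics can be compared directly. The Python sketch below is our illustration with hypothetical names; each function is represented as the tuple of its values on S, and the sketch checks that the meet-of-joins formula (8.2), the join-of-meets formula (8.4), and pointwise sorting as in (8.3) agree.

```python
from functools import reduce
from itertools import combinations
import random


def meet(f, g):
    return tuple(map(min, f, g))


def join(f, g):
    return tuple(map(max, f, g))


def stats_meet_of_joins(fs):  # formula (8.2)
    n = len(fs)
    return [reduce(meet, [reduce(join, [fs[i] for i in J]) for J in combinations(range(n), j)])
            for j in range(1, n + 1)]


def stats_join_of_meets(fs):  # formula (8.4)
    n = len(fs)
    return [reduce(join, [reduce(meet, [fs[i] for i in J]) for J in combinations(range(n), n + 1 - j)])
            for j in range(1, n + 1)]


def stats_pointwise_sort(fs):  # characterization (8.3)
    columns = [sorted(values) for values in zip(*fs)]  # sorted values at each point s
    return [tuple(col[j] for col in columns) for j in range(len(fs))]


rng = random.Random(1)
fs = [tuple(rng.randint(0, 9) for _ in range(6)) for _ in range(4)]  # n = 4, |S| = 6
assert stats_meet_of_joins(fs) == stats_join_of_meets(fs) == stats_pointwise_sort(fs)
```

Running the same comparison on the non-distributive \(M_3\) would instead reproduce the discrepancy (1, 5, 5) versus (1, 1, 5) noted above.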
In view of the lattice representation theorem cited in Remark 8.1.3, Theorem 8.1.1 may be considered a multi/infinite-dimensional analogue of the well-known Muirhead lemma in the theory of Schur majorization (cf. e.g. [17, Lemma 2.B.1, p. 32]), which may be stated as follows: for vectors x and y in \(\mathbb {R}^n\) such that x ≺ y (that is, x is majorized by y), there exist finitely many vectors x 0, …, x m in \(\mathbb {R}^n\) such that x = x 0 ≺⋯ ≺ x m = y and for each \(j\in \overline {0,m-1}\) the vectors x j and x j+1 differ only in two coordinates. However, no direct multi-dimensional extension of the Muirhead lemma seems to exist, even in two dimensions (see e.g. [20, p. 11]).
For functions that are “infinite-dimensional” counterparts of the “m-dimensional” function \(\Lambda \colon L^m\to \mathbb {R}\) given by the formula of the additive form
Lorentz [16] obtained a result similar to Theorem 8.1.1; for readers’ convenience, let us reproduce it here: For each j ∈ [n], let \(f_j^*\) denote the equimeasurable decreasing rearrangement [11] of a function \(f_j\colon (0,1)\to \mathbb {R}\). Let a real-valued expression Φ(x, u 1, …, u n) be continuous in (x, u 1, …, u n) ∈ (0, 1) × [0, ∞) ×⋯ × [0, ∞). Then the inequality
holds for all bounded positive measurable functions f 1, …, f n from (0, 1) to \(\mathbb {R}\) if and only if the following two conditions hold:
and
for all h > 0, x ∈ (0, 1), δ ∈ (0, x ∧ (1 − x)), (u 1, …, u n) ∈ [0, ∞)n, and i, j in [n] such that i < j; here, in each of inequalities (8.7) and (8.8), the arguments of Φ that are the same for all the four instances of Φ are omitted, for brevity.
To establish the connection between Lorentz’s result and our Theorem 8.1.1, suppose e.g. that each of the functions f 1, …, f n in [16] is a step function, constant on each of the intervals \((\frac {j-1}m,\frac jm]\) for j ∈ [m], and then let \(g_j(s):=f_s(\frac jm)\) for j ∈ [m] and s ∈ S := [n]. In fact, in the proof in [16] the result is first established for such step functions f 1, …, f n. It is also shown in [16] that, for such “infinite-dimensional” counterparts of the functions given by the “additive” formula (8.5), the sufficient condition is also necessary. In turn, as pointed out in [16], the result there generalizes an inequality in [23]. Another proof of a special case of the result in [16] was given in [5].
8.2 Illustrations and Applications
8.2.1 A General Construction of Generalized n-Submodular Functions from Submodular Ones
Recall here some basics of majorization theory [17]. For \(x=(x_1,\dots,x_n)\) and \(y=(y_1,\dots,y_n)\) in \(\mathbb {R}^n\), let \(x_{[1]}\ge\dots\ge x_{[n]}\) and \(y_{[1]}\ge\dots\ge y_{[n]}\) denote their components arranged in nonincreasing order, and write x ≺ y if \(x_1+\dots+x_n=y_1+\dots+y_n\) and \(\sum_{i=1}^k x_{[i]}\le\sum_{i=1}^k y_{[i]}\) for all k ∈ [n]. For any \(D\subseteq \mathbb {R}^n\), a function \(F\colon D\to \mathbb {R}\) is called Schur-concave if for any x and y in D such that x ≺ y one has F(x) ≥ F(y). If \(D=I^n\) for some open interval \(I\subseteq \mathbb {R}\) and the function F is symmetric and continuously differentiable, then, by Schur’s theorem [17, Theorem A.4], F is Schur-concave iff \((\frac {\partial F}{\partial x_i}-\frac {\partial F}{\partial x_j})(x_i-x_j)\le 0\) for all \(x=(x_1,\dots,x_n)\in D\) and all i, j ∈ [n].
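The relation ≺ can be checked mechanically from its definition. A minimal Python sketch, with hypothetical names and a small tolerance for floating-point input; the example function \(\sum_i\sqrt{x_i}\) is symmetric and concave, hence Schur-concave (and nondecreasing) on \([0,\infty)^n\), so its value must not increase when passing from x to a vector y that majorizes x.

```python
from math import sqrt


def is_majorized(x, y, tol=1e-12):
    """Return True if x ≺ y: equal totals, and the partial sums of the
    nonincreasing rearrangement of x never exceed those of y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    partial_x = partial_y = 0.0
    for a, b in zip(xs, ys):
        partial_x += a
        partial_y += b
        if partial_x > partial_y + tol:
            return False
    return abs(partial_x - partial_y) <= tol  # equal totals


def sum_of_roots(x):
    """A symmetric concave, hence Schur-concave, function on [0, inf)^n."""
    return sum(sqrt(t) for t in x)


x, y = (1, 1, 1), (3, 0, 0)
print(is_majorized(x, y), is_majorized(y, x))  # True False
print(sum_of_roots(x) >= sum_of_roots(y))      # True
```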
Proposition 8.2.1
Suppose that a real-valued function λ defined on a distributive lattice L is submodular and nondecreasing, and a function \(\mathbb {R}^n\ni x=(x_1,\dots ,x_n)\mapsto F(x_1,\dots ,x_n)\) is nondecreasing in each of its n arguments and Schur-concave. Then the function \(\Lambda =\Lambda _{\lambda ,F}\colon L^n\to \mathbb {R}\) defined by the formula
\[\Lambda(f_1,\dots,f_n):=F\big(\lambda(f_1),\dots,\lambda(f_n)\big) \tag{8.9}\]
for \((f_1,\dots,f_n)\in L^n\) is generalized \((2,n)\)-submodular and hence generalized n-submodular.
A rather general construction of submodular functions on rings of sets is provided by [6, §23.2], which implies that ∪-homomorphisms preserve the property of being alternating of a given order, and by the proposition at the end of [6, §23.1], which describes general ∪-homomorphisms as maps of the form
\[A\mapsto G(A):=\{t\in T\colon (s,t)\in G\ \text{for some}\ s\in A\},\]
where S and T are sets, A ⊆ S, and G ⊆ S × T; in the case when G is (the graph of) a map, the above notation G(A) is of course consistent with that for the image of a set A under the map G. According to the definition at the beginning of [6, §23], a ∪-homomorphism is a map φ of set rings satisfying the condition φ(A ∪ B) = φ(A) ∪ φ(B) for all relevant sets A and B.
Therefore, and because an additive function on a ring of sets is modular and hence submodular, we conclude that functions of the form
\[A\mapsto\mu\big(G(A)\big) \tag{8.10}\]
are submodular (and, μ being nonnegative, nondecreasing), where μ is a measure or, more generally, a nonnegative additive function (say on a discrete set, to avoid matters of measurability). From this observation, one can immediately obtain any number of corollaries of Proposition 8.2.1 such as the following:
Corollary 8.2.2
Let P be a polytope of dimension d. For each \(\alpha \in \overline {0,d}\), let \(\mathcal {F}_\alpha \) denote the set of all α-faces (that is, faces of dimension α) of P. For any distinct α, β, γ in \(\overline {0,d}\), let \(G=G_{\alpha,\beta,\gamma}\) be the set of all pairs \(\big (f_\alpha ,(f_\beta ,f_\gamma )\big )\in \mathcal {F}_\alpha \times (\mathcal {F}_\beta \times \mathcal {F}_\gamma )\) such that \(f_\alpha\cap f_\beta\ne\emptyset\), \(f_\alpha\cap f_\gamma\ne\emptyset\), and \(f_\beta\cap f_\gamma\ne\emptyset\). Let L be a lattice of subsets of \(\mathcal {F}_\alpha \). Let a function \(\mathbb {R}^n\ni x=(x_1,\dots ,x_n)\mapsto F(x_1,\dots ,x_n)\) be nondecreasing in each of its n arguments and Schur-concave. Then the function \(\Lambda =\Lambda _{\alpha ,\beta ,\gamma }\colon L^n\to \mathbb {R}\) defined by the formula
\[\Lambda(A_1,\dots,A_n):=F\big(\operatorname{card}G(A_1),\dots,\operatorname{card}G(A_n)\big)\]
for \((A_1,\dots,A_n)\in L^n\) is generalized \((2,n)\)-submodular and hence generalized n-submodular.
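For readers’ convenience, here is a direct verification of the fact that maps of the form (8.10) are submodular: noting that G(A ∪ B) = G(A) ∪ G(B) and G(A ∩ B) ⊆ G(A) ∩ G(B) and using the additivity and nonnegativity of μ, we have
\[\mu\big(G(A\cup B)\big)+\mu\big(G(A\cap B)\big)\le\mu\big(G(A)\cup G(B)\big)+\mu\big(G(A)\cap G(B)\big)=\mu\big(G(A)\big)+\mu\big(G(B)\big)\]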
for all relevant sets A and B.
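The direct verification above is easy to test by brute force on small finite sets. In this Python sketch (ours; all names hypothetical), μ is a weighted counting measure on T given by a dictionary of weights. The second call uses a negative weight to illustrate that the step μ(G(A ∩ B)) ≤ μ(G(A) ∩ G(B)), and with it submodularity, can fail when μ is not nonnegative.

```python
from itertools import chain, combinations


def image(G, A):
    """G(A) = {t : (s, t) in G for some s in A}, for a relation G ⊆ S x T."""
    return {t for (s, t) in G if s in A}


def all_subsets(S):
    items = list(S)
    return [set(c) for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def is_submodular_image_measure(S, G, weight):
    """Test mu(G(A | B)) + mu(G(A & B)) <= mu(G(A)) + mu(G(B)) for all A, B ⊆ S,
    where mu(E) is the sum of weight[t] over t in E (an additive set function)."""
    def mu(E):
        return sum(weight[t] for t in E)

    subsets = all_subsets(S)
    return all(
        mu(image(G, A | B)) + mu(image(G, A & B)) <= mu(image(G, A)) + mu(image(G, B))
        for A in subsets
        for B in subsets
    )


G = {(0, "a"), (0, "b"), (1, "b"), (2, "c")}
print(is_submodular_image_measure({0, 1, 2}, G, {"a": 2, "b": 5, "c": 1}))   # True
print(is_submodular_image_measure({0, 1}, {(0, "x"), (1, "x")}, {"x": -1}))  # False
```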
8.2.2 Generalized One-Sided Potential
Let here L be the lattice of all measurable real-valued functions on a measure space (S, Σ, μ), with the pointwise lattice operations ∨ and ∧. Consider the function \(\Lambda \colon L^n\to \mathcal {R}\) given by the formula
for all f = (f 1, …, f n) ∈ L n, where
for all g ∈ L, where \(\varphi \colon \mathbb {R}\to [0,\infty ]\) is a nondecreasing or nonincreasing function and ψ: [0, ∞] → (−∞, ∞] is a concave function. The function \(\Lambda=\Lambda_{\varphi,\psi}\) may be referred to as a generalized one-sided potential, since the function φ is assumed to be monotonic.
Proposition 8.2.3
The function \(\Lambda=\Lambda_{\varphi,\psi}\) defined by formula (8.11) is generalized \((2,n)\)-submodular and hence generalized n-submodular.
8.2.3 Symmetric Sums of Nonnegative Multiadditive Functions
Let k be a natural number. Let L be a sublattice of the lattice \(\mathbb {R}^S\) of all real-valued functions on a set S. Let us say that the lattice L is complementable if \(f\setminus g:=f-f\wedge g\in L\) for any f and g in L, so that \(f=f\wedge g+f\setminus g\). Assuming that L is complementable, let us say that a function \(m\colon L\to \mathbb {R}\) is additive if
\[m(f)=m(f\wedge g)+m(f\setminus g)\]
for all f and g in L; further, let us say that a function \(m\colon L^k\to \mathbb {R}\) is multiadditive or, more specifically, k-additive if m is additive in each of its k arguments, that is, if for each j ∈ [k] and each (k − 1)-tuple (f i: i ∈ [k] ∖{j}) the function L ∋ f j↦m(f 1, …, f k) is additive.
To state the main result of this subsection, we shall need the following notation: for any set J, let \(\Pi _k^J\) denote the set of all k-permutations of J, that is, the set of all injective maps of the set [k] to J.
Proposition 8.2.4
Suppose that k and n are natural numbers such that k ≤ n, L is a complementable sublattice of \(\mathbb {R}^S\), and \(m\colon L^k\to \mathbb {R}\) is a nonnegative multiadditive function. Then the function \(\Lambda _m\colon L^n\to \mathbb {R}\) defined by the formula
\[\Lambda_m(f_1,\dots,f_n):=\sum_{\pi\in\Pi_k^{[n]}}m\big(f_{\pi(1)},\dots,f_{\pi(k)}\big) \tag{8.13}\]
for \((f_1,\dots,f_n)\in L^n\) is generalized \((2,n)\)-submodular and hence generalized n-submodular.
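Formula (8.13) can be rewritten in the following symmetrized form:
\[\Lambda_m(f_1,\dots,f_n)=\sum_{I\in\binom{[n]}{k}}\overline{m}(f_I), \tag{8.14}\]
where, for \(I=\{i_1,\dots,i_k\}\) with \(1\le i_1<\dots<i_k\le n\), we write \(f_I:=(f_{i_1},\dots,f_{i_k})\), and
\[\overline{m}(g_1,\dots,g_k):=\sum_{\pi\in\Pi_k^{[k]}}m\big(g_{\pi(1)},\dots,g_{\pi(k)}\big)\quad\text{for }(g_1,\dots,g_k)\in L^k;\]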
note that the so-defined function \(\overline {m}\colon L^k\to \mathbb {R}\) is multiadditive and nonnegative, given that m is so. Also, \(\overline {m}\) is permutation-symmetric in the sense that \(\overline {m}(f_{\pi (1)},\dots ,f_{\pi (k)})=\overline {m}(f_1,\dots ,f_k)\) for all (f 1, …, f k) ∈ L k and all permutations \(\pi \in \Pi _k^{[k]}\).
Example 8.2.5
If V is a vector sublattice of the lattice \(\mathbb {R}^S\) and L is the lattice of all nonnegative functions in V then, clearly, L is complementable and the restriction to L k of any multilinear function from V k to \(\mathbb {R}\) is multiadditive.
In particular, if μ is a measure on a σ-algebra Σ over S, V is a vector sublattice of \(L^k(S,\Sigma,\mu)\), and L is the lattice of all nonnegative functions in V, then the function \(m\colon L^k\to \mathbb {R}\) given by the formula
\[m(f_1,\dots,f_k):=\int_S f_1\cdots f_k\,\operatorname{d}\mu\]
for (f 1, …, f k) ∈ L k is multiadditive.
So, by Proposition 8.2.4, the functions \(\Lambda_m\) corresponding to the functions m presented above in this example are generalized \((2,n)\)-submodular and hence generalized n-submodular.
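Let now \(B=(b_{i,j})\) be a d × p matrix with d ≤ p and nonnegative entries \(b_{i,j}\). The permanent of B is defined by the formula
\[\operatorname{perm}B:=\sum_{J\in\binom{[p]}{d}}\operatorname{perm}B_{\cdot J},\]
where \(B_{\cdot J}\) is the square submatrix of B consisting of the columns of B with column indices in the set \(J\in \binom {[p]}d\); and for a square d × d matrix \(B=(b_{i,j})\),
\[\operatorname{perm}B:=\sum_{\sigma\in\Pi_d^{[d]}}\prod_{i=1}^d b_{i,\sigma(i)}.\]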
So, \(\operatorname {perm} B\) is a multilinear function of the d-tuple \((b_{1,\cdot},\dots,b_{d,\cdot})\) of the rows of B. Also, if d = p, then \(\operatorname {perm} B\) is a multilinear function of the d-tuple \((b_{\cdot,1},\dots,b_{\cdot,d})\) of the columns of B. If d > p, then \(\operatorname {perm} B\) may be defined by the requirement that the permanent be invariant with respect to transposition.
Thus, from Proposition 8.2.4 we immediately obtain
Corollary 8.2.6
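Assuming that the entries \(b_{i,j}\) of the d × p matrix B are nonnegative, \(\operatorname {perm} B\) is a generalized d-submodular function of the d-tuple \((b_{1,\cdot},\dots,b_{d,\cdot})\) of its rows and a generalized p-submodular function of the p-tuple \((b_{\cdot,1},\dots,b_{\cdot,p})\) of its columns (with respect to the standard lattice structures on \(\mathbb {R}^{1\times p}\) and \(\mathbb {R}^{d\times 1}\), respectively). In other words, \(\operatorname{perm}B\) does not increase if the entries of each column of B are rearranged in nondecreasing order from top to bottom, nor if the entries of each row of B are rearranged in nondecreasing order from left to right.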
Note that the condition d ≤ p is not needed or assumed in Corollary 8.2.6.
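The permanent of a rectangular matrix and the two rearrangements in Corollary 8.2.6 can be coded directly. In this Python sketch (ours; names hypothetical), the order statistics of the rows in \(\mathbb{R}^{1\times p}\) amount to sorting every column, and those of the columns in \(\mathbb{R}^{d\times 1}\) amount to sorting every row; the brute-force permanent, summing over injective maps [d] → [p], is only practical for tiny matrices.

```python
from itertools import permutations
from math import prod
import random


def perm(B):
    """Permanent of a d x p matrix (list of rows): for d <= p, the sum over injective
    maps sigma: [d] -> [p] of prod_i B[i][sigma(i)]; for d > p, via transposition."""
    d, p = len(B), len(B[0])
    if d > p:
        B = [list(col) for col in zip(*B)]
        d, p = p, d
    return sum(prod(B[i][s[i]] for i in range(d)) for s in permutations(range(p), d))


def sort_columns(B):
    """Replace the rows by their order statistics in R^{1 x p}: sort each column upward."""
    columns = [sorted(col) for col in zip(*B)]
    return [list(row) for row in zip(*columns)]


def sort_rows(B):
    """Replace the columns by their order statistics in R^{d x 1}: sort each row."""
    return [sorted(row) for row in B]


B = [[2, 1], [1, 3]]
print(perm(B), perm(sort_columns(B)), perm(sort_rows(B)))  # 7 5 5
```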
Yet another way in which multilinear and hence multiadditive functions may arise is via the elementary symmetric polynomials. Let n be any natural number, and let k ∈ [n]. The elementary symmetric polynomials are defined by the formula
\[e_k(x_1,\dots,x_n):=\sum_{J\in\binom{[n]}{k}}\prod_{j\in J}x_j.\]
In particular, \(e_1(x_1,\dots,x_n)=\sum_{j\in[n]}x_j\) and \(e_n(x_1,\dots,x_n)=\prod_{j\in[n]}x_j\).
Let f = (f 1, …, f n) be the vector of measurable functions f 1, …, f n defined on a measure space (S, Σ, μ) with values in the interval [0, ∞). Then it is not hard to see that the “order statistics” are nonnegative measurable functions as well. As usual, let \(\mu (h):=\int _S h\,\operatorname {d}\mu \).
If the measure μ is a probability measure, then the functions f 1, …, f n are called random variables (r.v.’s) and, in this case, f n:1, …, f n:n will indeed be what is commonly referred to as the order statistics based on the “random sample” f = (f 1, …, f n); cf. e.g. [7]. In contrast with settings common in statistics, in general we do not impose any conditions on the joint or individual distributions of the r.v.’s f 1, …, f n—except that these r.v.’s be nonnegative.
Then we have the following.
Corollary 8.2.7
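For each k ∈ [n],
\[e_k\big(\mu(f_1),\dots,\mu(f_n)\big)\ \ge\ e_k\big(\mu(f_{n:1}),\dots,\mu(f_{n:n})\big). \tag{8.16}\]
In particular,
\[\mu(f_1)\cdots\mu(f_n)\ \ge\ \mu(f_{n:1})\cdots\mu(f_{n:n}). \tag{8.17}\]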
This follows immediately from Proposition 8.2.4 and formula (8.14), since the product μ(f 1)⋯μ(f k) is clearly multilinear and hence multiadditive in (f 1, …, f k).
To deal with cases when some of the \(\mu(f_j)\)’s (or the \(\mu(f_{n:j})\)’s) equal 0 and others equal ∞, let us assume here the convention 0 ⋅ ∞ := 0. One may note that, if the nonnegative functions \(f_1,\dots,f_n\) are scalar multiples of one another or, more generally, if \(f_{\pi(1)}\le\dots\le f_{\pi(n)}\) for some permutation π of the set [n], then inequality (8.16) becomes an equality, since then \((f_{n:1},\dots,f_{n:n})=(f_{\pi(1)},\dots,f_{\pi(n)})\).
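Inequality (8.16) is easy to probe numerically. In this Python sketch (ours; names hypothetical), S is a five-point set carrying the uniform probability measure, the functions are random nonnegative integer vectors with no independence assumed, and exact rational arithmetic avoids rounding issues; the case k = n is inequality (8.17).

```python
from fractions import Fraction
from itertools import combinations
from math import prod
import random


def elementary_symmetric(k, xs):
    """e_k(x_1, ..., x_n) = sum over k-subsets J of prod_{j in J} x_j."""
    return sum(prod(c) for c in combinations(xs, k))


def order_statistics(fs):
    """Pointwise order statistics of functions given as equal-length tuples of values."""
    columns = [sorted(values) for values in zip(*fs)]
    return [tuple(col[j] for col in columns) for j in range(len(fs))]


def mean(f):
    """mu(f) for the uniform probability measure on the points where f is evaluated."""
    return Fraction(sum(f), len(f))


rng = random.Random(7)
fs = [tuple(rng.randint(0, 9) for _ in range(5)) for _ in range(4)]  # n = 4 functions on 5 points
lhs = [elementary_symmetric(k, [mean(f) for f in fs]) for k in range(1, 5)]
rhs = [elementary_symmetric(k, [mean(g) for g in order_statistics(fs)]) for k in range(1, 5)]
assert all(a >= b for a, b in zip(lhs, rhs))  # inequality (8.16) for k = 1, ..., n
```

For k = 1 the two sides always coincide, because the order statistics have the same pointwise sum as the original functions.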
As mentioned above, in Corollary 8.2.7 it is not assumed that \(f_1,\dots,f_n\) are independent r.v.’s. However, if μ is a probability measure and the r.v.’s \(f_1,\dots,f_n\) are independent (but not necessarily identically distributed), then \(\mu(f_1)\cdots\mu(f_n)=\mu(f_1\cdots f_n)=\mu(f_{n:1}\cdots f_{n:n})\) by the second part of (8.3), and so, (8.17) can then be rewritten as the following positive-association-type inequality for the order statistics:
\[\mu\big(f_{n:1}\cdots f_{n:n}\big)\ \ge\ \mu(f_{n:1})\cdots\mu(f_{n:n}). \tag{8.18}\]
Let now ψ be any monotone (that is, either nondecreasing or nonincreasing) function from [0, ∞] to [0, ∞]. For \(f=(f_1,\dots,f_n)\) as before, let
\[\psi\bullet f:=(\psi\circ f_1,\dots,\psi\circ f_n).\]
Then for j ∈ [n] one has \((\psi\bullet f)_{n:j}=\psi\circ f_{n:j}\) if ψ is nondecreasing and \((\psi\bullet f)_{n:j}=\psi\circ f_{n:n+1-j}\) if ψ is nonincreasing. Thus, we have the following ostensibly more general forms of (8.17) and (8.18):
Corollary 8.2.8
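\[\prod_{j=1}^n\mu(\psi\circ f_j)\ \ge\ \prod_{j=1}^n\mu(\psi\circ f_{n:j}). \tag{8.19}\]
If, moreover, μ is a probability measure and the r.v.’s \(f_1,\dots,f_n\) are independent, then
\[\mu\Big(\prod_{j=1}^n\psi\circ f_{n:j}\Big)\ \ge\ \prod_{j=1}^n\mu(\psi\circ f_{n:j}). \tag{8.20}\]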
The property of the order statistics f n:1, ⋯ , f n:n given by inequality (8.20) may be called the diagonal positive orthant dependence—cf. e.g. Definition 2.3 in [12] of the negative orthant dependence.
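Inequality (8.18), the case ψ(u) = u of (8.20), can be checked exactly on a finite product space, where independence is built in. The Python sketch below (ours; names hypothetical) takes each \(f_i\) to depend only on the i-th coordinate of ω, with the uniform probability on every coordinate, and returns the difference of the two sides of (8.18), which should be nonnegative.

```python
from fractions import Fraction
from itertools import product
from math import prod
import random


def association_gap(value_tables):
    """For independent f_i(omega) = value_tables[i][omega_i] on a product of uniform
    finite spaces, return mu(f_{n:1} ... f_{n:n}) - mu(f_{n:1}) ... mu(f_{n:n})."""
    n = len(value_tables)
    points = list(product(*[range(len(t)) for t in value_tables]))
    weight = Fraction(1, len(points))  # product of uniform probabilities
    stats = [sorted(t[w] for t, w in zip(value_tables, omega)) for omega in points]
    mean_of_product = sum(prod(s) for s in stats) * weight
    means = [sum(s[j] for s in stats) * weight for j in range(n)]
    return mean_of_product - prod(means)


print(association_gap([[0, 1], [0, 1]]))  # 1/16
```

In the printed example, the minimum and maximum of two independent fair bits have means 1/4 and 3/4, while their product has mean 1/4, giving the gap 1/4 − 3/16 = 1/16.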
Immediately from Theorem 8.1.1 or from inequality (8.19) in Corollary 8.2.8, one obtains
Corollary 8.2.9
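Take any \(p\in \mathbb {R}\setminus \{0\}\). Then
\[\prod_{j=1}^n\mu(f_j^p)^r\ \ge\ \prod_{j=1}^n\mu(f_{n:j}^p)^r \tag{8.21}\]
for any r ∈ (0, ∞), and
\[\prod_{j=1}^n\mu(f_j^p)^r\ \le\ \prod_{j=1}^n\mu(f_{n:j}^p)^r \tag{8.22}\]
for any r ∈ (−∞, 0). Here we use the conventions \(0^t:=\infty\) and \(\infty^t:=0\) for t ∈ (−∞, 0). We also use the following conventions: 0 ⋅ ∞ := 0 concerning (8.21) and 0 ⋅ ∞ := ∞ concerning (8.22).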
Consider now the special case of Corollary 8.2.9 with r = 1∕p. Letting then p → ∞, we see that (8.21) will hold with the \(\mu (f_j^p)^r\)’s and \(\mu (f_{n:j}^p)^r\)’s replaced there by \(\operatorname{ess\,sup}_\mu f_j\) and \(\operatorname{ess\,sup}_\mu f_{n:j}\), respectively, where \(\operatorname{ess\,sup}_\mu\) denotes the essential supremum with respect to the measure μ. This follows because \(\mu(h^p)^{1/p}\to\operatorname{ess\,sup}_\mu h\) as p → ∞ (at least when \(\mu(h^q)<\infty\) for some q ∈ (0, ∞)). Similarly, letting p → −∞, we see that (8.22) will hold with the \(\mu (f_j^p)^r\)’s and \(\mu (f_{n:j}^p)^r\)’s replaced there by \(\operatorname{ess\,inf}_\mu f_j\) and \(\operatorname{ess\,inf}_\mu f_{n:j}\), respectively, where \(\operatorname{ess\,inf}_\mu\) denotes the essential infimum with respect to μ (here one uses that \(\mu(h^p)^{1/p}\to\operatorname{ess\,inf}_\mu h\) as p → −∞, at least when \(\mu(h^q)<\infty\) for some q ∈ (−∞, 0)). Moreover, considering (say) the counting measures μ on finite subsets of the set S and noting that \(\sup h=\sup _S h\) coincides with the limit of the net \((\max_J h)\) over the filter of all finite subsets J of S, we conclude that (8.21) will hold with the \(\mu (f_j^p)^r\)’s and \(\mu (f_{n:j}^p)^r\)’s replaced there by \(\sup f_j\) and \(\sup f_{n:j}\), respectively. (The statement about the limit can be spelled out as follows: \(\sup_S h\ge\max_J h\) for all finite J ⊆ S, and for each real \(c<\sup h\) there is some finite set \(J_c\subseteq S\) such that \(\max_J h>c\) for all finite sets J with \(J_c\subseteq J\subseteq S\).) Similarly, (8.22) will hold with the \(\mu (f_j^p)^r\)’s and \(\mu (f_{n:j}^p)^r\)’s replaced there by \(\inf f_j\) and \(\inf f_{n:j}\), respectively. Thus, we have
Corollary 8.2.10
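\[\prod_{j=1}^n\sup f_j\ \ge\ \prod_{j=1}^n\sup f_{n:j} \tag{8.23}\]
and
\[\prod_{j=1}^n\inf f_j\ \le\ \prod_{j=1}^n\inf f_{n:j}. \tag{8.24}\]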
Here we use the following conventions: 0 ⋅∞ := 0 concerning (8.23) and 0 ⋅∞ := ∞ concerning (8.24).
Alternatively, one can obtain (8.23) and (8.24) directly from Theorem 8.1.1.
Also, of course there is no need to assume in Corollary 8.2.10 that the functions f 1, …, f n are measurable.
The special cases of inequalities (8.22) and (8.24) for n = 2 mean that, for any \(p\in\mathbb{R}\setminus\{0\}\) and r ∈ (−∞, 0), the functions \(h\mapsto\mu(h^p)^r\) and \(h\mapsto \inf h\) are log-supermodular functions on the distributive lattice (say \(\mathcal {L}_{\Sigma }\)) of all nonnegative Σ-measurable functions on S and on the distributive lattice (say \(\mathcal {L}\)) of all nonnegative functions on S, respectively.
At this point, let us recall the famous Fortuin–Kasteleyn–Ginibre (FKG) correlation inequality [8], which states that for any log-supermodular function ν on a finite distributive lattice L and any nondecreasing functions F and G on L we have
\[\nu(FG)\,\nu(1)\ \ge\ \nu(F)\,\nu(G),\]
where \(\nu(F):=\sum_{f\in L}\nu(f)F(f)\).
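The FKG inequality can be tested numerically for the weight \(\nu(h)=\mu(h)^r\) with r < 0, which is log-supermodular by the case n = 2 of (8.22) with p = 1. The Python sketch below (ours; names hypothetical) uses the finite distributive lattice of all functions from a small set S to {1, 2, 3} under the pointwise order, a weighted counting measure μ with positive integer weights, and exact rational arithmetic; the functions F and G passed in must be nondecreasing.

```python
from fractions import Fraction
from itertools import product


def fkg_gap(weights, F, G, r=-1, levels=(1, 2, 3)):
    """On the lattice of functions h: S -> levels (S = range(len(weights)), pointwise order),
    let nu(h) = mu(h)**r with mu(h) = sum_s weights[s] * h(s); return
    nu(FG) nu(1) - nu(F) nu(G), where nu(H) = sum_h nu(h) H(h)."""
    lattice = list(product(levels, repeat=len(weights)))
    nu = {h: Fraction(sum(w * v for w, v in zip(weights, h))) ** r for h in lattice}

    def integrate(H):
        return sum(nu[h] * H(h) for h in lattice)

    return integrate(lambda h: F(h) * G(h)) * integrate(lambda h: 1) - integrate(F) * integrate(G)


print(fkg_gap((1, 2), F=sum, G=min) >= 0)  # True
```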
Then we immediately obtain
Corollary 8.2.11
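Let \(\mathcal {L}^\circ _{\Sigma }\) be any finite sub-lattice of the lattice \(\mathcal {L}_{\Sigma }\), and let F and G be nondecreasing functions from \(\mathcal {L}^\circ _{\Sigma }\) to \(\mathbb {R}\). Then
\[\sum_{h\in\mathcal{L}^\circ_\Sigma}\mu(h)^rF(h)G(h)\,\sum_{h\in\mathcal{L}^\circ_\Sigma}\mu(h)^r\ \ge\ \sum_{h\in\mathcal{L}^\circ_\Sigma}\mu(h)^rF(h)\,\sum_{h\in\mathcal{L}^\circ_\Sigma}\mu(h)^rG(h)\]
for any r ∈ (−∞, 0). Similarly, let \(\mathcal {L}^\circ \) be any finite sub-lattice of the lattice \(\mathcal {L}\), and let F and G be nondecreasing functions from \(\mathcal {L}^\circ \) to \(\mathbb {R}\). Then
\[\sum_{h\in\mathcal{L}^\circ}(\inf h)F(h)G(h)\,\sum_{h\in\mathcal{L}^\circ}\inf h\ \ge\ \sum_{h\in\mathcal{L}^\circ}(\inf h)F(h)\,\sum_{h\in\mathcal{L}^\circ}(\inf h)G(h).\]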
As shown by Ahlswede and Daykin [2, pp. 288–289], their inequality [2, Theorem 1] almost immediately implies, and is in a sense sharper than, the FKG inequality. Furthermore, Rinott and Saks [21, 22] and Aharoni and Keich [1] independently obtained a more general inequality “for n-tuples of nonnegative functions on a distributive lattice, of which the Ahlswede–Daykin inequality is the case n = 2.” More specifically, in notation closer to that used in the present paper, [1, Theorem 1.1] states the following:
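Let \(\alpha_1,\dots,\alpha_n,\beta_1,\dots,\beta_n\) be nonnegative functions defined on a distributive lattice L such that
\[\prod_{j=1}^n\alpha_j(f_j)\ \le\ \prod_{j=1}^n\beta_j(f_{n:j})\]
for all \(f_1,\dots,f_n\) in L. Then for any finite subsets \(F_1,\dots,F_n\) of L
\[\prod_{j=1}^n\alpha_j(F_j)\ \le\ \prod_{j=1}^n\beta_j(F_{n:j}),\]
where \(\alpha(F):=\sum_{f\in F}\alpha(f)\) for any nonnegative function α on L and any finite set F ⊆ L, and \(F_{n:j}:=\{f_{n:j}\colon f_1\in F_1,\dots,f_n\in F_n\}\).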
Note that the definition of the “order statistics” used in [1] is different from (8.2) in that their “order statistics” go in the descending, rather than ascending, order; also, the term “order statistics” is not used in [1].
In view of this result of [1] and our Corollaries 8.2.9 and 8.2.10, one immediately obtains the following statement, which generalizes and strengthens Corollary 8.2.11:
Corollary 8.2.12
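Let \(\mathcal {F}_1,\dots ,\mathcal {F}_n\) be any finite subsets of the lattice \(\mathcal {L}_\Sigma \). For each j ∈ [n], let
\[\mathcal{F}_{n:j}:=\{f_{n:j}\colon f_1\in\mathcal{F}_1,\dots,f_n\in\mathcal{F}_n\}.\]
Then
\[\prod_{j=1}^n\sum_{h\in\mathcal{F}_j}\mu(h)^r\ \le\ \prod_{j=1}^n\sum_{h\in\mathcal{F}_{n:j}}\mu(h)^r \tag{8.25}\]
for any r ∈ (−∞, 0).
Similarly, let now \(\mathcal {F}_1,\dots ,\mathcal {F}_n\) be any finite subsets of the lattice \(\mathcal {L}\), and define the sets \(\mathcal{F}_{n:j}\) as above. Then
\[\prod_{j=1}^n\sum_{h\in\mathcal{F}_j}\inf h\ \le\ \prod_{j=1}^n\sum_{h\in\mathcal{F}_{n:j}}\inf h.\]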
Comparing inequalities (8.21) and (8.22) in Corollary 8.2.9 or inequalities (8.23) and (8.24) in Corollary 8.2.10, one may wonder whether the FKG-type inequalities stated in Corollaries 8.2.11 and 8.2.12 for the functions h↦μ(h)r with r < 0 and \(h\mapsto \inf h\) admit of the corresponding reverse analogues for the functions h↦μ(h)r with r > 0 and \(h\mapsto \sup h\). However, it is not hard to see that such FKG-type inequalities are not reversible in this sense, a reason being that the sets \(\mathcal {F}_{n:j}\) may be much larger than the sets \(\mathcal {F}_j\).
E.g., suppose that n = 2, \(S=\mathbb {R}\), μ is a Borel probability measure on \(\mathbb {R}\), 0 < ε < δ < 1, N is a natural number, \(\mathcal {F}_1\) is the set of N pairwise distinct constant functions \(f_1,\dots,f_N\) on \(\mathbb {R}\) such that 1 − ε < f_j < 1 + ε for all j ∈ [N], and \(\mathcal {F}_2=\{g_1,\dots ,g_N\}\), where \(g_j:=(1-\delta)\mathbf{1}_{(-\infty,j]}+(1+\delta)\mathbf{1}_{(j,\infty)}\) and \(\mathbf{1}_A\) denotes the indicator of a set A. Then it is easy to see that each of the sets \(\mathcal {F}_{2:1}\) and \(\mathcal {F}_{2:2}\) is of cardinality \(N^2\). So, letting δ ↓ 0 (so that ε ↓ 0 as well), all the functions involved tend to the constant 1, so that each term \(\mu(h)^r\) tends to 1; hence, for any real r, the right-hand side of (8.25) goes to \(N^4\), whereas its left-hand side goes to \(N^2\), which is much less than \(N^4\) if N is large.
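The cardinality claim in this example can be confirmed by a short computation. In the Python sketch below (ours; the names and the specific values of ε, δ, and of the constants are illustrative choices), each function is represented by its values at the midpoints x = 0.5, 1.5, …, N + 0.5, which suffices because all the functions involved, including the pointwise minima and maxima, are constant on each of the intervals (−∞, 1], (1, 2], …, (N − 1, N], (N, ∞). The sets `F21` and `F22` stand for \(\mathcal{F}_{2:1}\) and \(\mathcal{F}_{2:2}\).

```python
from fractions import Fraction


def example_sets(N, eps=Fraction(1, 10), delta=Fraction(1, 5)):
    """Sample the example's functions at x = 0.5, 1.5, ..., N + 0.5."""
    xs = [Fraction(2 * k + 1, 2) for k in range(N + 1)]
    # N distinct constants strictly inside (1 - eps, 1 + eps)
    consts = [1 + eps * Fraction(2 * i - N - 1, N + 1) for i in range(1, N + 1)]
    F1 = [tuple(c for _ in xs) for c in consts]
    # g_j = 1 - delta on (-inf, j] and 1 + delta on (j, inf)
    F2 = [tuple(1 - delta if x <= j else 1 + delta for x in xs) for j in range(1, N + 1)]
    F21 = {tuple(map(min, f, g)) for f in F1 for g in F2}
    F22 = {tuple(map(max, f, g)) for f in F1 for g in F2}
    return F1, F2, F21, F22


print([len(s) for s in example_sets(5)])  # [5, 5, 25, 25]
```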
Example 8.2.13
Closely related to Example 8.2.5 is the following. Suppose that (S, Σ) is a measurable space, μ is a measure on the product σ-algebra \(\Sigma^{\otimes k}\), and L is a subring of Σ. Then L is complementable and the function \(m\colon L^k\to \mathbb {R}\) given by the formula
\[m(A_1,\dots,A_k):=\mu(A_1\times\dots\times A_k) \tag{8.26}\]
for (A 1, …, A k) ∈ L k is multiadditive.
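A particular case of formula (8.26) is
\[m(A_1,\dots,A_k)=\operatorname{card}\big(G\cap(A_1\times\dots\times A_k)\big),\]
where \(\operatorname {card}\) stands for the cardinality and G is an arbitrary subset of \(S^k\) (this corresponds to Σ being the σ-algebra of all subsets of S and μ(B) = card(G ∩ B)). If G is symmetric in the sense that \((s_1,\dots,s_k)\in G\) iff \((s_{\pi(1)},\dots,s_{\pi(k)})\in G\) for all permutations π of the set [k], then G represents the set (say E) of all hyperedges of a k-uniform hypergraph over S, in the sense that \((s_1,\dots,s_k)\in G\) iff \(\{s_1,\dots,s_k\}\in E\).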
We now have another immediate corollary of Proposition 8.2.4:
Corollary 8.2.14
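Suppose that k and n are natural numbers such that k ≤ n, (S, Σ) is a measurable space, μ is a measure on the product σ-algebra \(\Sigma^{\otimes k}\), and L is a subring of Σ. Then
\[\sum_{\pi\in\Pi_k^{[n]}}\mu\big(A_{\pi(1)}\times\dots\times A_{\pi(k)}\big)\ \ge\ \sum_{\pi\in\Pi_k^{[n]}}\mu\big(A_{n:\pi(1)}\times\dots\times A_{n:\pi(k)}\big)\]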
for all (A 1, …, A n) ∈ L n.
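For the counting case of Example 8.2.13, where μ(B) = card(G ∩ B), Corollary 8.2.14 can be tested by brute force. In the Python sketch below (ours; names hypothetical), the order statistics of sets are computed from (8.2), which for sets reduces to counting memberships: \(A_{n:j}\) consists of the points lying in at least n + 1 − j of the sets \(A_1,\dots,A_n\). The set G is an arbitrary random subset of \(S^k\).

```python
from itertools import permutations, product
import random


def set_order_statistics(sets, universe):
    """A_{n:j} from (8.2) for sets: the points lying in at least n + 1 - j of the sets."""
    n = len(sets)
    count = {s: sum(s in A for A in sets) for s in universe}
    return [{s for s in universe if count[s] >= n + 1 - j} for j in range(1, n + 1)]


def symmetric_count(sets, G, k):
    """Sum over k-permutations pi of [n] of card(G ∩ (A_pi(1) x ... x A_pi(k)))."""
    return sum(
        sum(1 for t in G if all(t[i] in sets[pi[i]] for i in range(k)))
        for pi in permutations(range(len(sets)), k)
    )


rng = random.Random(2)
universe, k, n = range(5), 2, 3
G = {t for t in product(universe, repeat=k) if rng.random() < 0.4}  # arbitrary G ⊆ S^k
sets = [{s for s in universe if rng.random() < 0.5} for _ in range(n)]
assert symmetric_count(sets, G, k) >= symmetric_count(set_order_statistics(sets, universe), G, k)
```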
8.3 Proofs
One may note that formula (8.31) in the proof of Theorem 8.1.1 below defines a step similar to a step of the so-called straight insertion sort (cf. e.g. [15, Section 5.2.1]), also called the sifting or sinking technique, except that here we compare functions (rather than numbers) pointwise. Therefore, we do not stop when the right place of the value \(f_{n+1}(s)\) of the “new” function \(f_{n+1}\) among the already ordered values \(f_{n:1}(s),\dots,f_{n:n}(s)\) at a particular point s ∈ S has been found, because this place will in general depend on s. So, the proof that (8.31) implies (8.34) may be considered as (something a bit more than) a rigorous proof of the validity of the straight insertion algorithm, avoiding such informal, undefined terms as swap, moving, and interleaving.
Proof of Theorem 8.1.1
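Let us prove the theorem by induction on n. For n = 1, the result is trivial. To make the induction step, it suffices to prove the following: For any natural n ≥ 2, if the function \(\Lambda \colon L^n\to \mathcal {R}\) is generalized \((2,n)\)-semimodular and the function \(L^{n-1}\ni(f_1,\dots,f_{n-1})\mapsto\Lambda(f_1,\dots,f_n)\) is generalized (n − 1)-semimodular for each \(f_n\in L\), then Λ is generalized n-semimodular. (Indeed, if Λ is generalized \((2,n)\)-semimodular, then for each \(f_n\in L\) the latter function is generalized \((2,n-1)\)-semimodular and hence, by the induction hypothesis, generalized (n − 1)-semimodular.) Thus, we are assuming that the function \(\Lambda \colon L^n\to \mathcal {R}\) is generalized \((2,n)\)-semimodular and
\[\Lambda(f_1,\dots,f_n)\bowtie\Lambda(f_{n-1:1},\dots,f_{n-1:n-1},f_n)\]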
for all (f 1, …, f n) ∈ L n, where f n−1:1, …, f n−1:n−1 are the “order statistics” based on (f 1, …, f n−1).
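Take indeed any \((f_1,\dots,f_n)\in L^n\). Define the rectangular array of functions \((g_{k,j}\colon k\in \overline {0,{n-1}}, j\in [n])\) recursively, as follows:
\[g_{0,j}:=f_{n-1:j}\ \text{ for } j\in[n-1],\qquad g_{0,n}:=f_n, \tag{8.30}\]
and, for \(k\in \overline {1,{n-1}}\) and j ∈ [n],
\[g_{k,j}:=\begin{cases}g_{k-1,n-k}\wedge g_{k-1,n-k+1}&\text{if } j=n-k,\\ g_{k-1,n-k}\vee g_{k-1,n-k+1}&\text{if } j=n-k+1,\\ g_{k-1,j}&\text{otherwise}.\end{cases} \tag{8.31}\]
By the induction assumption made above and (8.30),
\[\Lambda(f_1,\dots,f_n)\bowtie\Lambda(g_{0,1},\dots,g_{0,n}). \tag{8.32}\]
Moreover, for each \(k\in \overline {1,{n-1}}\),
\[\Lambda(g_{k-1,1},\dots,g_{k-1,n})\bowtie\Lambda(g_{k,1},\dots,g_{k,n}), \tag{8.33}\]
since Λ is generalized \((2,n)\)-semimodular (apply the definition with j = n − k − 1).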
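It follows from (8.32) and (8.33), by the transitivity of the relation ⋊⋉, that
\[\Lambda(f_1,\dots,f_n)\bowtie\Lambda(g_{n-1,1},\dots,g_{n-1,n}).\]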
It remains to verify the identity
\[(g_{n-1,1},\dots,g_{n-1,n})=(f_{n:1},\dots,f_{n:n}). \tag{8.34}\]
In accordance with Remark 8.1.3, we may and shall assume that the distributive lattice L is a lattice of nonnegative real-valued functions on a set S, so that (8.3) holds for each s ∈ S.
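In the remainder of the proof, fix any s ∈ S. Then
\[\{\!\{g_{0,1}(s),\dots,g_{0,n}(s)\}\!\}=\{\!\{f_1(s),\dots,f_n(s)\}\!\}\]
by (8.30) and the second part of (8.3) used with n − 1 in place of n; also, for each \(k\in \overline {1,{n-1}}\),
\[\{\!\{g_{k,1}(s),\dots,g_{k,n}(s)\}\!\}=\{\!\{g_{k-1,1}(s),\dots,g_{k-1,n}(s)\}\!\}\]
by (8.31), since \(\{\!\{a\wedge b,a\vee b\}\!\}=\{\!\{a,b\}\!\}\) for any real a and b. So,
\[\{\!\{g_{n-1,1}(s),\dots,g_{n-1,n}(s)\}\!\}=\{\!\{f_1(s),\dots,f_n(s)\}\!\}.\]
Therefore, in view of the uniqueness assertion in Remark 8.1.3, to complete the proof of (8.34) and thus that of Theorem 8.1.1, it remains to show that
\[g_{n-1,1}(s)\le\dots\le g_{n-1,n}(s), \tag{8.35}\]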
which will follow immediately from
Lemma 8.3.1
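For each \(k\in \overline {1,{n-1}}\), the following assertion is true for all s ∈ S:
\[(A_k)\qquad g_{k,1}(s)\le\dots\le g_{k,n-k-1}(s)\ \text{ and }\ g_{k,n-k}(s)\le\dots\le g_{k,n}(s);\quad g_{k,n-k-1}(s)\le g_{k,n-k+1}(s)\ \text{ if } k\le n-2.\]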
Indeed, (8.35) is the first clause of assertion \((A_k)\) with k = n − 1. Thus, to complete the proof of Theorem 8.1.1, it remains to prove Lemma 8.3.1.
For simplicity, let us drop (s), thus writing \(g_{k,j}, f_n,\dots\) in place of \(g_{k,j}(s), f_n(s),\dots\). We shall prove Lemma 8.3.1 by induction on \(k\in \overline {1,{n-1}}\). Assertion \((A_1)\) means that \(g_{1,1}\le\dots\le g_{1,n-2}\), \(g_{1,n-1}\le g_{1,n}\), and \(g_{1,n-2}\le g_{1,n}\) if 1 ≤ n − 2. So, in view of (8.31) and (8.30), \((A_1)\) can be rewritten as follows: \(f_{n-1:1}\le\dots\le f_{n-1:n-2}\), \(f_{n-1:n-1}\wedge f_n\le f_{n-1:n-1}\vee f_n\), and \(f_{n-1:n-2}\le f_{n-1:n-1}\vee f_n\); all these inequalities are obvious. So, \((A_1)\) holds.
Take now any \(k\in \overline {2,{n-1}}\) and suppose that (A k−1) holds. We need to show that then (A k) holds.
For all \(j\in \overline {1,{n-k-2}}\,\cup \,\overline {{n-k+2},{n-1}}\), we have \(j+1\in \overline {1,{n-k-1}} \cup \,\overline {{n-k+2},n}\), whence, by (8.31) and the first clause of \((A_{k-1})\), \(g_{k,j}=g_{k-1,j}\le g_{k-1,j+1}=g_{k,j+1}\). So,
If j = n − k, then, by (8.31), \(g_{k,j}=g_{k-1,n-k}\wedge g_{k-1,n-k+1}\le g_{k-1,n-k}\vee g_{k-1,n-k+1}=g_{k,j+1}\).
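If j = n − k + 1, then the condition \(k\in \overline {2,{n-1}}\) implies j ≤ n − 1, and so, by (8.31), \(g_{k,j}=g_{k-1,n-k}\vee g_{k-1,n-k+1}\le g_{k-1,n-k+2}=g_{k-1,j+1}=g_{k,j+1}\); here the inequality holds because \(g_{k-1,n-k}\le g_{k-1,n-k+2}\) by the second clause of \((A_{k-1})\) (applicable since k − 1 ≤ n − 2) and \(g_{k-1,n-k+1}\le g_{k-1,n-k+2}\) by the first clause of \((A_{k-1})\).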
Thus, in view of (8.36), the first clause of \((A_k)\) holds. Also, if k ≤ n − 2, then, by (8.31) and the first clause of \((A_{k-1})\), \(g_{k,n-k-1}=g_{k-1,n-k-1}\le g_{k-1,n-k}\le g_{k-1,n-k}\vee g_{k-1,n-k+1}=g_{k,n-k+1}\), so that the second clause of \((A_k)\) holds as well. This completes the proof of Lemma 8.3.1.
Thus, Theorem 8.1.1 is proved. □
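The array \(g_{k,j}\), as it is used in the proof of Lemma 8.3.1, can be run pointwise, that is, on the real numbers \(f_1(s),\dots ,f_n(s)\) for one fixed s: row 0 lists the sorted values \(f_{n-1:1},\dots ,f_{n-1:n-1}\) followed by \(f_n\), and row k replaces the entries in positions n − k and n − k + 1 by their meet (minimum) and join (maximum), keeping all other entries. A minimal Python sketch follows; the helper names `build_array` and `assertion_A` are hypothetical, and the latter checks both clauses of \((A_k)\) as used in the proof.

```python
def build_array(f):
    """f = [f_1, ..., f_n]: the real values f_i(s) at one fixed point s.
    Returns rows g_0, ..., g_{n-1}; row[j - 1] holds g_{k,j}."""
    n = len(f)
    row = sorted(f[:-1]) + [f[-1]]       # row 0: f_{n-1:1}, ..., f_{n-1:n-1}, then f_n
    rows = [row]
    for k in range(1, n):
        prev = rows[-1]
        row = list(prev)                 # g_{k,j} = g_{k-1,j} off positions n-k, n-k+1
        a, b = n - k - 1, n - k          # 0-based positions of j = n-k and j = n-k+1
        row[a] = min(prev[a], prev[b])   # meet goes to position n-k
        row[b] = max(prev[a], prev[b])   # join goes to position n-k+1
        rows.append(row)
    return rows

def assertion_A(row, k):
    """Check both clauses of (A_k) for row k of the array."""
    n = len(row)
    g = [None] + list(row)               # 1-based: g[j] = g_{k,j}
    first = (all(g[j] <= g[j + 1] for j in range(1, n - k - 1)) and
             all(g[j] <= g[j + 1] for j in range(n - k, n)))
    second = k > n - 2 or g[n - k - 1] <= g[n - k + 1]
    return first and second

rows = build_array([5, 1, 4, 2, 3])
print(rows[-1])                          # [1, 2, 3, 4, 5]
print(all(assertion_A(rows[k], k) for k in range(1, 5)))  # True
```

Each row differs from the previous one by a single compare-exchange (one pass of insertion sort, inserting \(f_n\) into the sorted list), so the values are only permuted; in this sketch, the sorted last row therefore lists \(f_{n:1}(s),\dots ,f_{n:n}(s)\).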
Proof of Proposition 8.2.1
Take any \((f_1,\dots ,f_n)\in L^n\). Corollary B.3 in [17] states that x ≺ y iff x is in the convex hull of the set of all points obtained by permuting the coordinates of the vector y. Also, since the function λ is nondecreasing, we have \(\lambda (f_1\vee f_2)\ge \lambda (f_1)\vee \lambda (f_2)\). For any real a, b, c such that \(c\ge a\vee b\), we have \((a,b)=(1-t)(a+b-c,\,c)+t(c,\,a+b-c)\) for \(t=\frac {c-b}{2c-a-b}\in [0,1]\) if \(c>(a+b)/2\), and for any \(t\in [0,1]\) otherwise (that is, if a = b = c). So, the point (a, b) is a convex combination of the points (a + b − c, c) and (c, a + b − c). Using this fact for \(a=\lambda (f_1)\), \(b=\lambda (f_2)\), \(c=\lambda (f_1\vee f_2)\), we see that
Also, \(\lambda (f_1\wedge f_2)\le \lambda (f_1)+\lambda (f_2)-\lambda (f_1\vee f_2)\), by the submodularity of λ. Therefore, since F is nondecreasing (in each of its n arguments) and Schur-concave, we conclude that
Quite similarly,
for all \(i\in \overline {1,{n-1}}\), so that the function F is indeed generalized -submodular and hence, by Theorem 8.1.1, generalized n-submodular. □
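The elementary convex-combination step in the proof above can be checked with exact rational arithmetic. Below is a minimal Python sketch with hypothetical helper names; it assumes a, b, c are integers or `Fraction`s with c ≥ max(a, b), computes the weight t from the proof, and confirms the resulting two-point majorization (a, b) ≺ (a + b − c, c).

```python
from fractions import Fraction

def convex_weight(a, b, c):
    """Return t in [0, 1] with (a, b) = (1 - t)(a + b - c, c) + t(c, a + b - c).
    Assumes c >= max(a, b); a, b, c are ints or Fractions (exact arithmetic)."""
    if c < max(a, b):
        raise ValueError("requires c >= max(a, b)")
    denom = 2 * c - a - b                # = (c - a) + (c - b) >= 0
    if denom == 0:                       # only when a = b = c; any t in [0, 1] works
        return Fraction(0)
    return Fraction(c - b) / denom       # in [0, 1] because 0 <= c - b <= denom

def combine(a, b, c, t):
    """Evaluate (1 - t)(a + b - c, c) + t(c, a + b - c) coordinatewise."""
    s = a + b - c
    return ((1 - t) * s + t * c, (1 - t) * c + t * s)

def is_majorized(x, y):
    """x ≺ y for 2-vectors: equal sums and max(x) <= max(y)."""
    return sum(x) == sum(y) and max(x) <= max(y)

t = convex_weight(2, 5, 7)               # a = 2, b = 5, c = 7
print(t)                                 # 2/7
print(combine(2, 5, 7, t) == (2, 5))     # True
print(is_majorized((2, 5), (2 + 5 - 7, 7)))  # True
```

Exact `Fraction` arithmetic is used so that the identity for (a, b) can be tested with `==` rather than up to rounding error.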
Proof of Proposition 8.2.3
In view of Theorem 8.1.1, it is enough to show that the function \(\Lambda =\Lambda _{\varphi ,\psi }\) is generalized -supermodular. Without loss of generality (w.l.o.g.), we may and shall assume that the function φ is nondecreasing, since \(\Lambda _{\varphi ^-,\psi }=\Lambda _{\varphi ,\psi }\), where \(\varphi ^-(u):=\varphi (-u)\) for all real u. Also, w.l.o.g. ψ(0) = 0, and hence Ψ(0) = 0.
Take any \(f=(f_1,\dots ,f_n)\in L^n\). Then, letting
for g ∈ L, one has
where \(g_j:=f_1-f_j\), \(h_j:=f_2-f_j\), and \(R:=\sum _{3\le j<k\le n}\tilde \Psi (f_j-f_k)\). Since \(f_1\wedge f_2-f_1\vee f_2=-|f_1-f_2|\), one similarly has
Next,
\(\varphi \circ (f_1-f_2)+\varphi \circ (f_2-f_1)=\varphi \circ |f_1-f_2|+\varphi \circ (-|f_1-f_2|)\), and hence
Also, since φ is nondecreasing, \(\varphi \circ (f_1-f_2)\vee \varphi \circ (f_2-f_1)\le \varphi \circ |f_1-f_2|\), and hence
Since the function ψ is convex, it follows from (8.40)–(8.43) that
Further, take any \(j\in \overline {3,n}\). Then \(\varphi \circ g_j+\varphi \circ h_j=\varphi \circ (g_j\wedge h_j)+\varphi \circ (g_j\vee h_j)\). So,
Moreover, since φ is nondecreasing, \(\int _S\varphi \circ (g_j\vee h_j)\operatorname {d}\mu \) is no less than each of the integrals \(\int _S(\varphi \circ g_j)\operatorname {d}\mu \) and \(\int _S(\varphi \circ h_j)\operatorname {d}\mu \). So, in view of (8.12) and the convexity of the function ψ, one has \(\Psi (g_j)+\Psi (h_j)\le \Psi (g_j\wedge h_j)+\Psi (g_j\vee h_j)\). Similarly, because \(\int _S\varphi \circ (-(g_j\wedge h_j))\operatorname {d}\mu \) is no less than each of the integrals \(\int _S\varphi \circ (-g_j)\operatorname {d}\mu \) and \(\int _S\varphi \circ (-h_j)\operatorname {d}\mu \), one has \(\Psi (-g_j)+\Psi (-h_j)\le \Psi (-(g_j\wedge h_j))+\Psi (-(g_j\vee h_j))\). So, by (8.37), \(\tilde \Psi (g_j)+\tilde \Psi (h_j)\le \tilde \Psi (g_j\wedge h_j)+\tilde \Psi (g_j\vee h_j)\).
Therefore, by (8.38), (8.39), and (8.44), \(\Lambda (f_1,f_2,f_3,\dots ,f_n)\le \Lambda (f_1\wedge f_2,f_1\vee f_2,f_3,\dots ,f_n)\). Similarly, \(\Lambda (f_1,\dots ,f_{j-1},f_j,f_{j+1},f_{j+2},\dots ,f_n)\le \Lambda (f_1,\dots ,f_{j-1},f_j\wedge f_{j+1},f_j\vee f_{j+1},f_{j+2},\dots ,f_n)\) for all \(j\in \overline {1,{n-1}}\).
Thus, the function Λ is generalized -supermodular, and so, by Theorem 8.1.1, it is generalized n-supermodular. □
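In the proof of Proposition 8.2.3, each convexity step rests on two scalar facts: with \(A=\int _S\varphi \circ g_j\operatorname {d}\mu \), \(B=\int _S\varphi \circ h_j\operatorname {d}\mu \), \(C=\int _S\varphi \circ (g_j\wedge h_j)\operatorname {d}\mu \), \(D=\int _S\varphi \circ (g_j\vee h_j)\operatorname {d}\mu \), one has A + B = C + D and D ≥ max(A, B). Hence (A, B) ≺ (C, D), and for convex ψ the two-point case of the Hardy–Littlewood–Pólya (Karamata) inequality gives ψ(A) + ψ(B) ≤ ψ(C) + ψ(D). The Python sketch below checks these facts numerically, assuming a finite S with nonnegative point masses; φ = arctan (nondecreasing) and ψ = exp (convex) are chosen only for illustration, the helper names are hypothetical, and (8.12) and the definition of Ψ are not reproduced.

```python
import math

def meet(g, h):
    return [min(x, y) for x, y in zip(g, h)]     # pointwise g ∧ h

def join(g, h):
    return [max(x, y) for x, y in zip(g, h)]     # pointwise g ∨ h

def integral(phi, g, weights):
    """∫_S phi∘g dμ for a finite S with point masses `weights` (all >= 0)."""
    return sum(w * phi(x) for w, x in zip(weights, g))

def check_step(phi, psi, g, h, weights):
    """Return (sums_agree, join_dominates, convexity_holds) for one pair g, h."""
    A, B = integral(phi, g, weights), integral(phi, h, weights)
    C, D = integral(phi, meet(g, h), weights), integral(phi, join(g, h), weights)
    sums_agree = math.isclose(A + B, C + D, rel_tol=1e-12, abs_tol=1e-12)
    join_dominates = D >= max(A, B)              # uses that phi is nondecreasing
    # (A, B) is majorized by (C, D), so convex psi gives psi(A)+psi(B) <= psi(C)+psi(D)
    convexity_holds = psi(A) + psi(B) <= psi(C) + psi(D) + 1e-9
    return sums_agree, join_dominates, convexity_holds

weights = [0.5, 1.0, 2.0]                        # point masses of μ on S = {0, 1, 2}
g, h = [1.0, -2.0, 0.5], [0.0, 3.0, 0.5]         # playing the roles of g_j, h_j
print(check_step(math.atan, math.exp, g, h, weights))  # (True, True, True)
```

The same helper covers the pair \((f_1,f_2)\): taking g = \(f_1-f_2\) and h = −g gives g ∧ h = \(-|f_1-f_2|\) and g ∨ h = \(|f_1-f_2|\).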
Proof of Proposition 8.2.4
Fix any \((f_1,\dots ,f_n)\in L^n\). Then, in view of the permutation symmetry of \(\overline {m}\) defined by (8.15),
where
Similarly,
Note that the function \(\lambda _2\colon L^2\to \mathbb {R}\) is 2-additive and permutation-symmetric, and the function \(\lambda _1\colon L\to \mathbb {R}\) is additive. Take any f and g in L. Then \((f\vee g)\wedge f=f\) and \((f\vee g)\setminus f=g\setminus f\). So, by the additivity of \(\lambda _1\), we have \(\lambda _1(f\vee g)=\lambda _1(f)+\lambda _1(g\setminus f)\), whereas \(\lambda _1(f\wedge g)+\lambda _1(g\setminus f)=\lambda _1(g)\). So,
Since the function \(\lambda _2\) is 2-additive, permutation-symmetric, and nonnegative, we have
It follows from (8.45), (8.46), (8.47), and (8.48) (with \(f=f_{n-1}\) and \(g=f_n\)) that
Therefore, being permutation-symmetric, the function \(\Lambda _m\) is indeed generalized -submodular. Hence, by Theorem 8.1.1, \(\Lambda _m\) is generalized n-submodular. □
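The additive part of the proof of Proposition 8.2.4 combines two decompositions: \(f\vee g\) splits into f and \(g\setminus f\), and g splits into \(f\wedge g\) and \(g\setminus f\). Adding them gives \(\lambda _1(f\vee g)+\lambda _1(f\wedge g)=\lambda _1(f)+\lambda _1(g)\). A minimal Python sketch, assuming for illustration that elements of L are finite sets (∨ = union, ∧ = intersection, ∖ = set difference) and that \(\lambda _1\) is given by hypothetical integer point weights; the helper names are likewise hypothetical.

```python
def additive(weights):
    """Hypothetical lambda_1: the sum of point weights over a finite set A."""
    return lambda A: sum(weights[x] for x in A)

def modularity_check(lam, f, g):
    """Check the identities used for lambda_1, with f, g finite sets
    (join = union, meet = intersection, relative complement = set difference)."""
    join, meet, diff = f | g, f & g, g - f
    step1 = lam(join) == lam(f) + lam(diff)      # f ∨ g is the disjoint union of f and g ∖ f
    step2 = lam(meet) + lam(diff) == lam(g)      # g is the disjoint union of f ∧ g and g ∖ f
    modular = lam(join) + lam(meet) == lam(f) + lam(g)
    return step1, step2, modular

weights = {"a": 3, "b": -1, "c": 4, "d": 2}
print(modularity_check(additive(weights), frozenset("abc"), frozenset("bcd")))  # (True, True, True)
```

Integer weights keep every comparison exact; note that additivity alone yields modularity of \(\lambda _1\), without any sign condition on the weights.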
References
[1] R. Aharoni, U. Keich, A generalization of the Ahlswede-Daykin inequality. Discrete Math. 152(1–3), 1–12 (1996)
[2] R. Ahlswede, D.E. Daykin, Inequalities for a pair of maps S × S → S with S a finite set. Math. Z. 165(3), 267–289 (1979)
[3] F. Bach, Learning with submodular functions: a convex optimization perspective. Found. Trends® Mach. Learn. 6(2–3), 145–373 (2013)
[4] F. Bach, Submodular functions: from discrete to continuous domains. Math. Program. 175(1–2), 419–459 (2019)
[5] C. Borell, A note on an inequality for rearrangements. Pac. J. Math. 47, 39–41 (1973)
[6] G. Choquet, Theory of capacities. Ann. Inst. Fourier 5, 131–295 (1954)
[7] H.A. David, H.N. Nagaraja, Order Statistics. Wiley Series in Probability and Statistics, 3rd edn. (Wiley, Hoboken, 2003)
[8] C.M. Fortuin, P.W. Kasteleyn, J. Ginibre, Correlation inequalities on some partially ordered sets. Commun. Math. Phys. 22, 89–103 (1971)
[9] S. Fujishige, Submodular Functions and Optimization. Annals of Discrete Mathematics, vol. 47 (North-Holland, Amsterdam, 1991)
[10] G. Grätzer, Lattice Theory: Foundation (Birkhäuser, Basel, 2011)
[11] G.H. Hardy, J.E. Littlewood, G. Pólya, Inequalities, 2nd edn. (Cambridge University Press, Cambridge, 1952)
[12] K. Joag-Dev, F. Proschan, Negative association of random variables, with applications. Ann. Stat. 11(1), 286–295 (1983)
[13] S. Karlin, Y. Rinott, Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions. J. Multivariate Anal. 10(4), 467–498 (1980)
[14] S. Karlin, Y. Rinott, Classes of orderings of measures and related correlation inequalities. II. Multivariate reverse rule distributions. J. Multivariate Anal. 10(4), 499–516 (1980)
[15] D.E. Knuth, The Art of Computer Programming, vol. 3: Sorting and Searching, 2nd edn. (Addison-Wesley, Reading, 1998)
[16] G.G. Lorentz, An inequality for rearrangements. Am. Math. Mon. 60, 176–179 (1953)
[17] A.W. Marshall, I. Olkin, B.C. Arnold, Inequalities: Theory of Majorization and Its Applications. Springer Series in Statistics, 2nd edn. (Springer, New York, 2011)
[18] P. Milgrom, J. Roberts, Rationalizability, learning, and equilibrium in games with strategic complementarities. Econometrica 58(6), 1255–1277 (1990)
[19] H. Narayanan, Submodular Functions and Electrical Networks. Annals of Discrete Mathematics, vol. 54 (North-Holland, Amsterdam, 1997)
[20] I. Pinelis, Optimal binomial, Poisson, and normal left-tail domination for sums of nonnegative random variables. Electron. J. Probab. 21(20), 19 (2016)
[21] Y. Rinott, M. Saks, On FKG-type and permanental inequalities, in Stochastic Inequalities (Seattle, WA, 1991). IMS Lecture Notes Monograph Series, vol. 22 (Institute of Mathematical Statistics, Hayward, 1992), pp. 332–342
[22] Y. Rinott, M. Saks, Correlation inequalities and a conjecture for permanents. Combinatorica 13(3), 269–277 (1993)
[23] H.D. Ruderman, Two new inequalities. Am. Math. Mon. 59, 29–32 (1952)
[24] D.M. Topkis, Equilibrium points in nonzero-sum n-person submodular games. SIAM J. Control Optim. 17(6), 773–787 (1979)
[25] D.M. Topkis, Supermodularity and Complementarity. Frontiers of Economic Research (Princeton University Press, Princeton, 1998)
© 2019 Springer Nature Switzerland AG
About this paper
Pinelis, I. (2019). Generalized Semimodularity: Order Statistics. In: Gozlan, N., Latała, R., Lounici, K., Madiman, M. (eds) High Dimensional Probability VIII. Progress in Probability, vol 74. Birkhäuser, Cham. https://doi.org/10.1007/978-3-030-26391-1_8
DOI: https://doi.org/10.1007/978-3-030-26391-1_8
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-030-26390-4
Online ISBN: 978-3-030-26391-1
eBook Packages: Mathematics and Statistics (R0)