Optimal Partitions

Hwang, Frank K.; Rothblum, Uriel G.

doi:10.1007/978-1-4419-7997-1_40

Frank K. Hwang⁴ &
Uriel G. Rothblum⁵

7478 Accesses
1 Citations

Abstract

This chapter provides a survey on the problem of partitioning points in the d-dimensional space with the goal of maximizing a given objective function; the survey is based on the forthcoming book (Hwang FK, Rothblum UG (2011b) Partitions: optimality and clustering. World Scientific). The first two sections describe the framework, terminology, and some background material that are useful for the analysis of the partition problem. The middle two sections study the case of d = 1, and the last two sections study the case of general d. Results are presented without proofs, but with motivations, comments, and references. Complete proofs can also be found in the forthcoming book.

Work of Uriel G. Rothblum was supported by The Israel Science Foundation (Grant 669/06).

Access provided by Autonomous University of Puebla. Download reference work entry PDF

Combinatorial Optimization Approaches for Data Clustering

k Is the Magic Number—Inferring the Number of Clusters Through Nonparametric Concentration Inequalities

A Selective Overview of Recent Advances in Spectral Clustering and Their Applications

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Formulation and Classification of Partition Problems

Consider a finite set $\mathcal{N}$ of distinct positive integers (for most of the development $\mathcal{N} =\{ 1,\ldots,\vert \mathcal{N}\vert \}$). A partition of $\mathcal{N}$ is a finite collection of sets $\pi = (\pi _{1},\ldots,\pi _{p})$ where $\pi _{1},\ldots,\pi _{p}$ are pairwise disjoint nonempty sets whose union is $\mathcal{N}$. In this case, p is called the size ofπ, and the sets $\pi _{1},\ldots,\pi _{p}$ are called the parts ofπ. Further, if $n_{1},\ldots,n_{p}$ are the sizes of $\pi _{1},\ldots,\pi _{p}$, respectively, the vector $(n_{1},\ldots,n_{p})$ is called the shape ofπ; of course, in this case, $\sum _{j=1}^{p}n_{j} = \vert \mathcal{N}\vert $. The prefix “p-” or “$(n_{1},\ldots,n_{p})$-” is sometimes added to explicitly express the size or shape of a partition, referring to a p-partition of $\mathcal{N}$ or to an $(n_{1},\ldots,n_{p})$-partition of $\mathcal{N}$. Further, for brevity, the reference to the set $\mathcal{N}$ as the partitioned set is frequently omitted, referring simply to partitions. In the forthcoming development, it is sometimes required that partitions’ parts are nonempty, while at other times, this requirement is relaxed.

Partition problems are (combinatorial) optimization problems in which a partition π is to be selected out of a given set Π so as to optimize (i.e., minimize or maximize) an objective function F that is defined over Π. Such problems are classified by (i) the set of partitions Π over which optimization takes place, (ii) the number of characteristics associated with each of the partitioned elements, and (iii) the objective function F that is defined over Π. It is next elaborated on each of these classifiers.

1.1 Sets of Partitions

At times, attention is restricted to the set of all partitions or to the set of all partitions whose size or shape satisfies prescribed restrictions; these situations are, respectively, referred to as open, constrained-size, and constrained-shapesets of partitions. Situations where the restrictions on the size or shape are expressed by prescribing a single element are referred to as single-size or single-shape sets of partitions, and situations where restrictions are expressed through lower and upper bounds on the size or shape are referred to as bounded-size or bounded-shape sets of partitions, respectively. All of the above classes of sets of partitions can be treated as special cases of a constrained-shape class. Their respective names simply emphasize the kind of constraints on the shapes. For example, the class of p-partitions collects those partitions with p (nonempty) parts, and the class of open partitions collects all partitions without restriction on the number of (nonempty) parts. Thus, constrained-shape class is the most general class. Still, the general framework of constrained-shape sets of partitions does not appear in the forthcoming development, and whenever constrained-shape partition problems are mentioned, all shapes have the same size; consequently, the terminology “constrained shape” is used for sets of partitions with restricted shapes that have the same size (though formally, these are single-size constrained-shape sets of partitions). Sometimes, the above adjectives will be used casually to refer to partitions that belong to corresponding sets.

Let the (partitioned) set $\mathcal{N}$ be given and let $n \equiv \vert \mathcal{N}\vert $. When a single-size set of partitions is considered, the prescribed single-size is given as a positive integer p. Given a positive integer p and a set Γ of positive integervectors $(n_{1},\ldots,n_{p})$, each satisfying $\sum _{j=1}^{p}n_{j} = n$, ${\Pi }^{\Gamma }$ will denote the corresponding constrained shape partitions, that is, all partitions with shape in Γ. In particular, if Γ consists of a single vector $(n_{1},\ldots,n_{p})$, the notation ${\Pi }^{(n_{1},\ldots,n_{p})}$ refers to (the single-shape set of partitions) Π ^Γ. Also, if L and U are nonnegative integer p-vectors satisfying L ≤ U and $\sum _{j=1}^{p}L_{j} \leq n \leq \sum _{j=1}^{p}U_{j}$, Γ ^{(L, U)} will denote the set of nonnegative integervectors $(n_{1},\ldots,n_{p})$ satisfying $L_{j} \leq n_{j} \leq U_{j}$ for $j = 1,\ldots,p$; in this case, the notation Π ^{(L, U)} is used for (the bounded-shape set of partitions) ${\Pi }^{{\Gamma }^{(L,U)} }$. (The restrictions on L and U assure that Γ ^{(L, U)} and Π ^{(L, U)} are nonempty.) Of course, single-size and single-shape sets of partitions are instances of bounded-shape sets obtained, respectively, by setting L _j = 1 and $U_{j} = \vert \mathcal{N}\vert - p + 1$ for all j or L _j = n for all j. Similarly, open and single-size sets partitions are instances of bounded-size sets. The hierarchy of the classification of partitions is summarized in Table 1.

A multiset is a group of elements where each element is allowed to have multiple occurrence. The formal notation of a multiset has double brackets, for example, {{1, 1, 2, 2, 3}}, or is given as a bracketed list of distinct elements with superscripts designating their duplications, for example, {1², 2², 3}. However, at times, (the abused notation of) single brackets is used, for example, {1, 1, 2, 2, 3}.

Table 1 Classification of sets of partitions

Full size table

It is implicitly assumed in the above definitions that the parts of partitions are distinguishable. But, in some applications, the parts are indistinguishable and can be permuted without any restrictions. These situations are referred to as unlabeled partitions. Formally, an unlabeled partition of $\mathcal{N}$ is a finite collection of sets $\pi =\{\pi _{1},\ldots,\pi _{p}\}$ where the π _j’s are as above. Again, p and the sets $\pi _{1},\ldots,\pi _{p}$ are referred to as the size and the parts ofπ, respectively. Further, if $n_{1},\ldots,n_{p}$ are the sizes of $\pi _{1},\ldots,\pi _{p}$, respectively, the shape ofπ is the multiset $\{\{n_{1},\ldots,n_{p}\}\}$; again, $\sum _{j=1}^{p}n_{j} = \vert \mathcal{N}\vert $ is required. In the literature, unlabeled partitions are commonly referred to as allocations. Herein, the term “partitions” is reserved to labeled ones; but, at times (when potential ambiguity may arise), there is reference to labeled partitions.

The classification to sets of labeled partitions applies to unlabeled ones; see Table 1. Single-shape and bounded-shape sets of unlabeled partitions are defined, respectively, by a multiset $\{\{n_{1},\ldots,n_{p}\}\}$ or a multiset of pairs $\{\{(L_{1},U_{1}),\ldots,(L_{p},U_{p})\}\}$; the specification or the bounds on the sizes of the parts of partitions then hold for some labeling of the parts. Frequently, as parts of unlabeled partitions are indistinguishable, sizes and bounds of unlabeled partitions are uniform, namely, all parts have the same size and a multiset of bounds $\{\{(L_{1},U_{1}),\ldots,(L_{p},U_{p})\}\}$ consists of p identical pairs.

1.2 The Number of Characteristics Associated with each Partitioned Elements

It is assumed throughout that each element i of the partitioned set $\mathcal{N}$ is associated with a vector ${A}^{i} \in {\mathbb{R}}^{d}$ where d is a fixed positive integer (independent of i); the coordinates of A ⁱ are referred to as parameters or characteristics associated with i. The n vectors A ⁱ are part of the data for the problem and are given in the form of a real d ×n matrix A. For a subset S of $\mathcal{N} =\{ 1,\ldots,n\}$, A ^S is the submatrix of A consisting of the columns of A indexed by S, ordered as in A. Also, “bars” over matrices are used to denote the multiset consisting of their columns, for example, a subset S of $\mathcal{N} =\{ 1,\ldots,n\}$, $\overline{{A}^{S}}$ is the set of columns of A ^S, accounting for multiplicities.

1.3 The Objective Function

An objective function F( ⋅) associates each (feasible) p-partition π with a value F(π), and F(π) is to be maximized (or minimized) over the given set of partitions. The value F(π) of a partition π is assumed to depend on the parameters of the elements that are assigned to each part. More specifically, for each positive integer v, there is a column-symmetric function $h_{v} : {\mathbb{R}}^{d\times v} \rightarrow {\mathbb{R}}^{d}$, defined over multisets of vd-vectors; there are functions $g_{j} : {\mathbb{R}}^{d} \rightarrow {\mathbb{R}}^{m}$, $j = 1,\ldots,p$, and a function $f_{p} : {\mathbb{R}}^{m\times p} \rightarrow \mathbb{R}$. Then the value F(π) associated with partition π having shape $(n_{1},\ldots,n_{p})$ is given by

$$F(\pi ) = f_{p}\left (g_{1}[h_{n_{1}}(\overline{{A}^{\pi _{1}}})],\ldots,g_{p}[h_{n_{ p}}(\overline{{A}^{\pi _{p}}})]\right ).$$

(1)

The functions $h_{n_{j}}$ can, in a more general context, depend on the location within the variables of f _p, that is, on the index j; also, the functions g _j may depend on n _j. In many common applications, each of the functions $h_{n_{j}}$ is the summation function; in such cases, the problems are called sum-partition problems. When h _v or g _j is independent of the indexing parameter, the corresponding subscripts are dropped. Also, when referring to partitions of common size, the subscript “p” of f _p is dropped. Functions f _p are called additive when f _p is the sum function. It is also possible to consider partition problems where the domain of the functions $h_{n_{j}}$ consists of ordered lists (and the functions $h_{n_{j}}$ are not symmetric).

Some further notation is next introduced for a more concise form of sum-partition problems. For a d ×n real matrix A and a p-partition $\pi = (\pi _{1},\ldots,\pi _{p})$ of $\mathcal{N}$, the π-summation-matrix of A, denoted A _π, is given by

$$A_{\pi } \equiv \left [\sum _{i\in \pi _{1}}{A}^{i},\ldots,\sum _{ i\in \pi _{p}}{A}^{i}\right ] \in {\mathbb{R}}^{d\times p},$$

(2)

where the empty sum is defined to be 0. When each of the g _j’s is the identity over ℝ^d, the objective function F associates with partition π the value F(π) with the representation

$$F(\pi ) = f_{p}(A_{\pi }).$$

(3)

(As was already mentioned, when the optimization over partitions concerns only partitions of fixed size p, the dependence of the functions f _p on p is suppressed).

Of particular interest is the case where all A ⁱ’s have a common coordinate, say the first one, and it equals 1 for each A ⁱ. In this case, row 1 of A _π is the shape of π. It follows that (3) allows for the objective function F to depend on the shape of partitions. Of course, for single-shape problems, the part-sizes (i.e., the coefficients of the shape) are fixed and can be viewed as parameters of the objective function.

In summary, the major classifiers of partition problems are:

(1)
The family of partitionsΠover which the functionF(with representation as in ( 1 ) or ( 3 )) is considered and optimized: Using the classification of families of partitions provided in Table 1, there are open, constrained-size, bounded-size, single-size, constrained-shape, bounded-shape, and single-shape partition problems. In addition, relaxed-size problems refer to single-size problems which allow for empty parts. The description of the set of feasible partitions has to specify whether or not empty parts are allowed.
(2)
The number of parameters associated with each of the partitioned elements: The forthcoming development refers to single-parameter problems, two-parameter problems, and multiparameter problems.
(3)
The objective (cost) functionFas expressed by ( 1 ): Adjectives like “sum-,” “max-,” or “mean-” of partition problems reflect properties of h _v while properties of f, like “linear,” “convex,” “Schur convex,” and “separable,” are explicitly referred to f _p, for example, sum-partition problems with f Schur convex.

A partition that maximizes/minimizes F( ⋅) over a prescribed family of partitions Π is called optimal overΠ.

For single-, bounded-, and constrained-shape families of partitions, the inclusion or exclusion of empty parts is implicit in the description of the set of feasible shapes – empty parts are prohibited when the set of feasible shapes consists only of positive vectors, whereas the inclusion of nonnegative shapes that are not (strictly) positive indicates that empty parts are allowed. In particular, results that apply to all single-, bounded-, or constrained-shape partition problems with empty parts allowed extend to corresponding problems with empty parts prohibited (by restricting attention to corresponding sets of shapes that consist only of positive vectors). Still, there are results in this chapter that apply to constrained-shape problems that require the exclusion of empty parts. For sets of partitions that are determined by restrictions on size, for example, the case of single-size, the inclusion or exclusion of empty parts, must be made explicit as neither variant captures the other; see the results of the forthcoming Sect. 3 (summarized Table 4) about the optimality of monopolistic partitions for size problems with empty parts allowed and the optimality of extremal partitions for size problems with empty parts prohibited. Default positions in different parts of this chapter differ – at times they allow for empty parts, at times they exclude them, and at times they allow either way (to be determined through the specification of the set of feasible shapes).

The above classification refers to labeled partitions. Optimization problems over sets of unlabeled partitions can be embedded in the above framework, with Π and F being invariant under part permutation.

A major goal of this chapter is the identification of properties that are present in optimal partitions, thereby allowing one to restrict attention to corresponding subclasses of partitions. These properties are usually defined in terms of geometric/algebraic characteristics of the set of vectors associated with the indices assigned to the parts of the underlying partition. In particular, the focus is on properties that capture “clustering” of these vectors. The number of single-shape partitions is exponential in n (which implies the number of size partitions, open partitions, and other varieties of partitions are all exponential in n), so that it is not feasible to enumerate efficiently the partitions for selecting the optimal one. On the other hand, if a property has only polynomially many members, then it is possible to find an optimal partition by enumerating the subclasses of partitions having this property and comparing the values of the objective function.

Formally, a propertyQ of a partition is a unitary relation over sets of partitions, and it can be identified with the sets of partitions that satisfy it. Properties that are considered when studying multiparameter problems depend on the A ⁱ’s; as such, it seems natural to consider the partitioning of the A ⁱ’s rather than the partitioning of the underlying index set. But, possible equality among the A ⁱ’s often obscures the partition property. This situation is easily handled when the partitions are of the set of distinct indices and not of the A ⁱ’s.

A type 1 result for a partition problem is one that establishes a property for every optimal partition; a type 2 result is one that establishes a property for some optimal partition.

The next lemma, observed by Golany et al. [29], records a useful implication for the presence of properties in optimal solutions for single-shape/size and constrained-shape/size partition problems. For simplicity, attention is restricted to partitions whose size p is fixed.

Lemma 1

Consider an objective function F over partitions, a set Γ of integer p-vectors whose coordinate-sum is n and a property Q of partitions such that each single-shape partition problem corresponding to a shape in Γ and the objective function F, Q is satisfied by some (every) optimal partition. Then, for each constrained-shape partition problem corresponding to a subset of Γ and the objective function F, Q is satisfied by some (every) optimal partition. Also, the above holds with “shape” replaced by “size.”

For single-parameter problems, the n elements in the single row of A are usually denoted ${\theta }^{1},\ldots {,\theta }^{n}$. In particular, for single-shape sum-partition problems, the π-summation matrix associated with a partition π is a p-vector, which is denoted θ _π and referred to as π-summation vector, that is,

$$\theta _{\pi } \equiv \left (\sum {_{i\in \pi _{1}}\theta }^{i},\ldots,\sum {_{ i\in \pi _{p}}\theta }^{i}\right ) \in {\mathbb{R}}^{1\times p}.$$

(4)

Evidently, when θ ⁱ = 1 for each i, θ _π is the shape of π. For single-parameter problems, the partitioned elements can be renumbered so that θ ⁱ is monotone in i, that is,

$${ \theta }^{1} \leq \cdots {\leq \theta }^{n}.$$

(5)

(Still, when analyzing the complexity of algorithm that solve partition problems, one should account for time required to sorting θ ⁱ’s, if needed).

Much of the analysis focuses on partition problems with fixed size (single-shape, bounded-shape, constrained-shape, and single-size problems). When considering problems that allow for varying partition sizes, the objective function is, effectively, parameterized by an integer p representing the partition sizes; see (1). In particular, results about optimal partitions will then depend on some structure that expresses a connection between the objective function of p-partitions for different values of p. The most natural structure is the “reduction assumption” that asserts that F is invariant under the permutation/elimination of empty parts. For example, this is the case when f _p in (1) is the sum- (max/min)-function and the value assigned to empty parts is 0 ( − ∞/∞). A natural assumption that accompanies the “reduction assumption” is symmetry of F under part permutations.

Open problems that do not satisfy the reduction assumption can be difficult to analyze: They may have infinitely many feasible partitions, and there need not be an optimal one; this situation is demonstrated in the next example.

Example 1

Let $\mathcal{N} =\{ 1,2\}$ and consider the open problem where empty sets are allowed and objective function is given by $F(\pi ) =\sum _{ j=1}^{p}j\left (\sum _{i\in \pi _{j}}i\right )$ for partitions π of size p. Then every single-size problem will assign both elements of $\mathcal{N}$ to the part with the highest index. But, the open problem does not have an optimal partition as every p-partition is dominated by the best (p + 1)-partition.

When the “reduction assumption” is satisfied, each partition with p nonempty parts can, effectively, be viewed as (one of several) p′-partitions (with empty sets allowed) for any p′ ≥ p. In particular, upper bounded-size problems with upper bound p and open problem can be embedded in single-size problems with, respectively, p or n as the prescribed size (and with empty parts allowed). The amount of computation of solving n-size problems may be high, but the reduction may be useful in establishing properties of optimal solutions. Further, in cases where an optimal solution for a single-size problem with empty parts excluded is simple, optimizing over the number of nonempty parts may be computationally tractable. When the reduction assumption is in effect, allowing for empty parts is a technical notion that helps us address bounded-size and open problems.

The following list provides a collection of applications of the optimal partition problem reported in the literature (number 11 has not yet appeared in the literature):

1.
Assembly of systems (Derman et al. [17]; El-Newerbi et al. [22, 23]; Du [19]; Du and Hwang [20]; Malon [49]; Hwang et al. [41]; Hwang and Rothblum [36])
2.
Group testing (Dorfman [18], Sobel and Groll [68], Gilstein [28], Pfeifer and Enis [59], Hwang et al. [41])
3.
Circuit card library (Kodes [47]; Garey et al. [27])
4.
Clustering (Fisher [24]; Boros and Hwang [7]; Golany et al. [29])
5.
Abstraction of finite state machines (Oikonomou and Kain [58], Oikonomou [56, 57]).
6.
Multischeduling (Mehta et al. [51] Rothkopf [62], Tanaev [69], Graham [31], Cody and Coffman [16], Chandra and Wang [13], Easton and Wong [21])
7.
Cashe assignment (Gal et al. [26])
8.
Blood analyzer problem (Nawijn [53])
9.
Joint replenishment of inventory (Chakravarty et al. [11, 12], Goya [30], Silver [67], Nocturne [55], Chakravarty [10], A.K. Chakravarty 1983, Coordinated multi-product purchasing inventory decisions with group discounts, unpublished manuscript; A.K. Chakravarty, 1982b, Consecutiveness rule for inventory grouping with integer multiple constrained group-review periods, unpublished manuscript)
10.
Statistical hypothesis testing (Neyman and Pearson [54])
11.
Nearest neighbor assignment
12.
Division of property (Granot and Rothblum [32])
13.
Consolidating farm land (Brieden and Gritzmann [8]; Borgwardt et al. [5])

2 Preliminaries: Optimality of Extreme Points

This section presents sufficient conditions for the optimality of vertices, conditions that unify classical quasi-convexity and of Schur convexity. Throughout, standard definitions and facts about polytopes and convex sets are used; in particular, the vertices of a polytope are its extreme points.

Let C be a convex subset of ℝ^p and let f : C → ℝ. The function f is (strictly) quasi-convex along a nonzero vector d ∈ ℝ^p, or briefly (strictly) d-quasi-convex, if the maximum of f over every line segment in C with direction d is attained (only) at one of the two endpoints of that line segment, possibly both. The function f is (strictly) quasi-convex along a set D ⊆ ℝ^p, or briefly (strictly) D-quasi-convex, if f is (strictly) quasi-convex along every nonzero vector d ∈ D. The function f is (strictly) quasi-convex if it is (strictly) ℝ^p-quasi-convex. Reference to (strict) directional quasi-convexity is an abbreviation of (strictly) quasi-convexity along sets. Recall that a function f is (strictly) convex on C if for every distinct x, y ∈ C and 0 < α < 1, $f[\alpha x + (1-\alpha )y] \leq (<)\ \alpha f(x) + (1-\alpha )f(y)$; of course, every (strictly) convex function is (strictly) quasi-convex.

The next lemma states a simple, but useful, characterization of directional quasi-convexity.

Lemma 2

Let C be a convex subset of ℝ^p , f : C → ℝ and d ∈ ℝ^p ∖{ 0}. Then the following are equivalent:

(a) f is d-quasi-convex.
(b) For each x ∈ C, the restriction of f to $C \cap \{ x +\gamma d :\gamma \in \mathbb{R}\}$ (when viewed as a function in the parameter γ) cannot decrease after an increase.

The behavior of a function, which is d-quasi-convex over line segments with direction d, can follow the patterns demonstrated in Figures (a), (b), or (c) of Fig. 1, but not (d) (where a line segment is identified over which the function does not attain a maximum at an endpoint).

Let C and D be subsets of ℝ^p where C is compact and convex. The set D is a through-set forC if for every point x ∈ C which is not an extreme point of C, there is a line segment Ł ⊆ C with direction in D such that x is in the relative interior of L. The following lemma identifies a necessary condition for through-sets. It also shows that this condition is sufficient for polytopes.

Lemma 3

Let C be a convex set in ℝ^p . Then a subset D of ℝ^p is a through-set for C only if D contains a direction for each edge of C. Further, if C is a polytope, then every subset of ℝ^p that contains a direction for each edge of C is a through-set for C.

There are convex sets which have no edges, for example, the ℓ ₂-unit ball in ℝ^p. As the empty set is obviously not a through-set for any convex set of positive dimension, the sufficiency condition of Lemma 2 does not generalize from polytopes to general convex sets.

Sufficient conditions for optimality of extreme points are next presented. The result was developed independently by Tardella [70] and Hwang and Rothblum [37]. It unifies the classical conditions of quasi-convexity and results about Schur convexity (see the paragraph following Corollary 2).

Theorem 1

Let C be a compact convex set in ℝ^p , let E be the set of extreme points of C, and let D be a through-set for C, and let f : C → ℝ be D-quasi-convex. Then $\sup _{C}f =\sup _{E}f$ ; in particular, if f is continuous or if C is a polytope, f attains a maximum over C at one of C’s extreme points.

Theorem 2

Let C be a compact convex set in ℝ^p , let D be a through-set for C, and let f : C → ℝ be strictly quasi-convex along D. Then every maximizer of f over C is an extreme point of C; in particular, if f is continuous or if C is a polytope, then such maximizers exist.

Let C be a convex subset of ℝ^p and let f : C → ℝ. The function f is (strictly) edge-quasi-convex on C if it is (strictly) quasi-convex along every (some) subset D of ℝ^p that contains a direction for each edge of C. Evidently, f is (strictly) edge-quasi-convex on C if and only if the maximum of f over any line segment in C which is parallel to one of C’s edges is attained at one of the two endpoints (and not at a point in the relative interior). Also, a (strictly) quasi-convex function is (strictly) edge-quasi-convex on every convex set over which it is defined.

Corollary 1

Let P be a polytope and let f : P → ℝ. If f is edge-quasi-convex on P, then f attains a maximum over P on one of P’s vertices. Further, if f is strictly edge-quasi-convex, then every maximizer of f over P is a vertex of P; in particular, such a maximizer exists.

The assumption in Corollary 1 depends on the polytope P through its edge-directions, and the conclusion of the corollary specifically refers to P.

For $k \in \{ 1,\ldots,p\}$, let e ^k be the k-th unit vector in ℝ^p. Also, let $\Lambda \equiv \{ (i,j) : i,j \in \{ 1,\ldots,p\}$ and i≠j}, and for (i, j) ∈ Λ, let ${e}^{ij} \equiv {e}^{i} - {e}^{j}$. Let C be a convex subset of ℝ^p and f : C → ℝ. The function f is called (strictly) asymmetric Schur convex if it is (strictly) quasi-convex along the set $\{{e}^{ij} : (i,j) \in \Lambda \}$. The use of the term “asymmetric” is to emphasize that these properties apply to functions which are not necessarily symmetric. Of course, every quasi-convex function, let alone a convex function, is asymmetric Schur convex. (See the forthcoming definition of Schur convex functions).

The next lemma provides a characterization of asymmetric Schur convexity of continuously differentiable functions.

Lemma 4

Let $C \subseteq {\mathbb{R}}^{p}$ be an open convex set of dimension p and let $f : C \rightarrow R$ be continuously differentiable. Then f is asymmetric Schur convex on C if and only if

$$\begin{array}{ll} \mbox{ for all }x \in {I}^{p}\mbox{ and }i,j = 1,\ldots,p,\ \left [ \frac{\partial f} {\partial x_{i}} - \frac{\partial f} {\partial x_{j}}\right ][x +\gamma ({e}^{i} - {e}^{j}] \\ \mbox{ cannot change sign from positive to negative as $\gamma $ increases.} \end{array}$$

(6)

For a vector x ∈ ℝⁿ and $k = 1,\ldots,n$, let x _[k] be the k-th largest coordinate of x. A vector a ∈ ℝ^p majorizes a vector b ∈ ℝ^p, written a ≻ b, if

$$\sum _{i=1}^{k}a_{ [i]} \geq \sum _{i=1}^{k}b_{ [i]}\ \mbox{ for all }\ k = 1,\ldots,p$$

(7)

and

$$\sum _{i=1}^{p}a_{ [i]} =\sum _{ i=1}^{p}b_{ [i]}.$$

(8)

Evidently, (7) and (8) are, respectively, equivalent to

$$\max _{\vert I\vert =k}\sum _{i\in I}a_{i} \geq \max _{\vert I\vert =k}\sum _{i\in I}b_{i}\ \mbox{ for all }\ k = 1,\ldots,p $$

(7a)

and

$$\sum _{i=1}^{p}a_{ i} =\sum _{ i=1}^{p}b_{ i}. $$

(8a)

The vector a strictly majorizes b if a majorizes b, but b does not majorize a.

Let B be a convex subset of ℝ^p and let $f : B \rightarrow \mathbb{R}$. The function f is called Schur convex on B if

$$f(a) \geq f(b)\mbox{ for all }a,b \in B\mbox{ satisfying }a \succ b.$$

(9)

Condition (9) asserts that f is order-preserving with respect to the partial order majorization. The function f is called strictly Schur convex if it is Schur convex and (9) holds strictly whenever a strictly majorizes b.

Asymmetric Schur convexity is next related to classical Schur convexity. A subset B of ℝ^p is symmetric if B contains all vectors obtained by permuting the coordinates of vectors in B. A real-valued function on a symmetric subset of ℝ^p is called symmetric if it is invariant under coordinate-permutations.

Theorem 3

Let I ⊆ ℝ be an open interval and f : I ^p → ℝ. Then the following are equivalent:

(a) f is (strictly) Schur convex on I ^p.
(b) f is symmetric and (strictly) asymmetric Schur convex on I ^p.

As every (strictly) convex function is clearly (strictly) asymmetric Schur convex, Theorem 3 implies that a symmetric (strictly) convex function is necessarily (strictly) Schur convex.

The next result relates asymmetric Schur convexity to edge quasi-convexity. Let ℝ₊ be the nonnegative real numbers.

Corollary 2

Let $f : {(\mathbb{R}_{+})}^{p} \rightarrow \mathbb{R}$ . Then f is asymmetric Schur convex if and only if for every β > 0 the restriction of f to the polytope $\left \{x \in {\mathbb{R}}^{p} : x \geq 0,\,\sum _{i=1}^{p}x_{i} =\beta \right \}$ is edge-quasi-convex.

Evidently, quasi-convex functions and Schur convex functions are asymmetric Schur convex. Thus, asymmetric Schur convexity unifies classical quasi-convexity and Schur convexity. The following example identifies functions that are asymmetric Schur convex but neither convex nor symmetric, let alone Schur convex.

Example 2

Let f : ℝ² → ℝ be given by $f(x_{1},x_{2}) = g(x_{1} + x_{2}) + h(x_{1} - x_{2})$ with g arbitrary and h convex. To see that f is asymmetric Schur convex consider a line with direction e ¹ − e ², say $\{x +\gamma ({e}^{1} - {e}^{2}) :\gamma \in \mathbb{R}\}$. It then follows that $f[x +\gamma ({e}^{1} - {e}^{2})] = g(x_{1} + x_{2}) + h(x_{1} - x_{2} + 2\gamma )$, and this expression is clearly convex in γ. The function f need not be symmetric nor convex, for example, $f(x) =\sin (x_{1} + x_{2}) + {(x_{1} - x_{2})}^{2}$.

Theorem 4

Let P be a polytope all of whose edges have direction in {e ^ij : (i,j) ∈ Λ} and let $f : P \rightarrow \mathbb{R}$ . If f is asymmetric Schur convex on P, then f attains a maximum over P at one of P’s vertices. Further, if f is strictly asymmetric Schur convex, then every maximizer of f over P is a vertex of P; in particular, such a maximizer exists.

Polytopes of the form {x ∈ ℝ^p : x ≥ 0, $\sum _{i=1}^{p}x_{i} =\beta \}$ with β ≥ 0 clearly satisfy the assumption of Theorem 4 about the direction of their edges. Theorem 4 follows Hwang and Rothblum [37]. It streamlines the classical result about quasi-convexity being a sufficient condition for optimality of extreme points with a corresponding result about Schur convex functions. Interestingly, historically, these two conditions were considered distinct; in fact, Marshall and Olkin [50, p. 13] refer to the use of “convexity” in describing Schur convex functions as “inappropriate” and state that they adhere to the “historically accepted term ‘Schur convexity’.” Figure 2 below provides a graphical illustration of the relationships between quasi-convexity, directional quasi-convexity, classical Schur convexity (for symmetric functions), and asymmetric Schur convexity.

3 Single-Parameter: Polyhedral Approach

This section examines single-parameter partition problems. In particular, a class of polytopes, called partition polytopes, is introduced, and these polytopes are used to study partition problems.

Throughout this section and the next one, let (the partitioned set) $\mathcal{N} =\{ 1,\ldots,n\}$, (the size of partitions) p and (the real numbers associated with the partitioned elements) ${\theta }^{1},\ldots {,\theta }^{n}$ be fixed. Without loss of generality, assume that the θ ⁱ’s are ordered and satisfy

$${ \theta }^{1} {\leq \theta }^{2} \leq \cdots {\leq \theta }^{n}\ ;$$

(10)

also, θ will stand for the vector ${(\theta }^{1},\ldots {,\theta }^{n})$.

Recall from Sect. 1 that for a p-partition $\pi = (\pi _{1},\ldots,\pi _{p})$ of $\mathcal{N}$, θ _π is given by

$$\theta _{\pi } \equiv \left (\theta _{\pi _{1}},\ldots,\theta _{\pi _{p}}\right ) = \left (\sum {_{i\in \pi _{1}}\theta }^{i},\ldots,\sum {_{ i\in \pi _{p}}\theta }^{i}\right ) \in {\mathbb{R}}^{p}.$$

(11)

Given a set Π of ordered p-partitions, the Π-partition polytope is denoted ${P}^{\Pi } \equiv \mbox{ conv}\,\{\theta _{\pi } :\pi \in \Pi \} \subseteq {\mathbb{R}}^{p}$; when the set Π is either generic or clear from the context, the prefix “Π − ” is omitted. In forthcoming sections, partition problems are examined where the partitioned elements are associated with vectors rather than scalars; in the broader context, the above polytopes are called single-parameter partition polytopes.

The abbreviated notation P ^Γ will be used for ${P}^{{\Pi }^{\Gamma } }$ when Γ is a set of nonnegative integer p-vectors $(n_{1},\ldots,n_{p})$ satisfying $\sum _{j=1}^{p}n_{j} = n$, and such a polytope will be referred to as a constrained-shape partition polytope. Further, if Γ consists of a single vector $(n_{1},\ldots,n_{p})$, ${P}^{(n_{1},\ldots,n_{p})}$ will stand for ${P}^{{\Pi }^{(n_{1},\ldots,n_{p})} }$ and such a polytope will be referred to as a single-shape partition polytope; if L and U are nonnegative integer p-vectors satisfying L ≤ U and $\sum _{j=1}^{p}L_{j} \leq n \leq \sum _{j=1}^{p}U_{j}$, P ^{(L, U)} will stand for ${P}^{{\Pi }^{(L,U)} }$ and such a polytope will be referred to as a bounded-shape partition polytope; finally, if Π consists of all partitions of size p, P ^p will stand for ${P}^{{\Pi }^{p} }$ and such a polytope will be referred to as a single-size partition polytope.

The vertices of a Π-partition polytope P ^Π have a representation θ _π with π ∈ Π. Thus, in seeking an optimal partition in Π when the objective function F has the representation F(π) = f(θ _π) for each π ∈ Π, it is useful to examine an extension of the functions f to P ^Π (from {θ _π : π ∈ Π}) and use results of Sect. 2 to determine conditions that suffice for this extension to attain a maximum over P ^Π at a vertex.

Consider a single-shape problem corresponding to shape. If n _j = 0 for some $j = 1,\ldots,p$, then for every partition $\pi \in {\Pi }^{(n_{1},\ldots,n_{p})}$ π _j is empty and (θ _π)_j = 0; so, part j is redundant. Consequently, such indices j can be eliminated, and there is no loss of generality by imposing the restriction that all n _j’s are positive. But, such a reduction needs not be possible when considering constrained-shape problems. The analysis of partition polytopes usually allows for empty parts (diverting from the default position that empty parts are prohibited).

The following example identifies an important class of partition polytopes.

Example 3

Consider the single-shape partition polytope defined by $(n_{1},\ldots,n_{p}) = (1,\ldots,1)$. In particular, in this case, n = ∑ _{j = 1} ^p 1 = p. A partition π with shape $(1,\ldots,1)$ corresponds to a permutation of $(1,\ldots,p)$, and for such a partition, θ _π is the p-vector obtained from the corresponding coordinate permutation of ${(\theta }^{1},\ldots {,\theta }^{p})$. The partition polytope ${P}^{(1,\ldots,1)}$ is then the convex hull of these (permuted) vectors, and it is called the generalized permutahedron corresponding to ${(\theta }^{1},\ldots {,\theta }^{p})$. The standard permutahedron is the generalized permutahedron with θ ⁱ = i for $i = 1,\ldots,p$.

Generalized permutahedra were first investigated by Schoute [63]; see also Ziegler [71, pp. 17–18 and 23].

A set of partitions Π is next associated with two additional polytopes; it will be shown that the definition of these polytopes provides useful representations for partition polytopes under general conditions. First, let C ^Π be the solution set of the following system of linear inequalities:

$$\begin{array}{rcl} \sum _{j\in I}x_{j}& \geq & \min \left \{\sum _{j\in I}(\theta _{\pi })_{j} :\pi \in \Pi \right \}\mbox{ for each }\varnothing \subset I \subseteq \{ 1,\ldots,p\}\quad \ \end{array}$$

(12)

$$\begin{array}{rcl} \sum _{j=1}^{p}x_{ j}& =& \sum _{i=1}^{n}{\theta }^{i};\end{array}$$

(13)

evidently, C ^Π is bounded and it is therefore a polytope.

An important property of partitions is next introduced; it will then be used to define the third polytope associated with a set of partitions. A partition is called consecutive if each of its parts consists of consecutive integers. Such a partition is determined by an order on its (nonempty) parts with a part preceding another when the indices of each element in the first part are smaller than each element in the second part. When the parts of a partition are nonempty, the order defining a consecutive partition is unique; further, a consecutive partition is uniquely determined by its order and its shape. In particular, for every nonnegative integer vector $(n_{1},\ldots,n_{p})$, there exists a consecutive partition π with shape $(n_{1},\ldots,n_{p})$. The third polytope associated with a set of partitions Π, denoted H ^Π, is defined by ${H}^{\Pi } \equiv $ conv{θ _π : π is a consecutive partition in Π}.

When $\Pi \equiv {\Pi }^{(n_{1},\ldots,n_{p})}$ or Π = Π ^{(L, U)}, superscripts $(n_{1},\ldots,n_{p})$ and (L, U) are used for ${\Pi }^{(n_{1},\ldots,n_{p})}$ and Π ^{(L, U)}, respectively, to index the polytope C and H, for example, ${C}^{(n_{1},\ldots,n_{p})}$ and H ^{(L, U)}. The next result relates the polytopes P ^Π, C ^Π, and H ^Π for sets of partitions Π.

Lemma 5

For every set Π of p-partitions, ${H}^{\Pi } \subseteq {P}^{\Pi } \subseteq {C}^{\Pi }$.

In order to explore conditions under which the inclusions in Lemma 5 hold as equalities (yielding useful representations for partition polytopes), two polytopes associated with a real-valued functions on subsets are next introduced.

Let λ be a real-valued function on the subsets of $\{1,\ldots,p\}$ with $\lambda (\varnothing ) = 0$. A permutation (of $\{1,\ldots,p\}$) is formally defined as an ordered collection of sets $\sigma = (\sigma _{1},\ldots,\sigma _{p})$ where $\sigma _{1},\ldots,\sigma _{p}$ are singletons that partition $\{1,\ldots,p\}$; given such a partition σ and $k \in \{ 1,\ldots,p\}$, there is a unique index j with σ _j = { k} – it is denoted j _σ(k). Each permutation σ defines a vector $\lambda _{\sigma } \in {\mathbb{R}}^{p}$ whose k-th coordinate (λ _σ)_k for $k = 1,\ldots,p$ equals $\lambda (\cup _{t=1}^{j}\sigma _{t}) -\lambda (\cup _{t=1}^{j-1}\sigma _{t})$ with j = j _σ(k) (i.e., j as the unique index satisfying σ _j = { k}). The permutation polytope corresponding toλ, denoted H ^λ, is the convex hull of the vectors λ _σ with σ ranging over all permutations of $\{1,\ldots,p\}$. A second polytope associated with λ, denoted C ^λ, is the solution set of the system of linear inequalities given by

$$\begin{array}{rcl} \sum _{j\in I}x_{j}& \geq & \lambda (I)\ \mbox{ for each nonempty subset }I\mbox{ of }\{1,\ldots,p\},\end{array}$$

(14)

$$\begin{array}{rcl} \sum _{j=1}^{p}x_{ j}& =& \lambda (\{1,\ldots,p\}).\end{array}$$

(15)

In the game theoretic literature, ${C}^{\lambda }$ is referred to as the core of the p-person game defined by $\lambda$. Under this interpretation, $1,\ldots,p$ are considered players, subsets of $\{1,\ldots,p\}$ are called coalitions, $\lambda (I)$ for a coalitionI is the value of I, and the vectors in ${C}^{\lambda }$ represent allocations to the players with (14) assuring that the players in coalition I get together at least $\lambda (I)$ and (15) assuring that all players together get the value of the grand coalition $\{1,\ldots,p\}$. These games and their cores were introduced by Morgenstern and Von Neuman [52] and explored in Shapley [66].

A real-valued function λ on the subsets of $\{1,\ldots,p\}$ with $\lambda (\varnothing ) = 0$ is called supermodular if for every pair I and J of subsets of $\{1,\ldots,p\}$,

$$\lambda (I \cup J) +\lambda (I \cap J) \geq \lambda (I) +\lambda (J);$$

(16)

$\lambda$ is called strictly supermodular if strict inequality holds whenever the two sets I and J are not ordered by set inclusion, that is, I ⊈ J and J ⊈ I. Supermodular/submodular functions have extensive applications in optimization and polyhedral combinatorics.

The p standard unit vectors (in ℝ^p) will be denoted ${e}^{1},\ldots,{e}^{p}$. Parts (a) and (b) of the next theorem are due to Shapley [66], whereas part (c) is due to Hwang, Lee, and Rothblum [46].

Theorem 5

Suppose λ is supermodular on the subsets of $\{1,\ldots,p\}$. Then:

(a) ${H}^{\lambda } = {C}^{\lambda }$
(b) For v ∈ℝ^p , the following are equivalent:
1. (i)
  v is a vertex of ${H}^{\lambda }$
2. (ii)
  There is a permutation σ of $\{1,\ldots,p\}$ with v = λ _σ;
in particular, ${H}^{\lambda }$ has at most p ! vertices. Further, if $\lambda$ is strictly supermodular, then the correspondence between vertices of ${H}^{\lambda }$ and permutations of $\{1,\ldots,p\}$ is one to one.
(c) For distinct vertices v and v′ of ${H}^{\lambda }$ , the following are equivalent:
1. (i)
  conv{v,v′} is an edge of ${H}^{\lambda }$
2. (ii)
  There exist permutations σ and σ′ of $\{1,\ldots,p\}$ such that $\{v,v'\} =\{\lambda _{\sigma },\lambda _{\sigma '}\}$ and σ and σ′ coincide on all but exactly two parts which are indexed by consecutive integers under both $\sigma$ and $\sigma '$.
Further, if the above equivalent conditions hold and j and k are the elements in the noncoinciding parts of the permutations corresponding to v and v′, then v − v′ is a scalar multiple of (e ^j − e ^k ).

Theorem 5 will be used to provide conditions that suffice for having the inclusions in Lemma 5 hold as equalities. For that purpose, for each set of partition Π, define the function $\theta _{{\ast}}^{\Pi }$ on the subsets of $\{1,\ldots,p\}$ by setting

$$\left (\theta _{{\ast}}^{\Pi }\right )(I) \equiv \min \left \{\sum _{ j\in I}\left (\theta _{\pi }\right )_{j} :\pi \in \Pi \right \}\ \ \text{ for each }\ I \subseteq \{ 1,\ldots,p\};$$

(17)

in particular,

$$\left (\theta _{{\ast}}^{\Pi }\right )(\{1,\ldots,p\}) =\sum _{ i=1}^{n}{\theta }^{i}.$$

(18)

Trivially, ${C}^{\Pi } = {C}^{\theta _{{\ast}}^{\Pi } }$.

The set of partitions Π is called complete if for every permutation σ of $\{1,\ldots,p\}$, there exists a partition π in Π with

$$\theta _{\pi } = (\theta _{{\ast}}^{\Pi })_{\sigma }.$$

(19)

Note that for a permutation $\sigma = (\{j_{1}\},\ldots,\{j_{p}\})$ of $\{1,\ldots,p\}$ and $t = 1,\ldots,p$,

$$\sum _{s=1}^{t}\left [\left (\theta _{ {\ast}}^{\Pi }\right )_{\sigma }\right ]_{ j_{s}} = \left (\theta _{{\ast}}^{\Pi }\right )(\{j_{ 1},\ldots,j_{t}\}) =\min _{\tau \in \Pi }\sum _{s=1}^{t}\left (\theta _{\tau }\right )_{ j_{s}};$$

hence, $\theta _{\pi } = (\theta _{{\ast}}^{\Pi })_{\sigma }$ for a partition π, if and only if

$$\sum _{s=1}^{t}\left (\theta _{\pi }\right )_{ j_{s}} =\min _{\tau \in \Pi }\sum _{s=1}^{t}\left (\theta _{\tau }\right )_{ j_{s}}\mbox{ for each }t = 1,\ldots,p.$$

It is emphasized that completeness depends both on Π and on the θ ⁱ’s.

The vector θ is called one-sided, if all θ ⁱ’s are nonnegative or if all of them are nonpositive. Also, a constrained shape set of partitions is called single symmetric shape if the corresponding set of shapes consist of all coordinate-permutations of a single shape. The next result follows Hwang and Rothblum [39] in identifying several families of complete sets of partitions.

Theorem 6

Every single-shape set of partitions is complete. If θ is one-sided, then every bounded-shape, single-size with empty parts allowed or prohibited, and single symmetric shape are complete.

The following result is fundamental to the forthcoming development.

Theorem 7

Let Π be a complete set of partitions. Then:

(a) θ _∗ ^Π is supermodular.
(b) ${P}^{\Pi } = {C}^{\Pi } = {H}^{\Pi } = {H}^{\theta _{{\ast}}^{\Pi } }$.
(c) The direction of each edge of P ^Π is the difference of two unit vectors.
(d) v is a vertex of P ^Π if and only if $v = (\theta _{{\ast}}^{\Pi })_{\sigma }$ for some permutation σ of $\{1,\ldots,p\}$ , and in this case, v = θ _π for some consecutive partition π.
(e) If no partition in Π has an empty set and the θ ⁱ ’s are distinct, then θ _∗ ^Π is strictly supermodular, and for each vertex v, there is a unique permutation σ of $\{1,\ldots,p\}$ satisfying $v = (\theta _{{\ast}}^{\Pi })_{\sigma }$ ; in particular, P ^Π has exactly p! vertices.
(f) If Π is constrained-shape, the θ ⁱ’s are distinct, and either all θ ⁱ ’s are positive or they are all negative, then there is a unique partition π with θ _π = v.

Theorem 7 shows that when the set of partitions Π is complete, the edge-directions of P ^Π are differences of unit vectors, assuring (by Theorem 4) that asymmetric Schur convex functions over P ^Π attain a maximum at a vertex. The next result states these conclusions formally, providing conditions for some (every) optimal partition to have its associated vector a vertex of the corresponding partition polytope.

Theorem 8

Suppose Π is a set of p-partitions and f : P ^Π $\rightarrow \mathbb{R}$ where either f is quasi-convex, or Π is complete, and f is asymmetric Schur convex. Then:

(a) A partition π ^∗ is optimal if and only if $\theta _{{\pi }^{{\ast}}}$ maximizes f over P^Π.
(b) There exists a partition π ^∗ which is optimal over Π and has $\theta _{{\pi }^{{\ast}}}$ as a vertex of P ^Π.
(c) If f satisfies the corresponding property strictly on P ^Π , then every partition π that is optimal over Π has θ _π as a vertex of P^Π.

Theorem 8 provides a tool for establishing properties of optimal partitions for a wide range of partition problems by identifying properties of partitions whose associated vectors are vertices of corresponding partition polytopes. Constrained shape problems are considered first, establishing the optimality of consecutive partitions.

Theorem 9

Suppose Π is a constrained-shape set of partitions and $f : {P}^{\Pi } \rightarrow \mathbb{R}$ is asymmetric Schur convex. Then:

(a) There exists a partition which is optimal and consecutive; if furthermore either f is quasi-convex or Π is complete, then there exists a partition π which is optimal and consecutive and has θ _π as a vertex of P ^Π.
(b) If the θ ⁱ ’s are distinct and f is strictly asymmetric Schur convex, then every optimal partition is consecutive; if furthermore either f is strictly quasi-convex or Π is complete, then every optimal partition π is consecutive and has θ _π as a vertex of P ^Π.

Jointly, Theorems 8 and 9 present ten(!) results about optimal partitions – five providing conditions under which some optimal partition π is consecutive and/or has θ _π as a vertex of the partition polytope and five providing conditions under which every optimal partition has those properties. The “some” results are useful as they facilitate the restriction of the search for an optimal solution to subclasses of all partitions. When the set of partitions is complete, vertex enumeration is particularly efficient as the partition polytope is a permutation polytope having at most p! vertices. The “some” results of Theorems 8 and 9 are summarized in Table 2. In this table (and the forthcoming ones in this section), “${\Pi }^{\Gamma }$” under the “Π”-column means that Π is constrained-shape, “q-c” stands for quasi-convex, “(a-)S-c” stands for (asymmetric) Schur convex, an empty entry in the θ-column reflects no restrictions on θ, and a “ + ” in the “P ^Π perm”-column means P ^Π is a permutation polytope. Other abbreviations are self-explanatory (e.g., “cons.” stands for “consecutive”).

Table 2 Properties of optimal partitions

Full size table

Additional partition problems are next explored – their analysis requires additional definitions. A partition is size-consecutive (reverse size-consecutive) if it is consecutive and the larger elements are assigned to the bigger (smaller) parts. Also, a set of shapes is called symmetric if it contains every coordinate permutation of each of its members.

Theorem 10

Suppose Γ is a symmetric set of shapes, Π = Π ^Γ, θ ≥ 0 (θ ≤ 0) and f : P ^Π → ℝ is asymmetric Schur convex. Then:

(a) There exists a partition which is optimal and size-consecutive (reverse size-consecutive); if furthermore either f is quasi-convex or Π is complete, then there exists a partition π which is optimal and size-consecutive (reverse size-consecutive), and has θ _π as a vertex of P ^Π.
(b) If the θ ⁱ ’s are distinct and f is strictly asymmetric Schur convex, then every optimal partition is size-consecutive (reverse size-consecutive); if furthermore either f is strictly quasi-convex or Π is complete, then every optimal partition π is size-consecutive (reverse size-consecutive) and has θ _π as a vertex of P ^Π.

Theorem 11

Suppose Π is a constrained-shape set of partitions corresponding to a set of partitions Γ, θ ≥ 0 (θ ≤ 0) and f : ℝ^p → ℝ is Schur convex. Then:

(a) There exists a partition which is optimal and size-consecutive (reverse size-consecutive); if furthermore Π is complete, then there exists a partition π which is optimal, size-consecutive (reverse size-consecutive), and has θ _π as a vertex of P ^Π.
(b) If the θ ⁱ ’s are distinct and f is strictly Schur convex, then every optimal partition is size-consecutive (reverse size-consecutive); if furthermore Π is complete, then every optimal partition π is size-consecutive (reverse size-consecutive) and has θ _π as a vertex of P ^Π.

Theorem 11 does not include a variant that parallels the “quasi-convexity” of Theorem 10 as an alternative to completeness that allows one to augment (reverse) size-consecutiveness of optimal partitions with the property that θ _π is a vertex of P ^Π. Situations where the last property can be used to accelerate the search for an optimal partition when Π is not complete are not known.

Table 2 is next augmented with the “some” results of Theorems 10 and 11 into Table 3.

Theorem 12

Suppose Π is a bounded-shape set of partitions, f : P ^Π → ℝ is asymmetric Schur convex. Then (i) there exists a partition π which is optimal, consecutive, and has θ _π as a vertex of P ^Π; (ii) if the θ ⁱ ’s are distinct and f is strictly asymmetric Schur convex, then every optimal partition π is consecutive and has θ _π as a vertex of P ^Π. Further, if f is Schur convex and θ ≥ 0 ( θ ≤ 0), “consecutive” can be tightened to “size-consecutive (reverse size-consecutive).”

Of course, the conclusions of Theorem 12 apply to single-shape problems which are instances of bounded-shape problems.

A bounded-shape set of partitions is called uniform bounded-shape if both L and U are multiples of the vector $(1,\ldots,1)$ (i.e., the lower/upper bounds of all parts coincide).

Theorem 13

Suppose Π is a uniform bounded-shape set of partitions, $f :\ {P}^{\Pi }\ \rightarrow \ \mathbb{R}$ is asymmetric Schur convex and θ ≥ 0 (θ ≤ 0). Then (i) there exists a partition π which is optimal, size-consecutive (reverse size-consecutive), and has θ _π as a vertex of P ^Π; (ii) if the θ ⁱ ’s are distinct and f is strictly asymmetric Schur convex, then every optimal partition π is size-consecutive (reverse size-consecutive) and has θ _π as a vertex of P ^Π.

An important instance of Theorem 13 is the case where Π is single-symmetric shape.

The following additional properties of partitions are relevant only for size problem since the sets of shapes meeting the requirements of the definitions are severely restricted.

Table 3 Properties of optimal partitions (continued)

Full size table

A (reverse) size-consecutive partition π with all parts, but one, consisting of a single element is called (reverse) extremal; if all parts, but one, are empty, π is called monopolistic. A consecutive partition is called bipolar if either of the following conditions holds: (i) It has two nonempty parts, one containing all indices i with θ ⁱ < 0 and the other containing all indices i with θ ⁱ > 0, and the two parts arbitrarily split the indices with θ ⁱ = 0, or alternatively (ii) if either θ ≥ 0 or θ ≤ 0 and the partition is monopolistic. Evidently, if all θ ⁱ’s are positive or all of them are negative, a partition is bipolar if and only if it is monopolistic. Finally, the definition of the next property is lengthy. For p > 1, a consecutive partition π is called polarized-extremal if it has two distinguished parts, say $\pi _{j_{\oplus }}$ and $\pi _{j_{\ominus }}$, such that:

(i)
1. (a)
  If $i \in \pi _{j_{\oplus }}$ and u > i, then $u \in \pi _{j_{\oplus }}$.
2. (b)
  If $i \in \pi _{j_{\ominus }}$ and u < i, then $u \in \pi _{j_{\ominus }}$.
(ii)
1. (a)
  If $\pi _{j_{\oplus }}$ is not a singleton, then $\pi _{j_{\oplus }} \subseteq \{ i \in \mathcal{N} {:\theta }^{i} \geq 0\}$.
2. (b)
  If $\pi _{j_{\ominus }}$ is not a singleton, then $\pi _{j_{\ominus }} \subseteq \{ i \in \mathcal{N} {:\theta }^{i} \leq 0\}$.

(iii)
If $j\notin \{j_{\oplus },j_{\ominus }\}$, then π _j is a singleton; further, if either $\pi _{j_{\oplus }}$ is not a singleton and θ ^u = 0 for some $u \in \pi _{j_{\oplus }}$, or $\pi _{j_{\ominus }}$ is not a singleton and θ ^u = 0 for some $u \in \pi _{j_{\ominus }}$, then the single element in π _j, say i, has θ ⁱ = 0.

When θ ≥ 0 (θ ≤ 0), every extremal (reverse-extremal) partition is polarized-extremal and when θ > 0 (θ < 0), a partition is extremal (reverse-extremal) if and only if it is polarized-extremal.

Example 4

Let n = 7 and θ = ( − 3, − 2, − 1, 0, 0, 1, 2). Then ({1, 2, 3}, {4}, {5, 6, 7}) and ({1, 2}, {3}, {4}, {5}, {6, 7}) are polarized-extremal partitions, but ({1, 2}, {3}, {4}, {5, 6, 7}) is not.

Theorem 14

Suppose Π is a single-size set of partitions with empty parts allowed, $f : {P}^{\Pi } \rightarrow \mathbb{R}$ is asymmetric Schur convex and θ ≥ 0 (θ ≤ 0). Then (i) there exists a partition π which is optimal, monopolistic, and has θ _π as a vertex of P ^Π; (ii) if the θ ⁱ ’s are distinct and f is strictly asymmetric Schur convex, then every optimal partition π is monopolistic and has θ _π as a vertex of P ^Π.

Attention is next turned to the partition problems considered in Theorem 14 without the restriction that the θ ⁱ’s are one-sided.

Theorem 15

Suppose Π is a single-size set of partitions with empty parts prohibited and $f : {P}^{\Pi } \rightarrow \mathbb{R}$ is quasi-convex. Then (i) there exists a partition π which is optimal, polarized-extremal, and has θ _π as a vertex of P ^Π; (ii) if the θ ⁱ ’s are distinct and nonzero and f is strictly quasi-convex, then every optimal partition π is polarized extremal and has θ _π as a vertex of P ^Π.

When θ ≤ 0 or θ ≥ 0, the conclusions of Theorem 15 follow from Theorem 14.

Theorem 16

Suppose Π is a single-size set of partitions with empty parts allowed and $f : {P}^{\Pi } \rightarrow \mathbb{R}$ is asymmetric Schur convex. Then (i) there exists a partition π which is optimal, bipolar, and has θ _π as a vertex of P ^Π; (ii) if the θ ⁱ ’s are distinct and nonzero and f is strictly asymmetric Schur convex, then every optimal partition π is bipolar and has θ _π as a vertex of P ^Π. Further, if θ ≥ 0 or θ ≤ 0, then “bipolar” in the above conclusions can be replaced by “monopolistic.”

Sections 2 and 3 are next augmented with a summary of the “some” results of Theorems 12–16 (Table 4).

Table 4 Properties of optimal partitions (continued)

Full size table

4 Single-Parameter: Combinatorial Approach

The main goal in this section is to develop combinatorial tools for establishing that given a partition property Q and a set of partitions, some (every) partition in the set satisfies Q. Such results are useful for the analysis of partition problems when the set consists of all optimal partitions for a particular partition problem. The tool that is developed, referred to as local sorting, is to move from a partition in the given family that does not satisfy Q to a partition (in the family) that does by recursively rearranging (the elements in) a small number of parts at each step. Two conditions must be observed to guarantee that successive (local) sorting will produce a partition satisfying the given property Q have two parts: first, in each step, the sorting must produce a partition in the given set; and second, in order to avoid being trapped in a cycle, some statistic must decrease (or increase) monotonely as the sorting progresses. The first condition concerns the set of partitions, while the other concerns property Q. A useful approach of local sorting is the intuitive one which requires that the (sub)partition consisting of the sorted parts satisfies Q. These techniques will be applied to some specific optimization problems later, demonstrating the existence of optimal partitions with particular properties.

Some properties of partitions were introduced in Sect. 3. These include (with abbreviations added in parenthesis) consecutiveness (C), size-consecutiveness (S), reverse size-consecutiveness (S ^{− 1}), extremalness (E), reverse extremalness (E ^{− 1}), monopolisticness (M _p), bipolar (BP), and polarized-extremalness (PE).

Let S and S′ be two disjoint subsets of $\mathcal{N}$. Then S is said to penetrateS′, written $S \rightarrow S'$, if there exists a, c in S′ and b in S such that a < b < c. Note that if (S ∪S′) → S′ then either S → S′ or $S' \rightarrow S''$, but if S → (S′ ∪S′), it is possible that neither $S \rightarrow S'$ nor $S \rightarrow S''$. In addition, the convention that each part penetrates itself will be adopted.

For a partition $\pi =\{\pi _{1},\ldots,\pi _{p}\}$ of a subset $\mathcal{N}'$ of $\mathcal{N}$, define the digraph associated with π, denoted G(π), as the digraph with p vertices representing the parts of π and edges (i, j) for $i,j \in \{ 1,\ldots,p\}$ if π _i penetrates π _j. A partition π is called noncrossing (NC) if π _i → π _j implies $\pi_{j} \not\rightarrow \pi _{i}$. Noncrossing partitions were first studied in Kreweras [48].

There are three special cases of noncrossingness that are of particular interest. The first concerns the consecutive partitions, introduced and studied in Sect. 3. A partition is called consecutive if and only if the penetration relation among its parts is an empty partial order, that is, no part penetrates any other part. The second instance of NC is nestedness (N); a p-partition π is called nested if the penetration relation among its parts is linear. Finally, a partition π is called order-nonpenetrating (O) if its parts can be indexed (ordered) such that for $i = 2,\ldots,p$, $\pi _{i}\not\rightarrow \cup _{j=1}^{i-1}\pi _{j}$. Note that order-nonpenetrating was called order-consecutive by Chakravarty et al. [11]. Examples of partitions that satisfy the properties considered so far are next illustrated.

Example 5

Let $\mathcal{N} = \mathcal{N}' =\{ 1,\ldots,5\}$, n = 5 and p = 3. Then:

(a)
{{2}, {4}, {1, 3, 5}} is noncrossing but, is not order-nonpenetrating.
(b)
{{2}, {1, 3}, {4, 5}} is order-nonpenetrating, but neither nested nor consecutive.
(c)
{{1}, {2, 3}, {4, 5}} is consecutive.
(d)
{{3}, {2, 4}, {1, 5}} is nested.

A noncrossing partition is called nearly nested (N _e N) if it is nested except for some parts of size 1.

Example 6

Let $\mathcal{N} =\{ 1,2,3,4,5,6,7\}$ and p = 4. Then the partition ({1, 4, 7}, {2, 3}, {5}, {6}) is nearly nested but not order-nonpenetrating.

Table 5 summarizes partition properties mentioned so far.

The following result records implications among the above properties of partitions. The relationship between C, N, NC, and O was determined in Hwang et al. [42].

Theorem 17

The implications among the properties C, N, NC, O, N _e N, S, E, PE, M _p , and BP of partitions are represented precisely by the partial order illustrated in Fig. 3 .

Given a (single-parameter) partition problem, let n ₊, n ₋, and n ₀ denote the number of θ’s which are positive, negative, and equal 0, respectively. Following Hwang and Mallows [35] and [39], Table 6 provides the exact numbers of (single-)size partitions for the various properties listed in Table 5. Note that all the numbers are polynomial in n.

Table 5 Properties of partitions

Full size table

The focus will be on local sorting which changes only a prescribed number of parts. Local sorting requires that attention be given to subsets of the set of parts of a given partition which are then considered themselves as (sub)partitions. Let $\mathcal{N} =\{ 1,\ldots,n\}$ and $\mathcal{E} =\{ 1,\ldots,p\}$. To facilitate reference to the indices of these (sub)partitions, it will be useful to consider a framework where a labeled partition of a set $\mathcal{N}'\subseteq \mathcal{N}$ over an index set ℰ′ ⊆ ℰ is a set $\pi =\{ (j,\pi _{j}) : j \in \mathcal{E}'\}$ where the π _j’s are disjoint subsets of $\mathcal{N}'$ whose union is $\mathcal{N}'$. In particular, $\mathcal{E}'$, $\vert \mathcal{E}'\vert $, and $\{(j,\vert \pi _{j}\vert ) : j \in \mathcal{E}'\}$ will be referred to as, respectively, the support, size, and shape of π. The notation $\pi = (\pi _{1},\ldots,\pi _{p})$ for “(labeled) partitions,” used so far, captures the case where $\mathcal{N}' = \mathcal{N}$ and $\mathcal{E}' = \mathcal{E}$; its use will be continued in those cases. Given a labeled partition of $\mathcal{N}'$ over $\mathcal{E}'$, say $\pi =\{ (j,\pi _{j}) : j \in \mathcal{E}'\}$, and a subset J of $\mathcal{E}'$, $\pi _{J} \equiv \{ (j,\pi _{j}) : j \in J\}$ will be called the subpartition of πcorresponding toJ; in particular, π _J is a partition of $\cup _{j\in J}\pi _{j}$ over J (having support J, size | J | and shape $\{(j,\vert \pi _{j}\vert ) : j \in J\}$). The reference to the partitioned set and/or the support will be sometimes suppressed, referring to partition, subpartitions, etc. Also, a k-partitions/subpartition is a partition/subpartiton of size k. Generically, “local” will refer to subpartitions of limited size.

Table 6 The number of size partitions satisfying NC, C, N, N _e N, S, S ^{− 1}, E, E ^{− 1}, PE, O, M _p, and BP

Full size table

Some definitions about properties of partitions that connect global satisfiability and local satisfiability are next introduced. Suppose Q is a property of partitions. Property Q is hereditary if every subpartition of a partition that satisfies Q must also satisfy Q. Also, for a positive integer k, property Q is k-consistent if for each p ≥ k, a p-partition satisfies Q whenever all of its k-subpartitions satisfy Q. The minimum consistency index of a property of partitions is defined as the minimal positive integer k for which the property is k-consistent provided such an integer exists and as ∞ if Q is not k-consistent for any positive integer k.

The next lemma records an order for k-consistency.

Lemma 6

Let k be a positive integer and let Q be a property of partitions such that a partition π satisfies Q if and only if every k-subpartition of π satisfies Q. Then Q is k′-consistent for all k′ > k.

Attention is next turned to transformations of partitions, where the image of a partition π consists of partitions obtained from π by sorting elements of subpartitions of π that do not satisfy Q to subpartitions that satisfy Q, while preserving the remaining parts of the partition.

For a partition π, let ρ(π) denote its support and ν(π) denote its shape. Formally, three partition-transformations are defined for a given property Q of partitions and a given positive integer k. The transformations are T ^{Q, k, open}, T ^{Q, k, support}, and T ^{Q, k, shape}, where for J ⊆ ℰ and partition π, $T_{J}^{Q,k,t}(\pi ) \equiv [{T}^{Q,k,t}]_{J}(\pi )$ is given by

$$T_{J}^{Q,k,\mathrm{open}}(\pi ) = \left \{\begin{array}{llll} &\{\pi ' :\pi _{\rho (\pi )\setminus J}\mbox{ =}\pi '_{\rho (\pi )\setminus J},\, \\ & \cup _{j\in \rho (\pi )}\pi _{j}\mbox{ =} \cup _{j\in \rho (\pi ')}\pi '_{j}, \\ &\mbox{ and }\pi '_{\rho (\pi ')\setminus [\rho (\pi )\setminus J]} \in Q\} &\mbox{ if }\pi _{J}\not\in Q\ \text{ and }\ \vert J\vert \mbox{ =}k \leq \vert \pi \vert \\ &\{\pi ' :\pi ' \in Q\} &\mbox{ if }\pi _{J}\not\in Q\ \text{ and }\ \vert J\vert \mbox{ =}k>\vert \pi \vert \\ &\varnothing &\mbox{ otherwise}, \end{array} \right.$$

$$T_{J}^{Q,k,\mathrm{support}} = \left \{\pi ' \in T_{ J}^{Q,k,\mathrm{open}}(\pi ) :\rho (\pi ') =\rho (\pi )\right \}$$

and

$$T_{J}^{Q,k,\mathrm{shape}}(\pi ) = \left \{\pi ' \in T_{ J}^{Q,k,\mathrm{open}}(\pi ) :\nu (\pi ') =\nu (\pi )\right \}.$$

For t ∈ {open, support, shape}, partitions in T ^{Q, k, t}(π) are called (Q, k, t)-sortings ofπ.

Recall that $\mathcal{N} =\{ 1,\ldots,n\}$ and $\mathcal{E} =\{ 1,\ldots,p\}$ are assumed given. The natural universal domains for T ^{Q, k, open}, T ^{Q, k, support}, and T ^{Q, k, shape} are partitions without restriction, those with a prescribed support and those with prescribed shape, respectively (where no restriction is to be interpreted as p = n and the support of the partitions is restricted to subsets of $\mathcal{E} =\{ 1,\ldots,n\}$). To index these potential domains, define ${X}^{\mathrm{open}} \equiv \{\mbox{ open}\},{X}^{\mathrm{support}} \equiv \{\rho : \varnothing \neq \rho \subseteq \mathcal{E}\}$, and ${X}^{\mathrm{shape}} \equiv \{\nu =\{ (j,n_{j}) : j \in \rho$}: ρ ⊆ ℰ; each n _j is an integer and $\sum _{j\in \rho }n_{j} = \vert \mathcal{N}\vert \} :\}$; in particular, X ^support and X ^shape represent, respectively, all potential supports and shapes of subpartitions over ℰ. If $\widehat{\Pi }$ is the set of all partitions, then it will be denoted $\widehat{{\Pi }}^{\text{open}}$ (the support of partitions in $\widehat{{\Pi }}^{\text{open}}$ is in the power set of $\{1,\ldots,n\}$). Further, the set of all partitions with a prescribed support ρ will be denoted $\widehat{{\Pi }}^{\rho }$, the set of all partitions with a prescribed shape ν will be denoted $\widehat{{\Pi }}^{\nu }$, and finally, for positive integer p, write $\widehat{{\Pi }}^{p}$ for $\widehat{{\Pi }}^{\{1,\ldots,p\}}$. If $\widehat{\Pi }$ is, respectively, $\widehat{{\Pi }}^{\mathrm{open}}$, $\widehat{{\Pi }}^{\rho }$ for a particular ρ, or $\widehat{{\Pi }}^{\nu }$ for a particular ν, a corresponding transformation on $\widehat{\Pi }$ is called open, support-preserving, or shape-preserving, respectively. The potential domains of the transformations T ^{Q, k, t} for given Q and t ∈ {open, support, shape} are then the sets $\widehat{{\Pi }}^{x}$ with x ∈ X ^t (but degenerate instances are sometimes excluded).

For t ∈ {open, support, shape} and x ∈ X ^t, a property Q of partitions is t-regular forx, if for every $\mathcal{N}'\subseteq \mathcal{N}$ with ${\Pi }^{x}(\mathcal{N}')\neq \varnothing $, there is a partition in ${\Pi }^{x}(\mathcal{N}') \cap Q$ (with the intuitive interpretation of ${\Pi }^{x}(\mathcal{N}')$); Q is t-regular if it is t-regular for every x ∈ X ^t.

Lemma 7

Let Q be a t-regular property of partitions. Then for every x ∈ X ^t, $\pi \in \widehat{ {\Pi }}^{x}$ and $J \subseteq \mathcal{E}$ with |J| = k,

$$[\pi _{J} \in Q] \Leftrightarrow [T_{J}^{Q,k,t}(\pi ) = \varnothing ];$$

(20)

further, if Q is also k-consistent and hereditary, then

$$[\pi \in Q] \Leftrightarrow [{T}^{Q,k,t}(\pi ) = \varnothing ].$$

(21)

Let k be a positive integer, let t ∈ {open, support, shape}, let Q be a property of partitions, and let Π ^∗ be a set of partitions. Then:

(i)
Π ^∗ is Q-(strongly, k, t)-invariant if for each π ∈ Π ^∗ ∖ Q: for every subset J of ℰ having k elements and with $\pi _{J}\notin Q$, every π′ in $(T_{J}^{Q,k,t})(\pi )$ satisfies $\pi ' \in {\Pi }^{{\ast}}$.
(ii)
Π ^∗ is Q-(sort-specific, k, t)-invariant if for each π ∈ Π ^∗ ∖ Q: for every subset J of ℰ having k elements and with $\pi _{J}\notin Q$, some π′ in $(T_{J}^{Q,k,t})(\pi )$ satisfies $\pi ' \in {\Pi }^{{\ast}}$.
(iii)
Π ^∗ is Q-(part-specific, k, t)-invariant if for each π ∈ Π ^∗ ∖ Q: for some subset J of $\mathcal{E}$ having k elements and with $\pi _{J}\notin Q$, every π′ in (T _J ^{Q, k, t})(π) satisfies $\pi ' \in {\Pi }^{{\ast}}$.
(iv)
Π ^∗ is Q-(weakly, k, t)-invariant if for each π ∈ Π ^∗ ∖ Q: for some subset J of $\mathcal{E}$ having k elements and with $\pi _{J}\notin Q$, some π′ in $(T_{J}^{Q,k,t})(\pi )$ satisfies $\pi ' \in {\Pi }^{{\ast}}$.

When referring to Q-(ℓ, k, t)-invariance, the values of ℓ, k, and t are called level, degree, and type of the corresponding invariance. The level ℓ can take four possible values, forming two pairs – {strongly, weakly} and {part-specific, sort-specific}; if ℓ is a specific value, then and ℓ ^{− 1} is defined as the other value in the same pair. Also, type can take one of three possible values – “open,” “support,” and “shape” – and the possible values of degree are $k = 1,2,\ldots$. When a variable is not quantified in forthcoming statements/results about (ℓ, k, t), it means that the statement is valid for all choices of that variable.

A multivalued function on a set of partitions will be called a statistic. To compare values of a statistics, “ < ” will stand for “ ≤ and ≠.” Let k be a positive integer, let t ∈ {open, support, shape}, and let Q be a t-regular property of partitions. Then:

(i)
Q is (strongly, k, t)-sortable if for every x ∈ X ^t, there is a statistic s(. ) on $\widehat{{\Pi }}^{x}$ such that for every $\pi \in {\Pi }^{x} \setminus Q$ for every J ⊆ ℰ having k elements and $\pi _{J}\notin Q$: every $\pi ' \in (T_{J}^{Q,k,t})(\pi )$ satisfies s(π′) < s(π).
(ii)
Q is (sort-specific, k, t)-sortable if for every x ∈ X ^t, there is a statistic s(. ) on $\widehat{{\Pi }}^{x}$ such that for every $\pi \in {\Pi }^{x} \setminus Q$ and for every J ⊆ ℰ having k elements and $\pi _{J}\notin Q$: some $\pi ' \in (T_{J}^{Q,k,t})(\pi )$ satisfies s(π′) < s(π).
(iii)
Q is (part-specific, k, t)-sortable if for every x ∈ X ^t, there is a statistic s(. ) on $\widehat{{\Pi }}^{x}$ such that for every $\pi \in {\Pi }^{x} \setminus Q$, for some J ⊆ ℰ having k elements and $\pi _{J}\notin Q$: every $\pi ' \in (T_{J}^{Q,k,t})(\pi )$ satisfies s(π′) < s(π).
(iv)
Q is (part-specific, k, t)-sortable if for every x ∈ X ^t, there is a statistic s(. ) on $\widehat{{\Pi }}^{x}$ such that for every $\pi \in {\Pi }^{x} \setminus Q$, for some J ⊆ ℰ having k elements and $\pi _{J}\notin Q$: some $\pi ' \in (T_{J}^{Q,k,t})(\pi )$ satisfies s(π′) < s(π).

As for invariance, the variables ℓ, k, and t of (ℓ, k, t)-sortability are referred to as level, degree, and type. A comparison of the above definitions of sortability and historic ones is deferred till after Theorem 19.

Theorem 18

Let Q be a k-consistent and hereditary property of partitions, let x ∈ X ^t , and let ${\Pi }^{{\ast}}\subseteq \widehat{ {\Pi }}^{x}$ . If Π ^∗ is Q-(ℓ,k,t)-invariant and Q is (ℓ ⁻¹ ,k,t)-sortable on $\widehat{{\Pi }}^{x}$ , then for every partition π ∈ Π^∗, there is a finite sequence π ⁰ = π, ${\pi }^{1},\ldots {,\pi }^{s}$ of partitions in Π^∗ with π ^s satisfying Q (s = 0 when π ∈ Q). In particular, if Π^∗ is nonempty, then Π^∗∩ Q is nonempty.

Suppose one wishes to prove that a property Q of partitions is not T-sortable. Using the definition of (ℓ, k, t)-sortability, it is then necessary to demonstrate that there is no statistic s having the property that all corresponding (Q, k, t)-sorting of a partition π in $\widehat{\Pi }$ which does not satisfy Q have a lower s-value than π. It is impossible to consider all potential statistics. But, the next corollary of Theorem 18 and the following Theorem 19 provide one with tools to establish non-sortability.

Corollary 3

Let Q be a k-consistent and hereditary property of partitions and let x ∈ X ^t . If some ${\Pi }^{{\ast}}\subseteq \widehat{ {\Pi }}^{x}$ is Q-(ℓ,k,t)-invariant and does not satisfy Q, then Q is not (ℓ ⁻¹ ,k,t)-sortable on $\widehat{{\Pi }}^{x}$

Theorem 19

Let Q be a t-regular, k-consistent, and hereditary property of partitions. Then the following are equivalent:

(a) Q is (strongly, k,t)-sortable.
(b) For every x ∈ X ^t , every set of partitions Π $\Pi^* \subseteq \widehat{\Pi}^x$ which is (weakly, k,t)-invariant satisfies Q.
(c) There exist no x ∈ X ^t and finite sequence ${\pi }^{0}{,\pi }^{1},\ldots {,\pi }^{q-1}{,\pi }^{q} {=\pi }^{0}$ of partitions in $\widehat{{\Pi }}^{x}$ with ${\pi }^{j} \in ({T}^{Q,k,t}){(\pi }^{j-1}) \setminus Q$ for $j = 1,\ldots,q$.
(d) For every x ∈ X ^t , Γ(Q,t,x) is acyclic.

The historic evolution of sortability is next reviewed. The notion of sortability (as well as consistency and invariance, the latter under a different name) was first proposed by Hwang et al. [42]. However, this reference focused on sortability parameterized by (sort-specific, k, support). Chang et al. [15] extended the definition to the full (ℓ, k, t) range and gave sortability results in the full range to most properties listed in Table 5. Both papers defined sortability by the conclusion of Theorem 18, namely, a property Q is called (ℓ, k, t)-sortable if for every Q-(ℓ ^{− 1}, k, t)-invariant family Π contains a partition satisfying Q. While sortability results for particular properties relied on the construction of a corresponding decreasing statistic, the existence of a statistic was not part of the definition of sortability. Theorem 18 demonstrates that the definition used herein is more demanding than the historic definition. Still, Theorem 19 establishes an equivalence between the two definitions when ℓ = strongly (but, for other values of ℓ, “historic sortability” does not imply the “current sortability” as the implication (b) ⇒ (a) of Theorem 19 needs not hold. It is noted that the current definition of sortability explicitly requires consistency, whereas the historic definition does not; consequently, a result asserting that any (ℓ, k, t)-sortability implies k-consistency (e.g., Hwang et al. [42], Chang et al. [15]) does not appear in this chapter.

The next lemma summarizes some implications of (ℓ, k, t)-sortability of partition properties.

Lemma 8

Let Q be a k-consistent, hereditary property of partitions. The various categories of sortability of Q satisfy the following implications:

(a) (strongly, k,t) implies both (sort-specific, k,t) and (part-specific, k,t), and (sort-specific, k,t) and (part-specific, k,t) imply, each by itself, (weakly, k,t).
(b) With ℓ ∈{strongly, part-specific}, if Q is support-regular, then (ℓ,k, open) implies (ℓ,k, support), and if Q is shape-regular, then (ℓ,k, support) implies (ℓ,k, shape).
(c) With ℓ ∈{sort-specific, weakly}, (ℓ,k, shape) implies (ℓ,k, support), and (ℓ,k, support) implies (ℓ,k, open).

The table below summarizes the implications among the various categories of sortability established in Lemma 8 (using obvious abbreviations). The table should be viewed as a 2 ×6 matrix where each cell specifies a different pair (ℓ, t), for example, cell (1, 2), corresponds to ℓ = strongly and t = support. Implications represented by “ ⇒ ^∗” require the corresponding regularity of Q, see part (b) of Lemma 8 (Table 7).

Table 7 Sortability-implications

Full size table

The (ℓ, k, t)-sortability of each property Q discussed in this section has been explicitly solved in the literature (see Hwang et al. [42], Chang et al. [15], and Hwang and Rothblum [39]).

It is next shown how the notions of consistency and sortability can be used to facilitate the search of an optimal partition in a family Π under a given objective function F. Let $\Pi _{F}^{{\ast}}\subseteq \Pi $ denote the set of partitions that maximize F over Π. Recall that there are two general types of results:

Type 1.
Every optimal partition in Π satisfies a property Q, that is, Π _F ^∗ ⊆ Q.
Type 2.
There exists an optimal partition in Π satisfying a property Q, that is, Π _F ^∗ ∩ Q is not empty.

Tools for establishing type 2 results are sortability arguments that show that starting from a partition in Π _F ^∗, local Q-sortings which preserve optimality will result in an optimal partition satisfying Q. Type 1 results are typically easier to prove, if true, by showing that the objective function of a partition not satisfying Q can be strictly improved. This last idea is next cast formally.

Theorem 20

Suppose Q is k-consistent and Π is a t-family for t ∈{shape, support}. If every π ∈ Π not satisfying Q has a k-subpartition which can be (Q,k,t)-sorted with F increasing, then every optimal partition in Π satisfies Q.

Consider objective functions of the form

$$F(\pi ) = f\left (g_{1}[h(\theta _{\pi _{1}})],\ldots,g_{p}[h(\theta _{\pi _{p}})]\right )\,,$$

(22)

where $f : {\mathbb{R}}^{p} \rightarrow \mathbb{R}$ is (monotone), convex, or Schur convex, the g _j’s are real-valued functions over the reals, and h is a real-valued function over finite subsets of the reals. Types 1 and 2 results will be established for such objective functions. In stating type 1 results, “strict conditions” will be imposed on f, g, and h that are strict (e.g., “nondecreasing” becomes “increasing” and “nonnegative” becomes “positive”) along with the requirement that the θ ⁱ’s are distinct. Also, Lemma 1 shows that for verifying the optimality of partitions that satisfy a particular property, it suffices to consider the “single” versions. Instances of (22) with h as the sum-function and g _j’s as the identity, that is, F expressed by $F(\pi ) = f(\sum \theta _{\pi _{1}},\ldots,\sum \theta _{\pi _{p}})$, were studied in Sect. 3. Results which are in a more general form are next recorded.

Theorem 21

Suppose F(⋅) is given by ( 22 ) where f is convex and one of the following holds:

(a) f is nondecreasing, and each g _j is convex.
(b) f is nonincreasing, and each g _j is concave.
(c) Each g _j is linear (and no monotonicity assumption imposed on f).

Then every shape-problem has a consecutive optimal partition. Further, if θ ≥ 0 (θ ≤ 0), then every shape problem has a size-consecutive (reverse size-consecutive) optimal partition, every size-problem has an extremal (reverse extremal) optimal partition and every relaxed-size problem has a monopolistic optimal partition. Finally, under strict conditions that exclude f from being strictly convex under (a) and (b), the corresponding type 1 conclusions hold.

The case of single-shape-problems with $g_{j}\left (\sum \theta _{\pi _{j}}\right ) = g(\theta _{\pi _{j}},n_{j})$ where $(n_{1},\ldots,n_{p})$ is the prescribed shape of π was first studied by Chakravarty et al. [11].

Recall the definition of majorization in Sect. 2 and the terminology used in that section. The notion of majorization is next extended to (two types of) weak majorization by dropping the requirement $\sum _{i=1}^{p}a_{[i]} =\sum _{ i=1}^{p}b_{[i]}$ (condition (8)). Specifically, a p-vector a is said to weakly submajorize a p-vector b, written a _w ≻ b if

$$\sum _{i=1}^{k}a_{ [i]} \geq \sum _{i=1}^{k}b_{ [i]}\ \ \text{ for }\ \ k = 1,\ldots,p;$$

(23)

a is said to weakly supermajorizeb, written ${a\,}^{w}\! \succ b$ if

$$\sum _{i=k}^{p}a_{ [i]} \leq \sum _{i=k}^{p}b_{ [i]}\ \ \text{ for }\ \ k = 1,\ldots,p.$$

(24)

Strict weak sub-/super-majorization will refer to situations where at least one of the corresponding inequalities is strict. The following lemma is standard (e.g., Marshall and Olkin [50]).

Lemma 9

Suppose $f : {\mathbb{R}}^{p} \rightarrow \mathbb{R}$ is Schur convex and nondecreasing (nonincreasing) and a and b are vectors in ℝ^p with a weakly submajorizing (supermajorizings) b. Then f(a) ≥ f(b). Further, if the weak submajorization (supermajorizings) is strict and the function f is strictly Schur convex and increasing (decreasing), then f(a) > f(b).

As majorization and weak sub/super-majorization are invariant over permutations of vectors, these relations can be applied to multisets of real numbers (with the interpretation that the relation holds for any representing vectors).

Lemma 10

Let f be Schur convex and nondecreasing (nonincreasing) and Q be (ℓ,k,t)-sortable with ℓ ∈{strongly, sort-specific, part-specific,weak}, $k = 2,3,\ldots$ and t ∈{shape, size}. Suppose for any partition π not satisfying Q, there always exists an (ℓ⁻¹,k,t)-Q-sorting from π to π′ such that, with K as the indices of the sorted parts, $\{\{g_{j}[h(\theta _{\pi _{j}'})] : j \in K\}\}\,_{w}\! \succ {(\,}^{w}\! \succ )\{\{g_{j}[h(\theta _{\pi _{j}})] : j \in K\}\}$. Then every t-problem has an optimal partition satisfying Q. Further, under strict conditions, every optimal partition satisfies Q. Finally, the above conclusions hold if f is just Schur convex and the weak majorization requirement is replaced by majorization.

Objective functions that are given by (22) with f Schur convex, g _j independent of j and h as the sum-function, are next addressed. The result modifies Theorem 21 which considered problems with f convex and g _j’s allowed to depend on j.

Theorem 22

Suppose

$$F(\pi ) = f[g(\theta _{\pi _{1}}),\ldots,g(\theta _{\pi _{p}})],$$

(25)

where f is Schur convex and either of the following holds:

(a) f is nondecreasing, and g is convex and nondecreasing (nonincreasing).
(b) f is nonincreasing, and g is concave and nondecreasing (nonincreasing).
(c) g is linear and nondecreasing (nonincreasing) (and no monotonicity assumption imposed on f).

Then every shape-problem has a consecutive optimal partition. Further, if in addition θ ≥ 0 (θ ≤ 0), then every shape-problem has a size-consecutive (reverse size-consecutive) optimal partition, every size-problem has an extremal (reverse extremal) optimal partition, and every relaxed-size problem has a monopolistic optimal partition. Finally, under strict conditions, the corresponding type 1 conclusions hold.

The objective function of (25) with g as the identity and f Schur convex, but not necessarily monotone, was studied in Sect. 3. In fact, the conclusions of Theorem 22 for those cases were established in Theorem 11 for shape-problems and in Theorem 14 for size-problems (the latter with the Schur convexity of f relaxed to asymmetric Schur convexity).

In a clustering problem, one wishes to partition the points into clusters such that under some distance measure, points in the same cluster are close to each other compared to points in different clusters. It is natural to formulate the clustering problem into a minimization problem. To preserve this spirit, consider through the end of this section partition problems where F(π) is to be minimized. The notion of “strict conditions,” will continue to refer to strict properties of functions appearing in the objective function, for parameters (the forthcoming w _j’s) to be distinct and nonzero and, in addition, for θ ⁱ’s being distinct.

A general distance function (gdf) is a continuous symmetric function D : ℝ² → ℝ such that for x′ ≥ x ≥ y ≥ y′, D(x′, y) ≥ D(x, y) and D(x, y′) ≥ D(x, y) (the continuity requirement can be relaxed for much of the forthcoming results, but technical details are required). A gdf is strict if nondecreasing and nonincreasing are replaced by increasing and decreasing, respectively. A gdf is normalized if D(x, x) is independent of x.

Boros and Hammer [6] considered the strict and normalized gdf D(x, y) = | x − y | and established the following theorem.

Theorem 23

Suppose

$$F(\pi ) =\sum _{ j=1}^{p}\sum _{ i,u\in \pi _{j}}{\vert \theta }^{i} {-\theta }^{u}\vert \,.$$

(26)

Then every size problem has a noncrossing optimal partition. Further, under strict conditions, the corresponding type 1 conclusions hold.

It is not known whether or not the conclusions of Theorem 23 extend to shape-problems, except that the case of uniform shape was solved by Pfersky et al. [60].

Theorem 24

Suppose p divides n and consider the single shape-problem corresponding to $(n/p,\ldots,n/p)$ with F defined by ( 26 ). Then there exists a consecutive optimal solution. Further, under strict conditions, a corresponding type 1 conclusion holds.

Assume the θ ⁱ’s are given. For a given gdf D and a set $S \subseteq \mathcal{N}$, c is called a centroid of S with respect to D, or briefly the D-centroid of S, if $c \in \ \text{argmin}_{x}\sum _{i\in S}D(\theta _{i},x)$; continuity assures that every finite set has a centroid. When D(x, y) = d( | x − y | ) where d( ⋅) is a continuous, nondecreasing function on the nonnegative reals with d(0) = 0, a d-centroid will refer to the corresponding D-centroid. If d is the identity, then the d-centroid is the median. Chang and Hwang [14] proved the type 1 results of the following theorem.

Theorem 25

Suppose

$$F(\pi ) =\sum _{ j=1}^{p}w_{ j}\sum _{i\in \pi _{j}}d({\vert \theta }^{i} - c_{ j}\vert )\,,$$

(27)

where d(⋅) is a nondecreasing function on the nonnegative reals with lg d concave and d(0) = 0, and for each j, w _j > 0 and c _j is a d-centroid of $\theta _{\pi _{j}}$ . Then every size problem has a noncrossing optimal partitions. Further, under strict conditions the corresponding type 1 conclusions hold.

Boros and Hwang [7] considered shape problems with the objective function given by (27) where d is the identity function (and c _j is a medium). They proved the following result.

Theorem 26

Suppose

$$F(\pi ) =\sum _{ j=1}^{p}w_{ j}\sum _{i\in \pi _{j}}{\vert \theta }^{i} - m_{ j}\vert \,,$$

(28)

where for each j, w _j > 0 and m _j is the median of π _j. Then every shape problem has a noncrossing optimal partition. Further, under strict conditions, the corresponding type 1 conclusions hold.

For uniform weights, the consecutive property can be preserved under more general distance functions.

Theorem 27

Suppose

$$F(\pi ) =\sum _{ j=1}^{p}\sum _{ i\in \pi _{j}}D_{i}{(\theta }^{i},c_{ j})\,$$

(29)

where for each i, D _i is a gdf and for each j, c _j is a centroid of π _j with respect to $\{D_{i} : i \in \pi _{j}\}$ (in the sense that $c_{j} \in \min \arg _{x}\sum _{i\in \pi _{j}}D_{i}{(\theta }^{i},x)$). Then every size problem has a consecutive optimal partition. Further, under strict conditions, the corresponding type 1 conclusions hold.

Theorem 27 for $D_{i}{(\theta }^{i},c_{j}) = w_{i}d{{(\theta }^{i} - c_{j})}^{2}$ where w _i is a weight on element was proved by Fisher [24]. Hwang [34] extended the Fisher result to distance functions $D_{i}{(\theta }^{i},c_{j}) = w_{i}D{(\theta }^{i},c_{j})$.

To obtain a shape-version of Theorem 27, uniform D _i’s that satisfy some further structure have to be assumed. Specifically, a gdf D is submodular if for every x, x′, y, y′ ∈ ℝ with x ≤ x′ and y ≤ y′, $D(x',y') + D(x,y) \leq D(x',y) + D(x,y')$. The following result was proved by Hwang et al. [41].

Theorem 28

Suppose

$$F(\pi ) = f\left (\sum _{i\in \pi _{1}}D{(\theta }^{i},c_{ 1}),\ldots,\sum _{i\in \pi _{p}}D{(\theta }^{i},c_{ p})\right ),$$

(30)

where D is a submodular gdf, f is Schur concave nondecreasing, and for each j, c _j is a D-centroid of π _j. Then every shape problem has a consecutive optimal partition. Further, under strict conditions, the corresponding type 1 conclusions hold.

5 Multiparameter: Polyhedral Approach

Sections Sect. 3 and Sect. 4 examined partition properties that facilitate a reduction of the size of the set of partitions through which one has to search for an optimal partition (for one-dimensional problems). This approach is continued in the current section and the next one but with d ≥ 1 instead of d = 1. The properties that were explored in those sections are mostly geometric in nature, though the geometric characteristics may be in disguise for some of them. For example, the property “consecutive” was defined as “each part consists of consecutive integers,” but it can also be defined as “the convex hulls of the parts are pairwise disjoint.” This section and the next one continue the study of geometric properties of partitions but when the partitioned points are vectors in a high-dimensional space.

A unique feature of ℝ¹ vs. ℝ^d for d > 1 is that it has a natural order. In particular, when ordering the partitioned elements ${\theta }^{1}{,\theta }^{2},\ldots {,\theta }^{n}$ so that ${\theta }^{1} {\leq \theta }^{2} \leq \ldots {\leq \theta }^{n}$, it is possible to express all relevant geometric properties of the partitioned elements in terms of the corresponding partition of the index set $\mathcal{N}$, except hiding the distinction whether two disjoint intervals of θ’s touch at the boundary (see next paragraph on this issue). This conversion allows us to focus on “index-partitions.” However, when the dimension d exceeds one, there is no natural ordering of the points which allows the conversion of geometric properties that depend on the formation of the points to properties of partitioned indices. Consequently, it is more natural to define partitions on the set of vectors and not on the set of indices.

There is still the question why the focus so far was on partitioning index sets in the case of single-parameter partition problems. The reason is that “index-partitions” have an advantage over “vector-partitions” in situations where the columns of A have duplicate vectors. As the indices are always distinct, such situations do not require any attention when partitioning the index set. But, when considering “vector-partitions,” the presence of duplicate points requires one to account for the duplication of each vector in each part, that is, the parts are multisets, rather than sets, and more importantly, the same vector may appear in different parts. Partitioning the index set in the case where d = 1 voided the need of dealing with multisets and allowed us to focus on the main development of the theory. In Sect. 5, the focus is to prove that there exists an extreme point of the partition polytope which corresponds to an optimal partition with certain geometric property. When A ⁱ’s are distinct, then this geometric property does not allow two disjoint parts to touch at boundary; when A ⁱ’s are not distinct, then the geometric property is weakened to allow such touching. But this distinction does not affect the role the partition polytopes play. Therefore, partition of the index set can be continued so as to take advantage of the convenience it provides. However, in Sect. 6, the emphasis is to study these two and other geometric properties in detail, not only in their natures, but also in their differences. In order to understand these fine points, the framework will then revert to partitioning the set A.

Throughout this section, it is assumed that d ≥ 1 and ${A}^{1},\ldots,{A}^{n} \in {\mathbb{R}}^{d}$ are given. In the sum-partition problem, an objective function F(. ) is maximized over a family of p-partitions Π where the objective value of a p-partition $\pi = (\pi _{1},\ldots,\pi _{p})$ is given by F(π) = f(A _π), with $A_{\pi } = \left (\sum _{i\in \pi _{1}}{A}^{i},\ldots,\sum _{i\in \pi _{p}}{A}^{i}\right ) \in {\mathbb{R}}^{d\times p}$ and f(. ) a real-valued function on ℝ^{d ×p} (or a relevant subset thereof). The approach that is followed in the current section is to consider the problem of optimizing (an extension of) f over the partition polytope P _A ^Π defined as the convex hull of $\{A_{\pi } :\pi \in \Pi \}$. When the function f is guaranteed to obtain a maximum over P _A ^Π at a vertex, enumerating the vertices of P _A ^Π and selecting a partition corresponding to a vertex that maximizes f will produce a solution to the partition problem. This approach is enhanced when the vertices are associated with partitions having properties that are present only in a “small number” of partitions.

The case that is considered has the function f(. ) linear. It will be shown how linear programming can be used to solve the corresponding partition problem. Here, the approach is to obtain a representation of the problem as a projection of an optimization problem over a (generalized transportation) polytope which has a simple linear inequality representation. The resulting algorithm is polynomial in the number of partitioned vectors, their dimension, and the number of parts of the partitions. The approach follows Hwang, Onn, and Rothblum [44].

Superscripts will be used to denote columns of matrices, subscripts for rows, and double indices for elements, for example, U _t, U ^j, and U _t ^j. For matrices U and V of common dimension, say d ×p, the inner product of U and V is defined as the scalar $\langle U,V \rangle \equiv \sum _{t=1}^{d}\sum _{j=1}^{p}U_{t}^{j}V _{t}^{j}$. For matrices U, V, and W of dimension d × p, d × q, and q × p, respectively,

$$\langle U,V W\rangle =\langle {V }^{T}U,W\rangle =\langle U{W}^{T},V \rangle \,.$$

(31)

Let I be the n ×n identity matrix. At times, the columns of I (the n standard unit vectors in ℝⁿ) will be the vectors that are associated with the partitioned elements $1,\ldots,n$. In such cases, for each p-partition π, $I_{\pi } \in {\mathbb{R}}^{n\times p}$ is the matrix associated with π, namely,

$$(I_{\pi })_{t}^{j} \equiv \left \{\begin{array}{ll} 1 \mbox{ if }t \in \pi _{j}\ \mbox{ and } \\ 0\mbox{ otherwise.} \end{array} \right.$$

(32)

For a set of partitions Π, $P_{I}^{\Pi } = \mbox{ conv}\{I_{\pi } :\pi \in \Pi \}$ will be referred to as a generalized transportation polytope (associated withΠ) and to the corresponding partition problem as a generalized transportation problem. Classical transportation polytopes are obtained when Π consists of a single shape. The special attention given to this class of partition problems is justified by the following properties that will be established:

(i)
The correspondence π → I _π is a bijection; further, none of the matrices I _π are expressible as a convex combination of others.
(ii)
Explicit representing systems of linear inequalities are available for bounded-shape generalized transportation polytopes.
(iii)
Constrained-shape partition problems are reducible to generalized transportation problems.

These properties are next verified and then used to develop efficient computational methods for solving the (linear) partition problem.

Lemma 11

Let Π be a set of p-partitions. Then the correspondence π → I_π is a bijection of Π onto the vertices of P_I ^Π ; in particular, the vertices of P _I ^Π are precisely the matrices $\{I_{\pi } :\pi \in \Pi \}$ . Further, the inverse correspondence of vertices of P _I ^Π onto the partitions of Π has vertex v corresponding to the partition π with $\pi _{j} =\{ t : v_{t}^{j} = 1\}$ for $j = 1,\ldots,p$.

An important feature of the bijection π → I _π of Lemma 11 is the fact that it is constructive in both directions, namely, it is easy to determine I _π from π and, vice versa, π from I _π.

Linear-inequality representation for bounded-shape generalized transportation polytopes is next provided.

Theorem 29

Let L and U be positive integer p-vectors satisfying L ≤ U and $\sum _{j=1}^{p}L_{j} \leq n \leq \sum _{j=1}^{p}U_{j}$ , and let $\Pi \equiv {\Pi }^{(L,U)}$ . Then P _I ^Π is the solution set of the linear system:

$$\begin{array}{rcl} (a)& & X_{t}^{j} \geq 0\ \mbox{ for }\ t = 1,\ldots,n\ \mbox{ and }\ j = 1,\ldots,p.\qquad \qquad \qquad \ \\ (b)& & \sum _{j=1}^{p}X_{ t}^{j} = 1\ \mbox{ for }\ t = 1,\ldots,n. \\ (c)& & L_{j} \leq \sum _{t=1}^{n}X_{ t}^{j} \leq U_{ j}\ \mbox{ for }\ j = 1,\ldots,p\,. \end{array}$$

(33)

When L = U, Theorem 29 specializes to the following inequality description of classical transportation polytopes.

Corollary 4

Let $n_{1},\ldots,n_{p}$ be positive integers with $\sum _{j=1}^{p}n_{j} = n$ and let $\Pi \equiv {\Pi }^{(n_{1},\ldots,n_{p})}$ . Then P _I ^Π is the solution set of the linear system:

$$\begin{array}{rcl} (a)& & X_{t}^{j} \geq 0\ \mbox{ for }\ t = 1,\ldots,n\ \mbox{ and }\ j = 1,\ldots,p.\qquad \quad \\ (b)& & \sum _{j=1}^{p}X_{ t}^{j} = 1\ \mbox{ for }\ t = 1,\ldots,n. \\ (c)& & \sum _{t=1}^{n}X_{ t}^{j} = n_{ j}\ \mbox{ for }\ j = 1,\ldots,p\,. \end{array}$$

(34)

Theorem 30

Let A ∈ ℝ^d×n and let Π be a set of p-partitions. Then the partition polytope P _A ^Π is the image of the generalized transportation polytope P _I ^Π under the linear function mapping $X \in P_{I}^{\Pi } \subseteq {\mathbb{R}}^{n\times p}$ into AX ∈ ℝ^d×p . In particular, for every p-partition π, A_π = AI_π.

The representation of partition polytopes as a projection of generalized transportation polytopes derived in Theorem 30 yields the following test for vertices of constrained-shape partition polytopes:

Corollary 5

Let A ∈ ℝ^d×n , let Π be a set of p-partitions, and let V ∈ P _A ^Π . Then V is a vertex of P _A ^Π if and only if every representation $V = \frac{1} {2}A(X' + X'')$ with $X',X'' \in P_{I}^{\Pi }$ has AX′ = AX′. □

Theorem 30 yields the following result that shows that partition problems over Π may be lifted to optimization problems over generalized transportation polytopes.

Corollary 6

Let A ∈ ℝ^d×n , Π be a set of p-partitions, f : ℝ^d×p → ℝ, and F : Π → ℝ with F( π) = f(A_π) for each π ∈ Π. Consider the function h : P_I ^Π → ℝ with h(X) = f(AX) for each X ∈ P_I ^Π. If π is a p-partition such that I _π maximizes h over $\{I_{\pi '} :\pi ' \in \Pi \}$, then π maximizes F(⋅) over Π; further, if f is linear with f(X) = 〈C,X〉 for every $X \in {\mathbb{R}}^{k\times p}$ , then

$$h(Z) =\langle C,AZ\rangle =\langle {A}^{T}C,Z\rangle \quad \mbox{ for all }\ Z \in P_{ I}^{\Pi },$$

(35)

and a partition π with I_π maximizing h over $P_{I}^{\Pi } \supseteq \{ I_{\pi '} :\pi ' \in \Pi \}$ exists.

Let A ∈ ℝ^{d ×n}, C ∈ ℝ^{d ×p} and let L and U be positive integer vectors satisfying L ≤ U and $\sum _{j=1}^{p}L_{j} \leq n \leq \sum _{j=1}^{p}U_{j}$, and $F : {\Pi }^{(L,U)} \rightarrow \mathbb{R}$ with $F(\pi ) =\langle C,A_{\pi }\rangle$ for each π ∈ Π ^{(L, U)}. The goal is to maximize F(. ) over $\Pi \equiv {\Pi }^{(L,U)}$. Let $h : P_{I}^{\Pi } \rightarrow \mathbb{R}$ with $h(X) =\langle C,AX\rangle =\langle {A}^{T}C,X\rangle$ for each X ∈ P _I ^Π. In view of Corollary 6 and Lemma 11, the problem of maximizing F over Π ^{(L, U)} reduces to finding a vertex of P _I ^Π that maximizes the linear objective with coefficients $\{({A}^{T}C)_{t}^{j} : t = 1,\ldots,n$ and $j = 1,\ldots,p\}$ over P _I ^Π; with $a \equiv \max \{\lg \vert A_{t}^{i}\vert : A_{t}^{i}\neq 0\}$ and $c \equiv \max \{\lg \vert C_{i}^{j}\vert : C_{i}^{j}\neq 0\}$, these coefficients are bounded by ke ^{a + c} and are computable in time O[npd(a + c)] (the availability of sophisticated fast algorithms for multiplying matrices is ignored); using the explicit representation of P _I ^Π provided in Theorem 29, the problem reduces to a structured linear program, and standard results show that an optimal vertex for P _I ^Π can be identified in strongly polynomial time O[(n + p)np + (n + p)² u] where $u \equiv \max _{i}\lg (U_{i} - L_{i}) \leq n$ (see Ahuja and Orlin [1] or Ahuja, Orlin, and Tarjan [2]).

The above solution method applies (in fact, in a simplified form) to single-shape partition problems (with linear objective) and, consequently, to constrained-shape problems where the set of shapes is tractable. Thus, for a set of shapes Π whose size is polynomial in the parameters k, n, and p, a polynomial solution method by solving | Π | linear programs (each having L = U is available).

It is next shown that, with linear objective and special structure, a solution to single-shape partition problems can be obtained explicitly without addressing the linear programming problem over P _I ^Π.

Theorem 31

Let A ∈ ℝ^d×n , C ∈ ℝ^d×p, $n_{1},\ldots,n_{p}$ be positive integers with $\sum _{j=1}^{p}n_{j} = n$ , and $F : {\Pi }^{(n_{1},\ldots,n_{p})} \rightarrow \mathbb{R}$ with $F(\pi ) =\langle C,A_{\pi }\rangle$ for each $\pi \in {\Pi }^{(n_{1},\ldots,n_{p})}$ . If A and C satisfy

$${({A}^{t} - {A}^{t+1})}^{T}({C}^{j} - {C}^{j+1}) \geq 0\ \mbox{ for }\ 1 \leq t < n\ \mbox{ and }\ 1 \leq j < p\,,$$

(36)

then the p-partition π with $\pi _{j} = \left \{\sum _{u=1}^{j-1}n_{u} + 1,\ldots,\sum _{u=1}^{j}n_{u}\right \}$ for $j = 1,\ldots,p$ maximizes F(.) over ${\Pi }^{(n_{1},\ldots,n_{p})}$; further, if the inequalities of (36) hold strictly, then π is a unique maximizer.

Standard arguments show that condition (36) is equivalent to (the seemingly stronger assertion):

$${ \left ({A}^{t_{1} } - {A}^{t_{2} }\right )}^{T}\left ({C}^{j_{1} } - {C}^{j_{2} }\right ) \geq 0\ \mbox{ for }\ 1 \leq t_{1} < t_{2} \leq n\ \mbox{ and }\ 1 \leq j_{1} < j_{2} \leq p\,.$$

(37)

Theorem 31 facilitates the identification of optimal partitions when specified permutations of the columns of A and C satisfy (36). This is always possible when d = 1 by sorting the two sets of scalars ${A}^{1},\ldots,{A}^{n}$ and ${C}^{1},\ldots,{C}^{p}$ to increasing sequences, thereby yielding an explicit solution to the partition problem.

Two properties of partitions – separability and almost separability – are next introduced. In particular, conditions are determined for these properties to be present in partitions whose associated vector is a vertex of a constrained-shape partition polytope; results of Sect. 2 can then be used to deduce conditions which assure the existence of optimal partitions which are (almost) separable.

Two subsets Ω¹ and Ω² of ${\mathbb{R}}^{d}$ are called separable if there exists a nonzero d-vector C such that

$${C}^{T}{u}^{1}>{C}^{T}{u}^{2}\ \ \text{ for all }\ {u}^{1} \in {\Omega }^{1}\ \text{ and }\ {u}^{2} \in {\Omega }^{2}.$$

(38)

Two subsets Ω¹ and Ω² of ℝ^d are called almost separable if there exists a nonzero d-vector C such that

$${C}^{T}{u}^{1}>{C}^{T}{u}^{2}\ \ \text{ for all }\ {u}^{1} \in {\Omega }^{1}\ \text{ and }\ {u}^{2} \in {\Omega }^{2}\ \text{ with }\ {u}^{1}\neq {u}^{2}.$$

(39)

In either of these cases, the vector C is referred to as a separating vector. Of course, separable sets are necessarily almost separable and disjoint and for disjoint sets almost separability and separability coincide. When two sets are not disjoint, almost separability (as defined by (39)) is the strongest possible form of separation by a linear functional, as the condition asserts strict separation of the values of the functional for all pairs of points at which the sets do not overlap. In particular, (39) implies that ${\Omega }^{1} \cap {\Omega }^{2}$ contains at most a single point, for if u and w were distinct points u and w in the intersection, it would follow that C ^T u > C ^T w and C ^T w > C ^T u.

The next lemma converts separability of finite sets to separability of their convex hulls, yielding as a corollary, a characterization of the former.

Lemma 12

Let Ω¹ and Ω² be two finite sets in ℝ^d and let C be a nonzero vector in ℝ^d . Then:

(a) ${C}^{T}{u}^{1}>{C}^{T}{u}^{2}$ for all ${u}^{1} \in {\Omega }^{1}$ and u ² ∈ Ω² if and only if ${C}^{T}{w}^{1}>{C}^{T}{w}^{2}$ for all ${w}^{1} \in \ \mathit{conv}\ {\Omega }^{1}$ and w ² ∈ conv Ω².
(b) ${C}^{T}{u}^{1}>{C}^{T}{u}^{2}$ for all u ¹ ∈ Ω¹ and u ² ∈ Ω² with u ¹ ≠u ² if and only if ${C}^{T}{w}^{1}>{C}^{T}{w}^{2}$ for all w ¹ ∈ conv Ω¹ and w ² ∈ conv Ω² with w¹≠w².

Call a partition $\pi = (\pi _{1},\ldots,\pi _{p})$ separable (almost separable) if the sets {A ⁱ : i ∈ π _j} for $j = 1,\ldots,p$ are pairwise separable (almost separable). The next result and its corollary establish (almost) separability of a partition π whose associated matrix A _π is a vertex of a corresponding constrained-shape partition polytope. Under the restriction that the columns of A are distinct, the results are due to Barnes et al. [4]. Some of the results about separable partitions for the general case appear in Hwang et al. [43] and the results about almost separable partitions are from Hwang and Rothblum [38, 40].

Theorem 32

Let Γ be a nonempty set of positive integer p-vectors with coordinate-sum n, and let π be a partition in Π^Γ where A_π is a vertex of P_A ^Γ . Then for some matrix C ∈ ℝ^d×p (with columns ${C}^{1},\ldots,{C}^{p}$ )

$$\begin{array}{rll} {({C}^{r} - {C}^{s})}^{T}u>&{({C}^{r} - {C}^{s})}^{T}w\ \mathit{for\ all\ distinct}\ r,s \in \{ 1,\ldots,p\},u \in \ \mathit{conv} \\ &\{{A}^{i} : i \in \pi _{r}\}\ \mathit{and}\ w \in \ \mathit{conv}\ \{{A}^{i} : i \in \pi _{s}\}\ \mathit{with}\ u\neq w.\end{array}$$

(40)

Corollary 7

Let Γ be a nonempty set of positive integer p-vectors with coordinate-sum n, and let π be a partition in Π^Γ where A_π is a vertex of P_A ^Γ . Then π is almost separable; further, if the columns of A are distinct, then π is separable.

The next result shows that when d = 1, the necessary condition for the matrix (in fact, vector for d = 1) associated with a partition to be a vertex is also sufficient.

Theorem 33

Let A ∈ ℝ^1×n , let $n_{1},\ldots,n_{p}$ be positive integers with $\sum _{j=1}^{p}n_{j} = n$ and let π be a partition with shape $(n_{1},\ldots,n_{p})$. Then π is almost separable if and only if A _π is a vertex of $P_{A}^{(n_{1},\ldots,n_{p})}(= P_{A}^{\Gamma }\ \mathit{for}\ \Gamma =\{ (n_{1},\ldots,n_{p})\})$.

The next result establishes the optimality of (almost) separable partitions.

Theorem 34

Let Γ be a nonempty set of positive integer p-vectors with coordinate-sum n and let f(⋅) be an (edge-)quasi-convex function on the constrained-shape partition polytope P _A ^Γ . Then there exists an optimal partition π which is almost separable and has A_π as a vertex of P_A ^Γ . Further, if f(⋅) is strictly (edge-)quasi-convex, then every optimal partition π is almost separable and has A_π as a vertex of P_A ^Γ . If A’s columns are distinct, the quantifier “almost” can be dropped in the above statements.

Theorem 34 shows that when considering constrained-shape partition problems with f( ⋅) (edge-)quasi-convex, it suffices to consider partitions that are (almost) separable (and whose associated matrix is a vertex of the corresponding partition polytope).

Cone-separability, another property of partitions, it next introduced and studied. In particular, it will be shown that single-size partition problems have optimal partitions with this property and an enumeration algorithm for solving such problems will be developed. It is emphasized that the results require the assumption that empty parts are allowed. Of course, single-size problems with empty parts prohibited (allowed) are instances of constrained-shape problems corresponding to the set of all positive (nonnegative) shapes. In particular, problems of either type have optimal solutions that are (almost) separable, and enumerating (almost) separable partitions can be used to solve these problems. The following stronger conclusions do not apply to single-size problem when parts are required to nonempty.

Two subsets Ω¹ and Ω² of ℝ^d are called cone-separable if there exists a nonzero d-vector C such that

$${C}^{T}{u}^{1}>0>{C}^{T}{u}^{2}\ \text{ for all }\ {u}^{1} \in {\Omega }^{1} \setminus \{ 0\}\ \text{ and }\ {u}^{2} \in {\Omega }^{2} \setminus \{ 0\}$$

(41)

(if either ${\Omega }^{1} \setminus \{ 0\}$ or Ω² ∖ { 0} is empty and the other set is not, then (41) is to be interpreted as a requirement just on the elements of the nonempty set).

A relation between cone-separability and (almost) separability is determined in the next lemma.

Lemma 13

Suppose Ω¹ and Ω² are subsets of ℝ^d that are cone-separable. Then Ω¹ and Ω² are almost separable with the vector C satisfying ( 41 ) as a separating vector; in this case, 0 is the only possible vector in Ω¹ ∩ Ω² . Further, if $0\notin {\Omega }^{1}$ or $0\notin {\Omega }^{2}$ , then Ω¹ and Ω² are separable.

The next example demonstrates that (almost)-separability does not imply cone-separability.

Example 7

For k = 1, 2, let ${\Omega }^{k} =\{ \left ({ k \atop 0} \right ),\left ({ k \atop 2} \right )\} \subseteq {\mathbb{R}}^{2}$. Then Ω¹ and Ω² are separable (with $C = \left ({ 1 \atop 0} \right )$ as a separating vector), but they are not cone-separable as a vector C ∈ ℝ² satisfies ${C}^{T}\left ({ 1 \atop 0} \right )>0$ if and only if ${C}^{T}\left ({ 2 \atop 0} \right )>0$.

Recall that the conic hull of a set $\Omega \subseteq {\mathbb{R}}^{d}$, denoted cone Ω, is defined as the set of linear combinations $\sum _{t=1}^{q}\Gamma _{t}{x}^{t}$ with Γ _t ≥ 0 and x ^t ∈ Ω for $t = 1,\ldots q$ (with cone ∅ = { 0}). A set is a cone if it equals its conic hull. A cone is pointed if for no nonzero vector x, both x and − x are in the cone. A fundamental result about cones (see Rockafellar [61], Schrijver [64], or Ziegler [71]) shows that a set is the conic hull of a finite set in ℝ^m if and only if it has a representation {x ∈ ℝ^m : Ax ≤ 0} with A as a real matrix having m columns; such a cone is called a polyhedral cone.

Lemma 14

Let Ω¹ and Ω² be two finite sets in ℝ^d and let C be a nonzero vector in ℝ^d . Then ${C}^{T}{u}^{1}>0>{C}^{T}{u}^{2}$ for all ${u}^{1} \in {\Omega }^{1} \setminus \{ 0\}$ and ${u}^{2} \in {\Omega }^{2} \setminus \{ 0\}$ if and only if ${C}^{T}{w}^{1}>0>{C}^{T}{w}^{2}$ for all ${w}^{1} \in \ (\mathit{cone}\ {\Omega }^{1}) \setminus \{ 0\}$ and ${w}^{2} \in \ (\mathit{cone}\ {\Omega }^{2}) \setminus \{ 0\}$.

The next result provides characterizations of cone-separability for polyhedral cones.

Lemma 15

Let Ω¹ and Ω² be two finite sets in ℝ^d . Then the following are equivalent:

(a) Ω¹ and Ω² are cone-separable.
(b) $\mathit{cone}\ {\Omega }^{1}$ and $\mathit{cone}\ {\Omega }^{2}$ are cone-separable.
(c) cone Ω¹ and cone Ω² are almost separable.
(d) cone Ω¹ and cone Ω² are pointed cones and $(\mathit{cone}\ {\Omega }^{1}) \cap (\mathit{cone}\ {\Omega }^{2}) =\{ 0\}$.

Attention is next turned to partition problems. Here, fixed-size problems with a relaxation of the assumption that partitions’ parts are nonempty are considered. Formally, let $\widehat{{\Pi }^{p}}$ denote the set of p-partitions which allow empty parts. For $\pi \in \widehat{ {\Pi }^{p}}$, A _π has the natural definition with the empty sum taken as zero. The partition polytope corresponding to $\widehat{{\Pi }^{p}}$ is defined by $P_{A}^{\widehat{{\Pi }^{p}} } \equiv \ \text{conv }\ \{A_{\pi } :\pi \in \widehat{ {\Pi }^{p}}\}$. As in all sum-partition problems, it is assumed that the objective function F over $\widehat{{\Pi }^{p}}$ is given by F(π) = f(A _π) with f as a real-valued function on $P_{A}^{\widehat{{\Pi }^{p}} }$.

Call a partition $\pi = (\pi _{1},\ldots,\pi _{p})$ cone-separable if the sets {A ⁱ : i ∈ π _j} for $j = 1,\ldots,p$ are pairwise cone-separable. Lemma 13 shows that cone-separability implies almost separability, with 0 as the only potential overlapping vector for any pair of parts; moreover, a cone-separable partition with at most one part containing indices that correspond to the 0 vector is separable. The next two theorems extend results of Barnes, Hoffman, and Rothblum [1992] (which considered the case where the A ⁱ’s are distinct).

Theorem 35

If $\pi \in \widehat{ {\Pi }^{p}}$ and A _π is a vertex of $P_{A}^{\widehat{{\Pi }^{p}} }$, then for some matrix C ∈ ℝ^d×p

$$\begin{array}{rll} {({C}^{r})}^{T}u>{({C}^{s})}^{T}u\ \ \mathit{for\ all\ distinct}\ r,s = 1,\ldots,p\ \mathit{and}\ u \in \ (\mathit{cone}\ \pi _{r}) \setminus \{ 0\}.\end{array}$$

(42)

Theorem 36

Let f(⋅) be an (edge-)quasi-convex on $P_{A}^{\widehat{{\Pi }^{p}} }$ . Then there exists an optimal partition π which is both cone-separable and (almost) separable and has A_π as a vertex of $P_{A}^{\widehat{{\Pi }^{p}} }$ . Further, if f(⋅) is strictly (edge-)quasi-convex, then every optimal partition π is cone-separable and has A_π as a vertex of $P_{A}^{\widehat{{\Pi }^{p}} }$.

The next two results consider cone-separable partitions when d = 1 and d = 2.

Theorem 37

Suppose A ∈ ℝ^1×n has no zero column. If p ≥ 2, then a p-partition $\pi$ is cone-separable if and only if two of its parts are $\{i \in \mathcal{N} : {A}^{i}>0\}$ and $\{i \in \mathcal{N} : {A}^{i} < 0\}$ (where either may be empty) and all other parts are empty. If p = 1, then the (only) 1-partition $(\mathcal{N})$ is cone-separable if and only if A contains either just positive elements or just negative elements.

To study cone-separable partitions with d = 2, associate each x ∈ ℝ² ∖ { 0} with the angular coordinate of its polar-coordinate representation denoted ϕ(x) (measured in degrees). It is observed that for $\varnothing \neq C \subseteq {\mathbb{R}}^{2} \setminus \{ 0\}$, $C \cup \{ 0\}$ is a pointed polyhedral cone if and only if for some $0 \leq \underline{\phi }\leq \overline{\phi }$: ϕ < 360^∘, $\overline{\phi } <\underline{\phi } +18{0}^{\circ }$ and

$$\begin{array}{lll} C = \left \{\begin{array}{lll} \{x \in {\mathbb{R}}^{2} \setminus \{ 0\} :&\underline{\phi } \leq \phi (x) \le \overline {\phi} \} &\text{ if } \overline{\phi } < 36{0}^{\circ } \\ \{x \in {\mathbb{R}}^{2} \setminus \{ 0\} :&\underline{\phi } \leq \phi (x) < 36{0}^{\circ }\ \text{ or }\ 0 \leq \phi (x) \leq \overline{\phi }- 36{0}^{\circ }\}&\text{ if }\ \overline{\phi } \geq 36{0}^{\circ }. \end{array} \right.& \end{array}$$

Theorem 38

Suppose A ∈ ℝ^2×n has no zero column. A p-partition π is cone-separable if and only if for some q ≤ p, there exist $0 \leq \underline{\phi }_{1} \leq \overline{\phi }_{1} <\underline{\phi } _{2} \leq \overline{\phi }_{2} <\ldots <\underline{\phi } _{q} \leq \overline{\phi }_{q} <\underline{\phi } _{1} + 36{0}^{\circ }$ such that ϕ _q < 360^∘, $\overline{\phi }_{t} -\underline{\phi }_{t} < 18{0}^{\circ }$ for each $t = 1,\ldots,q$, and the nonempty parts of π are

$$\{i \in \mathcal{N} :\underline{\phi } _{t} \leq \phi ({A}^{i}) \leq \overline{\phi }_{ t}\}\ \mathit{for}\ t = 1,\ldots,q - 1$$

and

$$\begin{array}{rcl} \left \{\begin{array}{lll} \{i \in \mathcal{N} :&\underline{\phi }_{q} \leq \phi ({A}^{i}) \leq \overline{\phi }_{q}\} &\mathit{if }\ \overline{\phi }_{q} < 36{0}^{\circ } \\ \{i \in \mathcal{N} :&\underline{\phi }_{q} \leq \phi ({A}^{i}) < 36{0}^{\circ }\ \mathit{or}\ 0 \leq \phi ({A}^{i}) \leq \overline{\phi }_{q} - 36{0}^{\circ }\}&\mathit{if }\ \overline{\phi }_{q} \geq 36{0}^{\circ };\end{array} \right.& & \\ \end{array}$$

further, the ϕ _t’s and $\overline{\phi }_{t}$’s can be selected from $\{\phi ({A}^{i}) : i \in \mathcal{N}\}$.

6 Multiparameter: Combinatorial Approach

This section explores a combinatorial approach which can apply to partition problems that are more general than the sum-partition problems studied in Sect. 5. Here, vector partitions rather than index partitions are used – see the first three paragraphs of Sect. 5.

A finite subset Ω¹ of ℝ^d penetrates another finite subset Ω² of ℝ^d, written Ω¹ → Ω², if Ω¹ ∩ (conv Ω²)≠∅; Ω¹ strictly penetrates Ω² if ${\Omega }^{1} \cap (\text{ri}[\mbox{ conv}\,{\Omega }^{2}])\neq \varnothing $. Some implications about penetration that are true in ℝ¹ do not hold in ℝ^d with d > 1. For example, Ω¹ → Ω² and Ω² ↛ Ω¹ no longer imply conv Ω¹ ⊂ conv Ω².

Properties of partitions of vectors in ℝ^d, with d > 1, are next introduced. The fact that some implications of penetration in ℝ¹ do not hold in ℝ^d for d > 1 implies that some properties of partitions of points in ℝ¹ that were defined in terms of penetration have more than one counterpart for partitions of points in ℝ^d.

A sphere in ℝ^d is defined as a set of the form $S =\{ x \in {\mathbb{R}}^{d} :\| x - a\| \leq R\}$ for some a ∈ ℝ^d and R > 0, where $\|\qquad \vert $ stands for the Euclidean norm. The interior and boundary of such a sphere is given by $\text{int }\ S =\{ x \in {\mathbb{R}}^{d} :\| x - a\| < R\}$ and $\text{bd }S =\{ x \in {\mathbb{R}}^{d} :\| x - a\| = R\}$.

Consider a partition $\pi = (\pi _{1},\ldots,\pi _{p})$ of a finite set of vectors in ℝ^d. The following properties of such partitions will be considered; the first three were introduced in Sect. 5.

Separable (S) (also referred to as “disjoint” in the literature): For all distinct $r,s = 1,\ldots,p$, π _r and π _s are separable, that is, $(\mbox{ conv}\,\pi _{r}) \cap (\mbox{ conv}\,\pi _{s}) = \varnothing $.
Almost separable (AS): For all distinct $r,s = 1,\ldots,p$, π _r and π _s are almost separable, that is, $(\mbox{ conv}\,\pi _{r}) \cap (\mbox{ conv}\,\pi _{s})$ contains at most a single point, and if the intersection contains a point, it is a vertex of both conv π _r and conv π _s.
Cone-separable (C _n S): For all distinct $r,s = 1,\ldots,p$, π _r and π _s are cone-separable, that is, $(\mbox{ cone}\,\pi _{r}) \cap (\mbox{ cone}\,\pi _{s}) =\{ 0\}$.
Sphere-separable (SS): For all distinct $r,s = 1,\ldots,p$, there exists a sphere S ⊂ ℝ^d such that either (U, W) = (π _r, π _s) or (U, W) = (π _s, π _r), satisfies S ⊇ U and S ∩ W = ∅.
Convex-separable (C _v S): For all distinct $r,s = 1,\ldots,p$, there exists a convex set S ⊂ ℝ^d such that either (U, W) = (π _r, π _s) or (U, W) = (π _s, π _r), satisfies S ⊇ U and S ∩ W = ∅. (Evidently, without loss of generality, it is possible to assume that dim S = d.)
Almost sphere-separable (ASS): For all distinct $r,s = 1,\ldots,p$, there exists a sphere S ⊂ ℝ^d such that either $(U,W) = (\pi _{r},\pi _{s})$ or $(U,W) = (\pi _{s},\pi _{r})$, satisfies S ⊇ U and | S ∩ W | ≤ 1 and if x is a single point in S ∩ W, then U ∩ (bd S) = { x}.
Nonpenetrating (NP): For all distinct $r,s = 1,\ldots,p$, $\pi _{r}\not\rightarrow \mbox{ conv}\,\pi _{s}$.
Noncrossing (NC): For all distinct $r,s = 1,\ldots,p$, either $(\mbox{ conv}\,\pi _{r}) \cap (\mbox{ conv}\,\pi _{s}) = \varnothing $ or one convex hull is contained in the other with no member of the larger part penetrating the smaller part.
Acyclic (AC): There do not exist q ≥ 2 parts of π, say $\pi _{i_{1}},\ldots,\pi _{i_{q}}$, such that $\pi _{i_{1}} \rightarrow \pi _{i_{2}} \rightarrow \ldots \rightarrow \pi _{i_{q}} \rightarrow \pi _{i_{1}}$.
Monopolistic (M _p): One part has all elements. Note that if empty parts are prohibited, there is only one partition satisfying M _p and its size is 1, if empty parts are allowed it means that p − 1 parts are empty.
Nearly monopolistic (N _e M _p): At most one part of π has more than a single point.
Nearly cone-separable (N _e C _n S): For all distinct $r,s = 1,\ldots,p$, π _r and π _s are either cone-separable or at least one of them is a singleton.

A property defined by penetration is called “weak” if the penetration allows touching at boundary. Also note that two subsets Ω¹ and Ω² of ℝ^d are separable if and only if there exists a (closed) half-space S such that S ⊇ Ω¹ and S ∩ Ω² = ∅, and Ω¹ and Ω² are cone-separable if and only if there exists a (closed) cone S of dimension d such that S ⊇ Ω¹ and S ∩ Ω² = { 0}.

S and C _n S were first considered in Barnes et al. [4]; C _v S (called “noncrossing”) in Boros and Hwang [7]; AS in Hwang and Rothblum [38]; SS, NP (called “nested”), and AC (called “connected”) in Boros and Hammer [6]; NC (for d = 1) in Kreweras [48], and N _e M _p in Pfersky et al. [60].

When d = 1, both S and NP reduce to consecutiveness, while $N_{e}C_{n}S$, NC, SS, AC, and C _v S all reduce to noncrossingness. Further, when d = 1, partitions in C _n S can have at most one part containing positive elements, at most one part containing negative elements, and some parts each consisting of only the 0-element (but may have multiple copies). For a set of points with k 0-elements, this set is not size(p)-regular for p > k + 2 and not shape-regular for almost all shapes. Hence, C _n S is not studied for d = 1. For d = 1, when there are enough 0-elements, C _n S reduces to bi-extremal, but N _e M _p does not reduce to extremal since the singletons can be anywhere, not necessarily the smallest elements.

Theorem 39

The relations among the properties of partitions that were introduced are represented in Fig. 4 .

Golany et al. [29] proved an unexpected relation between S and SS. Here, an “almost” version of the result is added. To state these results, the partitioned vectors are embedded in R ^{d + 1} by defining

$$\widehat{{A}}^{i} \equiv \left ({ {A}^{i} \atop \|{A{}^{i}\|}^{2}} \right ) \in {\mathbb{R}}^{d+1}\ \text{ for }\ i = 1,\ldots,n\,.$$

(43)

Next, for each p-partition $\pi = (\pi _{1},\ldots,\pi _{p})$, let $\widehat{\pi }$ be the p-partition for the problem with partitioned vectors $\widehat{{A}}^{1},\ldots,\widehat{{A}}^{n}$ with $\widehat{\pi }_{j} =\{\widehat{ {A}}^{i} : {A}^{i} \in \pi _{j}\}$ for $j = 1,\ldots,p$; in particular,

$$\widehat{A}_{\widehat{\pi }} = \left [\sum _{{A}^{i}\in \pi _{1}}\widehat{{A}}^{i},\ldots,\sum _{{ A}^{i}\in \pi _{p}}\widehat{{A}}^{i}\right ] \in {\mathbb{R}}^{(d+1)\times p}.$$

(44)

Theorem 40

A partition π is (almost) sphere-separable if and only if $\widehat{\pi }$ is (almost) separable when the partitioned vectors are $\widehat{{A}}^{1},\ldots,\widehat{{A}}^{n}$.

The proofs of (Golany et al. [29]) of the equivalences of Theorem 40 are constructive and show how to convert separating vectors that verify (almost) separability into spheres that verify (almost) sphere separability and vice versa.

The next result provides conditions that suffice for (A)SS ⇒ (A)S.

Lemma 16

Assume that all the A ⁱ ’s lie on the boundary of a sphere. Then a partition is (almost) sphere-separable if and only if it is (almost) separable.

Next, the number of Q-partitions for each property Q introduced earlier is bounded. Since vector-partitions are considered, the counts may depend on the partitioned vectors. For each A ∈ ℝ^{d ×n}, let # _Q ^A(p) be the number of p-partitions of the columns of A that satisfy Q, allowing empty parts. The emphasis is to examine whether $\#_{Q}(n,p,d) \equiv \max _{A\in \mathbb{R}_{}^{d\times n}}\#_{Q}^{A}(p)$ grows exponentially or polynomially in n, with p ≥ 2 and d fixed (in the latter case, enumeration algorithms exist for all relevant properties Q, see [39]. Counting of a polynomial class is usually done by providing an upper bound of the form O(n ^m) for some positive m.

Every unlabeled p-partition corresponds to at most p! labeled partitions (the option for identical parts implies that the bound needs not always be tight). Thus, the number of labeled p-partitions is bounded by p! times the number of unlabeled p-partitions (for index-partitions, there are no identical parts, so the bound is realized), a multiplier that is independent of n. Consequently, the O( ⋅) order of the classes of labeled and unlabeled partitions with a given property coincide. Here, it is convenient to count the labeled classes. Also, empty parts will be allowed.

When columns of A are not distinct, a vector x ∈ ℝ^d is referred to as a multiple point (ofA) if it appears more than once among the columns of A; the multiplicity of a multiple point x is the number of times x appears among the columns of A. As vector partitions are considered, two partitions are identified if their parts are assigned, respectively, the same number of copies of each multiple point. The next lemma records a standard combinatorial fact.

Lemma 17

Suppose a point in ℝ^d has n copies. Then there are $\left ({ n-1 \atop p-1} \right )$ ways of splitting the point into p nonempty (labeled) parts and $\left ({ n+p-1 \atop p-1} \right )$ ways of splitting it into p parts when empty parts are allowed; in either case, the number is bounded by O(n ^p−1 ).

A set of points in ℝ^d is called generic, or in general position, if for k = 1, …, d no k + 1 points of them lie in a k − 1-dimensional hyperplane. A matrix is called generic, if the set of its columns is generic. Harding [33, Theorem 1] gave an exact count of the number of A-separable 2-partitions when A is generic. Harding’s result is next presented in a more general context which corrects [33, Theorem 2]; see Hwang and Rothblum [38]. For n, d ≥ 1, define the (n, d)-Harding number, denoted H(n, d), by

$$H(n,d) \equiv \sum _{j=0}^{d}\left ({ n - 1 \atop j} \right ),$$

(45)

(where the standard convention that $\left ({ n \atop 0} \right ) = 1$ for each n ≥ 0 and $\left ({ n \atop k} \right ) = 0$ if k > n is used).

For A ∈ ℝ^{d ×n} and E ∈ ℝ^{d ×u}, a p-partition π is (A, E)-separable if every pair of parts of π are separable by a hyperplane that contains the columns of E.

Theorem 41

Suppose A ∈ ℝ^d×n and E ∈ ℝ^d×u , where 0 ≤ u ≤ d and [A,E] is generic. With empty parts allowed, the number of (A|E)-separable 2-partitions is $2H(n,d - u) \leq {2}^{d-u+1}\left ({ n \atop d-u} \right )$ . With empty parts prohibited, the number of (A|E)-separable 2-partitions is 2H(n,d − u) − 2 if either u < d or if u = d and neither side of the unique hyperplane spanned by the columns of E contains all of A’s columns. Finally, if empty parts are prohibited, u = d and all columns of A are on one side of the unique hyperplane spanned by the columns of E, then the number of (A|E)-separable is 0.

Corollary 8

Suppose $A \in {\mathbb{R}}^{d\times n}$ is generic. Then $\#_{S}^{A}(2) \leq 2H(n,d) = O({n}^{d})$.

Hwang et al. [43] proved the equality $\#_{S}^{A}(2) = O({n}^{d})$ of Corollary 8 without using the Harding function. They also extended their result to arbitrary A in ℝ^{d ×n} by perturbing A into A′ which is generic and proved that the set of A-separable 2-partitions is contained in the set of A′-separable 2-partitions, thus bounding $\#_{S}^{A}(2)$ by $\#_{S}^{A'}(2)$ which equals O(n ^d) by Corollary 8. Finally, they gave a construction of each A-separable partition by merging a given set of $\left ({ p \atop 2} \right )$ 2-partitions π _ij of A, for all “1 ≤ i < j ≤ p,” and showed that all A-separable p-partitions can be obtained from such constructions (by varying the given set of 2-partitions).

Theorem 42

For A ∈ ℝ^d×n, $\#_{S}^{A}(p) \leq {[\#_{S}^{A}(2)]}^{\left ({ p \atop 2} \right )} = O({n}^{d\left ({ p \atop 2} \right )})$ . Also, $\#_{S}(n,p,d) = O({n}^{d\left ({ p \atop 2} \right )})$.

Alon and Onn [3] proved that the $O({n}^{d\left ({ p \atop 2} \right )})$ bound of Theorem 42 cannot be improved for either p ≥ 3 or d ≥ 3.

Using Theorem 40, the above results can be used to derive bounds for sphere-separability.

Theorem 43

Suppose A ∈ ℝ^d×n , p ≥ 2 and $\bar{n}$ is the number of distinct columns of A. Then $\#_{SS}^{A}(p) \leq {[\#_{SS}^{A}(2)]}^{\left ({ p \atop 2} \right )} \leq {[2H(\bar{n},d + 1)]}^{\left ({ p \atop 2} \right )} \leq {[{2}^{d+1}\left ({ \bar{n}\atop d+1} \right )]}^{\left ({ p \atop 2} \right )}$ . Also, $\#_{SS}(n,p,d) = O({n}^{(d+1)\left ({ p \atop 2} \right )})$.

Given A ∈ ℝ^{d ×n}, Theorem 40 also shows that any enumeration method for separable partitions immediately yields a method for enumerating the A-sphere-separable partitions (with a complexity bound obtained by substituting d + 1 for d).

Hwang and Rothblum [39] obtained the following result.

Theorem 44

For A ∈ ℝ^d×n having no zero-column, $\#_{C_{n}S}^{A}(p) \leq {[\#_{C_{n}S}^{A}(2)]}^{\left ({ p \atop 2} \right )} \leq {[2H(n,d - 1)]}^{\left ({ p \atop 2} \right )} = O[{n}^{(d-1)\left ({ p \atop 2} \right )}]$ . Also, $\#_{C_{n}S}(n,p,d) \leq {[2H(n,d - 1)]}^{\left ({ p \atop 2} \right )} = O({n}^{(d-1)\left ({ p \atop 2} \right )})$.

Almost separable partitions are next considered, starting with 2-partitions. The result follows Hwang and Rothblum [40].

Lemma 18

For A ∈ ℝ^d×n having $\widetilde{n}$ distinct columns, $\#_{AS}^{A}(2) \leq 2H(n,d) +(n -\widetilde{ n})[2H(\widetilde{\ n} - 1,d - 1)] \leq 2H(n,d)$ . Also, $\#_{AS}(n,2,d) = \#_{S}(n,2,d) = 2H(n,d)$.

Two difficulties prevent one from mimicking the perturbation argument and the merging argument used for separable partitions to obtain an extension of Lemma 18 to general p. First, using the same perturbation A to A′, the set of A-almost separable 2-partitions is no longer contained in the set of A′-separable partitions (though for generic A′, almost separable and separable coincide). Thus, one cannot transform a bound of the latter into a bound of the former. Second, the straightforward merging construction for separable partitions is not delicate enough to cover the almost separable case. Hwang and Rothblum [40] showed how to overcome these two problems. The first problem is solved by using the second conclusion of Lemma 18, that is, $\#_{AS}(n,2,d) = 2H(n,d)$ for the need to bound $\#_{AS}^{A}(2)$. They also gave a much more elaborated construction to deliver the merging successfully. The resulting bound of $\#_{AS}(n,p,d)$ is given in the next theorem.

Theorem 45

$\#_{AS}(n,p,d) \leq {[\#_{AS}(n,2,d)]}^{\left ({ p \atop 2} \right )} = O({n}^{d\left ({ p \atop 2} \right )})$.

The following result is an immediate consequence of Theorems 40 and 45.

Theorem 46

$\#_{ASS}(n,p,d) = O({n}^{(d+1)\left ({ p \atop 2} \right )})$.

The following Table 8 summarizes the bounds of # _Q(n, p, d) for those properties Q discussed above.

Several simple bounds on the number of partitions that satisfy other properties are next recorded. First, it is trivial that $\#_{MP}(n,p,d) = \left ({ n \atop p-1} \right ) = O({n}^{p-1})$, consequently, $\#_{NeMP}(n,p,d) = p!\left ({ n \atop p-1} \right )$. Also, a careful combinatorial argument yields $\#_{NeCnS}(n,p,d) =\sum _{ q=0}^{p-1}\left ({ n \atop q} \right )q!\#_{CnS}(n - q,p - q,d) = O({n}^{(d-1)\left ({ p \atop 2} \right )+p-1})$. Finally, for Q ∈ { NP, AC, NC}, Hwang, Lee, Liu, and Rothblum [45] gave examples to show that even for d = p = 2, the number of Q-partitions is exponential. From the implication relation shown in Fig. 4, the number of CvS partitions must also be exponential. Finally, it is easily verified that all weak properties are exponential.

Table 8 Bounds of # _Q(n, p, d)

Full size table

Consistency and sortability, as introduced in Sect. 4, refer to properties of index-partitions, without reference to the representation of the partitioned elements. However, p-partition or k-subpartition is said to satisfy Q, and then satisfaction of a part or parts of indices always implies the same satisfaction of the part or parts of the corresponding points. Further, the extension of the notion of satisfaction from a 1-dimensional point to a multi-dimensional point is straightforward. Hence, consistency and sortability can still be discussed for vector-partitions, and the approach of using Q-sortability to prove the existence of a Q-optimal partition is valid as it was for the single-parameter case.

The remainder of this section discusses geometric properties that are present in optimal partitions. Let $\{{A}^{1},\ldots,{A}^{n}\}$ be a set of n vectors in ℝ^d and let F( ⋅) denote an objective function over partitions of these vectors. Throughout the remainder of this section, unless explicitly stated otherwise, F( ⋅) is to be minimized in the partition problems that are considered (as is usually the case in “clustering problems”). Also, the results for size problems hold regardless of whether empty parts are allowed or not (for constrained-shape problems, prohibiting empty parts is captured by having the set of feasible shapes contain only positive integer vectors). However, if a result depends on allowing empty parts or on prohibiting such parts, then the condition will be stated explicitly.

Golany et al. [29] extended results of Barnes et al. [4] to the following (which extends Theorem 34):

Theorem 47

Suppose

$$F(\pi ) = f(\vert \pi _{1}\vert,\ldots,\vert \pi _{p}\vert,A_{\pi }),$$

(46)

where for each fixed set of values $(\vert \pi _{1}\vert,\ldots,\vert \pi _{p}\vert )$ , f is (edge-)quasi-concave on the corresponding single-shape partition polytope (in the d × p variables of A _π ). Then every constrained-shape problem has an almost separable optimal partition. If furthermore, f is strictly (edge-)quasi-concave, then every optimal partition for a constrained-shape problem is almost separable.

Recall the use of “ $\widehat{\ \ }$ ” introduced in (43) and (44). The following result is reported in Golany et al. [29].

Corollary 9

Suppose

$$F(\pi ) = f(\vert \pi _{1}\vert,\ldots,\vert \pi _{p}\vert,\widehat{A}_{\pi })\,$$

(47)

where for each fixed set of values $(\vert \pi _{1}\vert,\ldots,\vert \pi _{p}\vert )$ , f is (edge-)quasi-concave on the corresponding single-shape partition polytope (in the (d + 1) × p variables of $\widehat{A}_{\pi }$ ). Then every constrained-shape problem has an almost sphere separable optimal partition. Further, if f is strictly (edge-)quasi-concave, then every optimal partition for a constrained-shape problem is almost sphere separable.

If the function f in Corollary 9 is linear in the variables corresponding to $\widehat{A}_{\pi }$ and in the shape-variable, the results of Sect. 5 can be used; that is, bounded-shape partition problems with such objective functions can be solved by LP methods, applied in the enlarged (d + 2)-dimensional space where the partitioned vectors are $\left ({ \widehat{{A}}^{i} \atop 1} \right )$.

Following Golany et al. [29], Corollary 9 is next used to address some clustering problems.

Theorem 48

Let $w_{1},\ldots,w_{p}$ be positive numbers. Suppose F(⋅) has either one of the following three expressions:

(i)
$F(\pi ) = \sum _{j=1}^{p}w_{ j}\sum _{{A}^{i},{A}^{k}\in \pi _{j}}\|{A}^{i} - {A{}^{k}\|}^{2}$.
(ii)
$F(\pi ) = \sum _{j=1,\pi _{j}\neq \varnothing }^{p}w_{ j}\sum _{{A}^{i}\in \pi _{j}}\|{A}^{i} -\bar{ A}{_{ j}\|}^{2}$ , where for $j = 1,\ldots,p$,

$\bar{A}_{j} = \frac{\sum _{{A}^{k}\in \pi _{ j}}{A}^{k}} {\vert \pi _{j}\vert }$.
(iii)
$F(\pi ) = \sum _{j=1}^{p}w_{ j}\sum _{{A}^{i}\in \pi _{j}}\|{A}^{i} - t{_{ j}\|}^{2}$ , where $t_{1},\ldots,t_{p}$ are prescribed d-vectors.

Then every constrained-shape problem has an almost sphere separable optimal partition.

The third objective function can be expressed as a linear function of $\widehat{A_{\pi }}$ and of the parts’ sizes. Consequently, the comment following Corollary 9 applies and corresponding bounded-shape partition problems can be solved by LP methods.

Uniform versions of the objective functions that are listed in Theorem 48 are next considered, where uniform means that the w _j’s are constant.

Theorem 49

Consider uniform versions of the objective functions that are listed in Theorem 48. Then:

(i)
For a constrained-shape problem with constant part-sizes and with the first function, every optimal partition is almost separable.
(ii)
For a constrained-shape problem with the second function, every optimal partition is almost separable.
(iii)
For a constrained-shape problem with the third function, there exists an almost separable optimal partition.

Boros and Hammer [6] considered case (i) of Theorem 49 and proved that every single size problem has a weakly separable optimal partition. Theorem 49 strengthens their result in several ways: A stronger property is satisfied for every optimal partition, and the conclusion holds for a wider class of partition problems.

Attention is next turned to single-size problems. For a p-partition π and $i \in \mathcal{N}$, let j _π(i) denote the index of the part of π that contains A ⁱ (here, distinction must be made between multiple copies of the same vector).

Lemma 19

Let $t_{1},\ldots,t_{p}$ be prescribed distinct vectors in ℝ^d . Then, allowing for empty parts, there is a separable partition π such that

$$\vert \vert {A}^{i} - t_{ j_{\pi }(i)}\vert \vert \leq \vert \vert {A}^{i} - t_{ j_{\pi '}(i)}\vert \vert \ \ \mathit{for\ every\ partition}\ \pi '\ \mathit{and}\ i \in \mathcal{N}.$$

(48)

Further, a partition π satisfies ( 48 ) if and only if each vector A ⁱ is assigned to a part π _j for which $\|{A}^{i} - t_{j}\| = min_{u=1,\ldots,p}\ \|{A}^{i} - t_{u}\|$; each such partition is weakly separable.

Theorem 50

Let $t_{1},\ldots,t_{p}$ be prescribed distinct vectors in ℝ^d and consider the single-size problem, allowing empty parts, which has

$$F(\pi ) = f(\|{A}^{1} - t_{ j_{\pi }(1)}\|,\ldots,\|{A}^{n} - t_{ j_{\pi }(n)}\|),$$

(49)

where $f : {\mathbb{R}}^{n} \rightarrow \mathbb{R}$ is nondecreasing. Then the partition π which assigns each vector Aⁱ to the part π _j with the lowest index j among those for which $\|{A}^{i} - t_{j}\|$ is minimized is both optimal and separable. Further, if f is increasing, then a partition π is optimal if and only each vector Aⁱ is assigned to a part π _j for which $\|{A}^{i} - t_{j}\| = min_{u=1,\ldots,p}\ \|{A}^{i} - t_{u}\|$ , and every optimal partition is weakly separable.

Corollary 10

The conclusions of Theorem 50 hold for the single-size problem, allowing empty parts, which has

$$F(\pi ) =\sum _{ j=1,\pi _{j}\neq \varnothing }^{p}\sum _{{ A}^{i}\in \pi _{j}}h(\|{A}^{i} - t_{ j}\|),$$

(50)

where $t_{1},\ldots,t_{p}$ be prescribed distinct vectors in ℝ^d where h : ℝ → ℝ is nondecreasing/increasing.

Theorem 50 and Corollary 10 specify a single separable partition that is uniformly optimal for all underlying partition problems, independently of the functions f and h that occur in (49) and (50).

The characterization of optimal partition of Theorem 50 (and Corollary 10) allows one to construct all optimal solutions by assigning each vector A ⁱ to any part π _j for which t _j is the closest to A ⁱ. More specifically, associate each vector A ⁱ ∈ ℝ^d with the vector ${D}^{i} \equiv {(\|{A}^{i} - t_{1}\|,\ldots,\|{A}^{i} - t_{p}\|)}^{T} \in {\mathbb{R}}^{p}$. A partition is then optimal if and only if it assigns A ⁱ to π _j with D _j ⁱ being any minimal coordinate of D ⁱ. The amount of effort needed to compute each vector D ⁱ is O(dp), so the total amount of effort to compute all of the D ⁱ’s (and thereby solve the partition problem) is O(dpn).

Since the boundary of the convex hull of two parts can contain points of both parts, the optimal partitions identified in Theorem 50 and Corollary 10 need not be almost separable, let alone separable. Still, it is noted that breaking ties by using any part-ranking will produce separable partitions.

The geometric figure of the partition of ℝ^d into p convex subsets S ₁, …, S _p with S _j consisting of all points in ℝ^d closest to t _j (a point on a boundary is closest to all t _j whose S _j shares that boundary) is known in the literature as the Voronoi diagram. Efficient constructions of Voronoi diagrams are well studied in theoretical computer science (see Fortune [25] for a survey). In particular, hyperplanes representing a Voronoi diagram for p given points $t_{1},\ldots,t_{p}$ in ℝ^d can be constructed in $O({d}^{\frac{p+1} {2} })$-time. For d = 2, Shamos and Hoey [65] showed that the Voronoi diagram can be constructed in O(plogp)-time.

A tool is next developed for using results about properties of optimal partitions of problems with objective functions that depend on prescribed vectors to results where the prescribed vectors are replaced by some minimizing vectors. Two new definitions are needed to present a formal result. A function g : ℝ^d → ℝ is inverse-bounded if for every K ∈ ℝ {x ∈ ℝ^d : | g(x) | ≤ K} is a bounded set of ℝ^d. Evidently, an inverse-bounded function that is continuous is guaranteed to attain a minimum over ℝ^d. Also, let P ^∗(ℝ^d) be the set of finite subsets of ℝ^d, that is, ${P}^{{\ast}}({\mathbb{R}}^{d}) =\{ \Omega \subset {\mathbb{R}}^{d} : \vert \Omega \vert \ \text{ is finite}\ \}$.

Theorem 51

Let Q be a partition property, f : ℝ^p → ℝ be nondecreasing and $g : {P}^{{\ast}}({\mathbb{R}}^{d}) \times {\mathbb{R}}^{d} \rightarrow \mathbb{R}$ with the property that for every $\Omega \in {P}^{{\ast}}({\mathbb{R}}^{d})$ , g(Ω,⋅) is inverse-bounded and continuous. Let Γ be a set of p-integer vectors with coordinate-sum n. Assume that for any $t_{1},\ldots,t_{p} \in {\mathbb{R}}^{d}$ , some optimal partition of the constrained-shape problem corresponding to Γ and having objective function ${F}^{t_{1},\ldots,t_{p}}(\cdot )$ , empty parts not allowed, given by

$${F}^{t_{1},\ldots,t_{p} }(\pi ) = f[g(\pi _{1},t_{1}),\ldots,g(\pi _{p},t_{p})]\ \mathit{for\ each\ partition}\ \pi$$

(51)

satisfies Q. Then some optimal partition for the corresponding constrained-shape problem with objective function F(⋅) given by

$$F(\pi ) = f\left [\min _{x_{1}\in {\mathbb{R}}^{d}}g(\pi _{1},x_{1}),\ldots,\min _{x_{p}\in {\mathbb{R}}^{d}}g(\pi _{p},x_{p})\right ]\ \mathit{for\ each\ partition}\ \pi$$

(52)

satisfies Q. Further, the above holds with “every” replacing “some.”

Theorem 51 can be used to obtain Theorem 49 (ii) from Theorem 49 (iii) by noting that $\sum _{j=1}^{p}w_{j}\sum _{{A}^{i}\in \pi _{j}}\|{A}^{i} - t{_{j}\|}^{2}$ is minimized over $(t_{1},\ldots,t_{p})$ at $(\bar{A}_{1},\ldots,\bar{A}_{p})$ (with $\bar{A}_{j} = \frac{\sum _{{A}^{k}\in \pi _{ j}}{A}^{k}} {\vert \pi _{j}\vert }$ for $j = 1,\ldots,p$).

The proof of Theorem 51 in [39] can, in fact, be applied to a partition that is not optimal. Specifically, if π is any partition and $t_{1},\ldots,t_{p}$ are minimizers of $g(\pi _{1},\cdot ),\ldots,g(\pi _{p},\cdot )$, respectively, then any partition π ^# that satisfies ${F}^{t_{1},\ldots,t_{p}}{(\pi }^{\#}) \leq {F}^{t_{1},\ldots,t_{p}}(\pi )$ satisfies F(π ^#) ≤ F(π). In particular, any sorting method that is applicable to partition problems with objective function ${F}^{t_{1},\ldots,t_{p}}(\cdot )$ for fixed t _j’s which does not rely on a statistics depending on the t _j’s, applies to the partition problem with objective function F( ⋅). This is the case for p-sortings for which no statistics is needed.

For h : ℝ → ℝ, an h-centroid of a finite set Ω ⊆ ℝ^d is a minimizer of $\sum _{{A}^{i}\in \Omega }h(\|{A}^{i} - x\|)$ over x ∈ ℝ^d. When h is continuous, bounded from below and inverse-bounded, each finite set Ω has an h-centroid. (This definition extends the definition given in the paragraph preceding Theorem 25 from d = 1 to d > 1.)

Theorem 52

Consider the single-size problem which has

$$F(\pi ) =\sum _{ j=1}^{p}\sum _{{ A}^{i}\in \pi _{j}}h(\vert \vert {A}^{i} - c_{ j}\vert \vert ),$$

(53)

where h : ℝ → ℝ is nondecreasing, continuous, and inverse-bounded, and for $j = 1,\ldots,p$ , c _j is the h-centroid of π _j and where the sum over an empty set of h(⋅)-values in (53) is defined to be 0. Then there exists a separable optimal partition. Further, if h is increasing, then every optimal partition is weakly separable.

Corollary 11

Consider the partition problem of Theorem 52. Then there exists a separable optimal partition without empty parts. Further, if h is increasing, then every optimal partition has no empty parts.

Theorem 52 asserts the existence of a separable optimal partition that is specific to the function h which occurs in (53). This type of result is weaker than the uniform optimality of a specific separable partition asserted in Theorem 50.

The second paragraph following Theorem 51 shows that the assignment of each vector to its closest t _j can be used to replace a partition π that is not separable with one that is and has improved F-value where F( ⋅) is given by (53).

Partition problems that are next considered have objective functions that depend on the least radii of spheres that, respectively, include the parts. It is first shown that given a partition whose parts are included in prescribed spheres, there is a separable partition with the same property. The proof of this result follows an approach of Capoyleas, Rote, and Woeginger [9] who proved the case d = 2 (and claimed its extension to general d).

Lemma 20

Let $t_{1},\ldots,t_{p}$ be prescribed distinct vectors in ℝ^d . Allowing partitions to have empty parts, for each partition π, there exists a separable partition π′ such that

$$\max _{{A}^{i}\in \pi _{j}'}\vert \vert {A}^{i} - t_{ j}\vert \vert \leq \max _{{A}^{i}\in \pi _{j}}\vert \vert {A}^{i} - t_{ j}\vert \vert \ \ \ \mathit{for}\ j = 1,\ldots,p.$$

(54)

Further, if $r_{j} \equiv \max _{{A}^{i}\in \pi _{j}}\vert \vert {A}^{i} - t_{j}\vert \vert $ for $j = 1,\ldots,p$ (i.e., r _j is the smallest radius of a sphere that is centered at t _j and contains π _j), then π′ can be selected by the following rule: ${A}^{i} \in \pi _{j}'$ if j is the lowest index of those for which $\vert \vert {A}^{i} - t_{j}\vert {\vert }^{2} - r_{j}^{2} =\min _{u}[\vert \vert {A}^{i} - t_{u}\vert {\vert }^{2} - r_{u}^{2}]$ (of course, one may use any a priori permutation of the part-indices).

Capoyleas, Rote, and Woeginger [9] did not mention a tie-breaking rule; as such, the “power diagram” they used to repartition the vectors leads to weak separability rather than strict separability (earlier results in this section show that the number of weakly separable partitions is not necessarily polynomial in the number of partitioned vectors).

The separating hyperplane between parts π _j ′ and π _w ′ of the partition π′ constructed in Lemma 20 is given by

$$\begin{array}{llll} H & \equiv &\{x \in {\mathbb{R}}^{d} : \vert \vert x - t_{j}\vert {\vert }^{2} - r_{j}^{2} = \vert \vert x - t_{w}\vert {\vert }^{2} - r_{w}^{2}\} \\ & =&\{x \in {\mathbb{R}}^{d} : 2{(t_{w} - t_{j})}^{T}x = \vert \vert t_{w}\vert {\vert }^{2} -\vert \vert t_{j}\vert {\vert }^{2} + r_{j}^{2} - r_{w}^{2}\}.\end{array}$$

(55)

This hyperplane is called the power-hyperplane of the underlying spheres $\{x \in {\mathbb{R}}^{d} : \vert \vert x - t_{s}\vert \vert \leq r_{s}\}$ and $\{x \in {\mathbb{R}}^{d} : \vert \vert x - t_{w}\vert \vert \leq r_{w}\}$. Further, the construction of π′ from π in the proof of Lemma 20 is a sorting operation of all p-parts called power-hyperplane sorting. The use of p-sorting has the advantage that it achieves the goal in one step and therefore does not require a monotone statistics that shows iterative sortings will not cycle.

A natural variant of the use of power-hyperplane p-sorting to prove the conclusions of Lemma 20 is to iteratively apply power-hyperplane 2-sorting (of pairs of parts). For this approach to work, that is to avoid the possibility of cycling, it is necessary to identify a statistics over partitions that is reduced at each iteration. Unfortunately, it is not known that such a statistics exists. Still, power-hyperplane 2-sortings can be used to prove a weaker result. Specifically, consider $t_{1},\ldots,t_{p} \in {\mathbb{R}}^{d}$ and $r_{1},\ldots,r_{p} \in \mathbb{R}$. Lemma 20 implies that power-hyperplane p-sorting will convert a partition π whose parts are included, respectively, in the spheres $\{x \in {\mathbb{R}}^{d} : \vert \vert x - t_{j}\vert \vert \leq r_{j}\}$, $j = 1,\ldots,p$, into a separable partition having the same property. Here, one can use iterative power-hyperplane 2-sorting and assure that the process does not cycle because of strict lexicographic reduction of the statistics

$$s(\pi ) \equiv \left (\sum _{j=1}^{p}\sum _{{ A}^{i}\in \pi _{j}}[\vert \vert {A}^{i} - t_{ j}\vert {\vert }^{2} - r_{ j}^{2}],-\vert \pi _{ 1}\vert,\ldots,-\vert \pi _{p}\vert \right )$$

in each step. In particular, this proves that S is (sort-specific, 2, support)-sortable with “sort-specific” referring to the power-hyperplane method with fixed center and fixed radii.

Theorem 53

Let $t_{1},\ldots,t_{p}$ be prescribed distinct vectors in ℝ^d and consider the single-size problem, allowing empty parts, which has

$$F(\pi ) = f[\max _{{A}^{i}\in \pi _{1}}\vert \vert {A}^{i} - t_{ 1}\vert \vert,\ldots,\max _{{A}^{i}\in \pi _{p}}\vert \vert {A}^{i} - t_{ p}\vert \vert ],$$

(56)

where f : ℝ^p → ℝ is nondecreasing. Let π be a given partition. Then the partition π′ which satisfies the conclusions of Lemma 20 is separable and satisfies F( π′) ≤ F(π). In particular, the underlying single-size problem has a separable optimal partition.

Theorem 53 provides a construction (appearing explicitly in the statement of Lemma 20) of a separable partition that improves on the objective function of a given partition π. This result can be cast as p-sortability result, as were the results of Theorem 50 and Corollary 10. The construction of Theorem 53 is independent of the f appearing in (56), but unlike the construction of Theorem 50 (and Corollary 10), it is specific to the underlying partition π.

For a bounded set Ω ⊂ ℝ^d, let r(Ω) denote the minimum radius of a sphere that includes Ω; in particular, when Ω is finite, $r(\Omega ) =\inf _{x\in {\mathbb{R}}^{d}}[\sup _{y\in \Omega }\vert \vert x - y\vert \vert ]$, and the “inf” and “sup” are attained (and can therefore be replaced by “min” and “max”). For a p-partition π, let $r(\pi ) = (r(\pi _{1}),\ldots,r(\pi _{p}))$. The next result considers partition problems with objective function that is determined by these partition characteristics; Capoyleas, Rote, and Woeginger [9] considered the case d = 2.

Theorem 54

Consider the single-size problem which has F( π) = f[r(π)] where f : ℝ^p → ℝ nondecreasing. Then some optimal partition is separable.

A result of Pfersky et al. [60] is next generalized; their result is then deduced as a corollary.

Theorem 55

Consider the single-size problem, not allowing empty parts, which has

$$F(\pi ) =\sum _{ { u,v=1 \atop u\neq v} }^{p}\sum _{{ A}^{i}\in \pi _{u}}\sum _{{A}^{k}\in \pi _{v}}h({A}^{i},{A}^{k}),$$

(57)

where h(⋅,⋅) is a metric. Then there exists a nearly monopolistic optimal partition. If h is strict, then every optimal partition is nearly monopolistic. Finally, when allowing empty parts, “nearly monopolistic” can be replaced by “monopolistic.”

The reason that there is no strict version for Theorem 55 is that even if h is strict, when the A ⁱ’s are all the same point, then all p-partitions are optimal.

The Pfersky, Rudolf, and Woeginger result is next derived. It is stated in the original framework of a maximization problem (of course, multiplying the objective function by − 1 converts a maximization problem into an equivalent minimization problem).

Corollary 12

Consider the single-size problem in which

$$F(\pi ) =\sum _{ j=1}^{p}\sum _{{ A}^{i},{A}^{k}\in \pi _{j}}\|{A}^{i} - {A{}^{k}\|}^{2}.$$

(58)

is to be maximized. Then there exists a nearly monopolistic optimal partition.

Author information

Authors and Affiliations

National Chiao Tung University (Retired), Hsinchu, People’s Republic of China
Frank K. Hwang
Faculty of Industrial Engineering and Management Science, Technion - Israel Institute of Technology, Bloomfield – 511, 32000, Haifa, Israel
Uriel G. Rothblum

Authors

Frank K. Hwang
View author publications
You can also search for this author in PubMed Google Scholar
Uriel G. Rothblum
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Frank K. Hwang or Uriel G. Rothblum .

Editor information

Editors and Affiliations

Department of Industrial and Systems Eng, University of Florida, Gainesville, Florida, USA
Panos M. Pardalos
Department of Computer Science, University of Texas, Dallas, Richardson, Texas, USA
Ding-Zhu Du
Dept. Comp. Sci. & Engineering, University of California, San Diego, La Jolla, California, USA
Ronald L. Graham

Rights and permissions

Reprints and permissions

Copyright information

About this entry

Cite this entry

Hwang, F.K., Rothblum, U.G. (2013). Optimal Partitions. In: Pardalos, P., Du, DZ., Graham, R. (eds) Handbook of Combinatorial Optimization. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7997-1_40

Download citation

DOI: https://doi.org/10.1007/978-1-4419-7997-1_40
Published: 26 July 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7996-4
Online ISBN: 978-1-4419-7997-1
eBook Packages: Mathematics and StatisticsReference Module Computer Science and Engineering

Publish with us

Policies and ethics

Optimal Partitions

Abstract

Similar content being viewed by others

Combinatorial Optimization Approaches for Data Clustering

k Is the Magic Number—Inferring the Number of Clusters Through Nonparametric Concentration Inequalities

A Selective Overview of Recent Advances in Spectral Clustering and Their Applications

Keywords

1 Formulation and Classification of Partition Problems

1.1 Sets of Partitions

1.2 The Number of Characteristics Associated with each Partitioned Elements

1.3 The Objective Function

Lemma 1

Example 1

2 Preliminaries: Optimality of Extreme Points

Lemma 2

Lemma 3

Theorem 1

Theorem 2

Corollary 1

Lemma 4

Theorem 3

Corollary 2

Example 2

Theorem 4

3 Single-Parameter: Polyhedral Approach

Example 3

Lemma 5

Theorem 5

Theorem 6

Theorem 7

Theorem 8

Theorem 9

Theorem 10

Theorem 11

Theorem 12

Theorem 13

Example 4

Theorem 14

Theorem 15

Theorem 16

4 Single-Parameter: Combinatorial Approach

Example 5

Example 6

Theorem 17

Lemma 6

Lemma 7

Theorem 18

Corollary 3

Theorem 19

Lemma 8

Theorem 20

Theorem 21

Lemma 9

Lemma 10

Theorem 22

Theorem 23

Theorem 24

Theorem 25

Theorem 26

Theorem 27

Theorem 28

5 Multiparameter: Polyhedral Approach

Lemma 11

Theorem 29

Corollary 4

Theorem 30

Corollary 5

Corollary 6

Theorem 31

Lemma 12

Theorem 32

Corollary 7

Theorem 33

Theorem 34

Lemma 13

Example 7

Lemma 14

Lemma 15

Theorem 35

Theorem 36