1 Introduction

The notion of convex sets appears in various mathematical theories. A subset X of a real vector space is called a convex set if, for any \(x, y\in X\) and \(p\in [0,1]\), their convex combination \(px+(1-p)y\) is again in X. One basic use of it is to define the convexity of functions. A function f is said to be convex if \(f(px+(1-p)y) \le pf(x)+(1-p)f(y)\) for any convex combination \(px+(1-p)y\). Thus, convex sets are natural domains for convex functions to be defined on. Good examples of these notions can be found in information theory, where convexity is a fundamental property of important functions such as logarithm, entropy, and mutual information. Our InfoTheo library  [17] developed in the Coq proof assistant  [29] has a formalization of textbook proofs  [12] of such results.

In the course of formalizing such convexity results, we find that axiomatizing convex sets is a useful step which provides clarity and organizability in the results. We abstract the usual treatment of convex sets as subsets of some vector space and employ an algebraic theory of convex spaces, which was introduced by Stone  [27]. The formalization uses the packed class construction [15, 24], so as to obtain generic notations and lemmas, and more importantly, to be able to combine structures. Binary convex spaces are formalized in Sect. 2, and their multiary versions are formalized in Sect. 3, along with proofs of equivalence.

We also formalize an embedding of convex spaces into conical spaces (a.k.a. cones or real cones  [31]), which we find an indispensable tool to formalize convex spaces. Examples in the literature avoid proving properties of convex spaces directly and choose to work in conical spaces. This is especially the case when their goal can be achieved either way  [23, 31]. Some authors suggest that the results in conical spaces can be backported to convex spaces  [13, 21]. We apply this method in Sect. 4 to enable additive handling of convex combinations. By formalizing the relationship between convex and conical spaces, we work out short proofs of a number of lemmas on convex spaces. Among them is Stone’s key lemma  [27, Lemma 2], whose proof is often omitted in the literature despite its fundamental role in the study of convex spaces.

We complete this presentation with applications of our formalization to convex hulls (Sect. 5) and to convex functions (Sect. 6).

While our proofs do not introduce extra axioms, some libraries used in our development, such as mathcomp-analysis  [1], contain axioms which make parts of our work classical. In particular, our definition of convex sets is based on classical sets, assuming decidable membership.

2 Convex Spaces

Let us begin with the definition of convex spaces. As mentioned in the introduction, convex spaces are an axiomatization of the usual notion of convex sets in vector spaces. It has a long history of repeated reintroduction by many authors, often with minor differences and different names: barycentric algebra  [27], semiconvex algebra  [28], or, just, convex sets  [19].

We define convex spaces following Fritz  [14, Definition 3.1].

Definition 1

( in [18]). A convex space is a structure for the following signature:

  • Carrier set X.

  • Convex combination operations \((\_ \triangleleft {_{p}} \triangleright \_) : X \times X \rightarrow X\) indexed by \(p\in [0,1]\).

  • Unit law: \(x \triangleleft {_{1}} \triangleright y = x\).

  • Idempotence law: \(x \triangleleft {_{p}} \triangleright x = x\).

  • Skewed commutativity law: \(x \triangleleft {_{1-p}} \triangleright y = y \triangleleft {_{p}} \triangleright x\).

  • Quasi-associativity law: \(x \triangleleft {_{p}} \triangleright ({y \triangleleft {_{q}} \triangleright z}) = (x \triangleleft {_{r}} \triangleright y) \triangleleft {_{s}} \triangleright z\),

    where \(s=1-(1-p)(1-q)\) and \(r= {\left\{ \begin{array}{ll} p/s &{} \text {if }s\not = 0 \\ 0 &{} \text {otherwise} \end{array}\right. }\). (Note that r is irrelevant to the value of \((x \triangleleft {_{r}} \triangleright y) \triangleleft {_{s}} \triangleright z\) if \(s=0\).)

We can translate this definition to Coq as a packed class  [15] with the following mixin interface:

figure b

There are some notations and definitions to be explained. The type prob in the above Coq code denotes the closed unit interval [0, 1]. The notation is a notation for a real number r equipped with a canonical proof that \(0 \le \mathtt{r} \le 1\). The notation is for \(1 - \mathtt{p}\). The notation is for \(1-(1-\mathtt{p})(1-\mathtt{q})\), and for .

Intuitively, one can regard the convex combination as a probabilistic choice between two points. At line 3, the left argument is chosen with probability 1. The lines that follow correspond to idempotence, skewed commutativity, and quasi-associativity.

An easy example of convex space is the real line \(\mathbb R\), whose convex combination is expressed by ordinary addition and multiplication as \(pa + (1 - p)b\). Probability distributions also form a convex space. In the formalization, the type fdist A of distributions over any finite type A (borrowed from previous work  [6]) is equipped with a convex space structure, where the convex combination of two distributions \(d_1, d_2\) is defined pointwise as \(x \mapsto pd_1(x) + (1 - p)d_2(x)\).

As a result of the packed class construction, we obtain the type convType of all types which implicitly carry the above axioms. Then, each example of convex space is declared to be canonically a member of convType, enabling the implicit inference of the appropriate convex space structure. These two implicit inference mechanisms combined make the statement of generic lemmas on convex spaces simple and applications easy.

3 Multiary Convex Combination

Convex spaces can also be characterized by multiary convex combination operations, which combine finitely many points \(x_0, \dots , x_{n-1}\) at once, according to some finite probability distribution d over the set \(I_n = \{ 0, \dots , n-1 \}\), i.e., \(d_i \ge 0\) and \(\sum _{i<n} d_i = 1\). In this section we consider different axiomatizations, and their equivalence with the binary axioms.

3.1 Axiomatization

A definition of convex spaces based on multiary operations is given as follows (see for example  [10, Definition 5] and [16, Sect. 2.1]).

Definition 2

(Convex space, multiary version). A convex space based on multiary operations is a structure for the following signature:

  • Carrier set X.

  • Multiary convex combination operations, indexed by an arity n and a distribution d over \(I_n\):

  • Projection law: if \(d_j = 1\), . in [18])

  • Barycenter law: . in [18])

Note that in our Coq code, appears as or altConvn d x, indicating more explicitly that the operation takes two arguments d and x.

This multiary convex structure and the binary one given in Sect. 2 are equivalent: the multiary and binary operators interpret each other satisfying the needed axioms, and the interpretations cancel out when composed. While the binary axiomatization is easy to instantiate, the multiary version exhibits the relationship to probability distributions. Therefore we want to establish this equivalence before working further on other constructions over convex spaces.

In the literature, this equivalence is justified without much detail by referring to the seminal article by Stone  [27] (see, e.g., [19, Theorem 4], [10, Proposition 7]). Yet, what Stone gave is not an explicit axiomatization of the multiary convex operator, but a number of lemmas targeted at proving an embedding of (binary) convex spaces into vector spaces. These lemmas include the following one, that is seen as a justification for the barycenter law in the binary axiomatization.

Lemma 1

(Lemma 4 in [27]). If the given masses and their associated points are partitioned into groups (of non-zero total masses) in any way, then the center of mass is identical with that of masses equal to the respective total masses for the various groups, each placed at the center of mass for the corresponding group.

The relation to the barycenter law is implied if one sees a convex combination as a point defined in terms of a set of generating points \(\{x_j\}_{j<m}\) (they generate their convex hull). Then corresponds to grouping the generating points by filtering through the distributions \(\{e_i\}_{i<n}\). But this grouping is not necessarily a partition since there could be shared elements, hence the relation is not direct.

Beaulieu  [8, Definition 3.1.4] proposed an alternative multiary axiomatization, which was actually presented as a model for countable probabilistic choice (rather than a definition of convex space). His partition law corresponds exactly to the statement of Stone’s lemma.

Definition 3

(Convex space, Beaulieu style). A convex space is a structure for the previous operations and the following laws.

  • Partition law:   in [18])

    where \(\{K_j\mid j\in J\}\) is a partition of I, and \(\rho _j = \sum _{k\in K_j}\lambda _k\ne 0\).

  • Idempotence law:     if \(A_i = A\) for all \(\lambda _i > 0\). in [18])

In the implementation, using sets as indexing domains of the combination operators would be cumbersome, so that the partition law is actually expressed as follows, using a map \(\check{K}\) and Kronecker’s \(\delta \).

We also have to separately show that \((\delta _{j,\check{K}(k)}\frac{\lambda _k}{\rho _j})_{k<n}\) and \((\rho _j)_{j<m}\) form probability distributions. As an exceptional case, \((\delta _{j,\check{K}(k)}\frac{\lambda _k}{\rho _j})_{k<n}\) is replaced by a uniform distribution if \(\rho _j = 0\).

3.2 Equivalence of Axiomatizations

After considering the different axiomatizations, we decided to prove a triangular equivalence: between multiary convex structures in standard and Beaulieu style, and then with the binary convex structure given in Sect. 2. The relations we will explain in this section are depicted in Fig. 1.

Fig. 1.
figure 1

Relations between the various formalizations of convex spaces

The first equivalence, between multiary convex axioms, is rather technical. The first direction, proving Beaulieu’s axioms from the standard presentation (functor StandardToBeaulieu in  [18]), is relatively easy, as the partition law is intuitively just a special case of the barycenter law, where supportsFootnote 1 are disjoint, and the idempotence law can be derived as a combination of the two standard laws. However, the second direction (functor BeaulieuToStandard) is harder, and led us to introduce two derived laws:

  • Partition-barycenter law: barycenter law, with disjoint supports.

  • Injective map law: with u injective.

The partition-barycenter law can be derived from the Beaulieu style axioms, and in turn is used to prove the injective map law. Together they allow to prove the barycenter law.

The equivalence between binary and multiary axiomatizations requires first to define their operators in terms of each other.

Definition 4

(Convn and binconv  in [18])

  1. (a)

    Let \(d : I_n \rightarrow [0,1]\) be a finite distribution, and \(x : I_n \rightarrow X\) be points in a convex space X. Then the multiary convex combination of these points and distribution is defined from the binary operator by recursion on n as follows:

  2. (b)

    Let p be a probability and \(x_0,x_1\) be points in a convex space. Then their binary combination is defined from the multiary operator as follows:

The first direction, functor BinToNary in [18], must prove that the first definition satisfies the multiary axioms, and indeed amounts to proving a variant of Stone’s lemma. We will see in the next section that the original proof by Stone is better formalized by transporting the argument to conical spaces.

The opposite direction, functor NaryToBin, must prove the binary axioms from the multiary ones. While we start from the standard version, the idempotence law proved to be instrumental in this task, together with the following unrestricted map law.

  • Map law: for any map u. in [18])

Finally, one also needs to prove that the definitions we used for each operation in both directions are coherent.

Lemma 2

(equiv_conv  and equiv_convn  in  [18]). The constructions in Definition 4 (Convn and binconv) cancel each other. That is,

  • If is the operator induced by Definition 4(a), and \(\triangleleft {_{\_}} \triangleright ^\dagger \) the one induced from it by Definition 4(b), we can derive \( a \triangleleft {_{p}} \triangleright ^\dagger b = a \triangleleft {_{p}} \triangleright b \) from the binary axioms.

  • If \(\triangleleft {_{\_}} \triangleright ^*\) the operator induced by Definition 4(b), is the one induced from it by Definition 4(a), we can derive from the multiary axioms.

4 Conical Spaces and Embedded Convex Spaces

The definition of multiary convex combination operator in the previous section (Definition 4(a)) relied on recursion. This makes the definition look complicated, and moreover, the algebraic properties of the combination difficult to see. If we consider the special case of convex sets in a vector space, the meaning of multiary combinations and the algebraic properties become evident:

The additions on the right-hand side are of vectors, and thus are associative and commutative. This means that the multiary combination on the left-hand side is invariant under permutations or partitions on indices. We want to show that these invariance properties are also satisfied generally in any convex space.

However, the search for the proofs is painful if naively done. This is because binary convex combination operations satisfy associativity and commutativity only through cumbersome parameter computations. For example, a direct proof of the permutation case involves manipulations on the set \(I_n\) of indices and on the symmetry groups, which require fairly long combinatorics  [27, Lemma 2].

We present a solution to this complexity by transporting the arguments on convex spaces to a closely related construction of conical spaces. Conical spaces are an abstraction of cones in real vector spaces just like convex spaces are an abstraction of convex sets. Like convex spaces, the definition of conical spaces appears in many articles. We refer to the ones by Flood (called semicone there)  [13] and by Varacca and Winskel (called real cone there)  [31]:

Definition 5

(Conical space). A conical space is a semimodule over the semiring of non-negative reals. That is, it is a structure for the following signature:

  • Carrier set X.

  • Zero \(\mathbf{0} : X\).

  • Addition operation \(\_ + \_ : X \times X \rightarrow X\).

  • Scaling operations \(c \_ : X \rightarrow X\) indexed by \(c \in \mathbb R_{\ge 0}\).

  • Associativity law for addition: \(x + (y + z) = (x + y) + z\).

  • Commutativity law for addition: \(x + y = y + x\).

  • Associativity law for scaling: \(c (d x) = (c d) x\).

  • Left-distributivity law: \((c + d)x = cx + dx\).

  • Right-distributivity law: \(c(x + y) = cx + cy\).

  • Zero law for addition: \(\mathbf{0} + x = x\).

  • Left zero law for scaling: \(0 x = \mathbf{0}\).

  • Right zero law for scaling: \(c \mathbf{0} = \mathbf{0}\).

  • One law for scaling: \(1 x = x\).

We display this definition only to show that conical spaces have straightforward associativity and commutativity. In fact, the formalization is elaborated on the embedding of convex spaces into canonically constructed conical spaces, which appeared in the article by Flood  [13]. We build on top of each convex space X, the conical space \(S_X\) of its “scaled points”:

Definition 6

(scaled_pt, addpt, and scalept  in [18]). Let X be a convex space. We define a set \(S_X\) which becomes a conical space with the following addition and scaling operations.

$$ S_X := (\mathbb R_{>0} \times X) \cup \{\mathbf{0}\}. $$

That is, the points of \(S_X\) are either a pair \({p * x}\) of \(p\in \mathbb R_{>0}\) and \(x\in X\), or a new additive unit \(\mathbf{0}\). Addition of points \(a, b \in S_X\) is defined by cases to deal with \(\mathbf{0}\):

$$ a + b := {\left\{ \begin{array}{ll} {(r+q) * (x \triangleleft {_{r/(r+q)}} \triangleright y)} &{} \text {if }a={r * x} \text { and }b={q * y}\\ a &{} \text {if }b = \mathbf{0}\\ b &{} \text {if }a = \mathbf{0} \end{array}\right. } $$

Scaling \(a \in S_X\) by \(p \in \mathbb R_{\ge 0}\) is also defined by cases:

$$ pa := {\left\{ \begin{array}{ll} {pq * x} &{} \text {if }p > 0 \text { and }a = {q * x} \\ \mathbf{0} &{} \text {otherwise} \end{array}\right. } $$

We omit here the proofs that \(S_X\) with these addition and scaling satisfies the conical laws. They are proved formally in [18] (see the lemmas addptC, addptA, scalept_addpt, etc.).

Properties of the underlying convex spaces are transported into and back from this conical space, through an embedding:

Definition 7

(S1  in [18])

$$ \iota : \begin{array}[t]{ccc} X &{} \rightarrowtail &{} S_X\\ x &{} \mapsto &{} {1 * x} \end{array} $$

Convex combinations in X are mapped by \(\iota \) to additions in \(S_X\).

Lemma 3

(S1_convn  in [18])

The right-hand side of the lemma is a conical sum (Fig. 2), which behaves like an ordinary linear sum thanks to the conical laws, and enjoys good support from MathComp’s big operator library  [9].

Fig. 2.
figure 2

Example of S1_convn: \({1 * w} = {\frac{1}{2} * x} + {\frac{1}{4} * y} + {\frac{1}{4} * z}\)

With these preparations, properties such as  [27, Lemma 2] can be proved in a few lines of Coq code:

Lemma 4

(Convn_perm  in [18])

where s is any permutation on the set of indices n.

The proof of the barycenter property  [27, Lemma 4] from Sect. 3 is based on the same technique (see Convn_convnfdist in [18]).

A way to understand this conical approach is to start from Stone’s definition of convex spaces  [27]. He uses a quaternary convex operator \((x,y;\alpha ,\beta )\) where x and y are points of the space, and \(\alpha \) and \(\beta \) are non-negative coefficients such that \(\alpha +\beta >0\). Its values are quotiented by an axiom to be invariant under scaling, removing the need to normalize coefficients for associativity. This amounts to regarding a convex space as the projective space of some conical space.

The definition of \(S_X\) is a concrete reconstruction of such a conical space from a given convex space X. The benefit of this method over Stone’s is the removal of quotients by moving the coefficients from operations to values. We can then use the linear-algebraic properties of conical sums such as the neutrality of zeroes, which had to be specially handled in Stone’s proofs (e.g., [27, Lemma 2]).

Examples. We illustrate how \(\iota \) is used in practice with the proof of the entropic identity. Let T be a convType; we want to show that

$$\begin{aligned} (a \triangleleft {_{q}} \triangleright b) \triangleleft {_{p}} \triangleright (c \triangleleft {_{q}} \triangleright d) = (a \triangleleft {_{p}} \triangleright c) \triangleleft {_{q}} \triangleright (b \triangleleft {_{p}} \triangleright d). \end{aligned}$$
(1)

We could use the properties of convex spaces, but this will result in cumbersome computations, in particular because of quasi-associativity. Instead, we proceed by an embedding into the set of scaled points over T using \(\iota \). First, we observe that these scaled points form a convex space for the operator \(p, a, b \mapsto pa + \bar{p}b\) and that \(\iota (a \triangleleft {_{p}} \triangleright b) = \iota (a) \triangleleft {_{p}} \triangleright \iota (b)\). As a consequence, when we apply \(\iota \) to Equation (1), its left-hand side becomes

$$ p(q \iota (a) + \bar{q}\iota (b)) + \bar{p}(q\iota (c) + \bar{q}\iota (d)). $$

The main difference with Eq. (1) is that \(+\) (Coq notation: addpt) enjoys (unconditional) associativity, making the rest of the proof easier. In the proof script below, line 4 performs the embedding by first using the injectivity of \(\iota \) (lemma S1_inj), then using the fact that \(\iota \) is a morphism w.r.t. \(\_ \triangleleft {_{p}} \triangleright \_\) (lemma S1_conv), and last by revealing the definition of the operator of the convex spaces formed by scaled points (lemma convptE). The proof can be completed by rewritings with properties of addpt and scalept until the left-hand side matches the right-hand side.

figure i

We conclude this section with an example that provides a closed formula for the multiary convex combination (Coq notation: ) in the case of the real line (seen as a convex space):

figure k

This corresponds to the following transformations of the left-hand side.

5 Formalization of Convex Sets and Hulls

Our first application of convex and conical spaces is the formalization of convex sets and convex hulls. Besides mathematics, they also appear in many applications of convex spaces such as program semantics  [8, 11].

Definition 8

(is_convex_set  in [18]). Let T be a convex space. A subset D in T is a convex set if, for any \(p\in [0,1]\) and \(x, y\in D\), \(x \triangleleft {_{p}} \triangleright y \in D\).

We use the predicate is_convex_set to define the type of convex sets over T.

We can turn any set of points in a convex space into a convex set, namely, by taking convex hulls.

Definition 9

(hull  in [18]). For a subset X of T, its hull \(\overline{X}\) is

Example. The following example illustrates the usefulness of conical spaces when reasoning about convex hulls.

Our goal is to prove that for any \(z \in {\textsf {hull}\left( X\cup Y\right) }\) (\(X\ne \emptyset \), \(Y\ne \emptyset \)), there exist \(x \in X\) and \(y \in Y\) such that \(z = x \triangleleft {_{p}} \triangleright y\) for some p (see the formal statement at line 1 below).

We first introduce two notations. Let scaled_set X be the set \(\{ {p * x}\,|\,x\in \texttt {X}\}\). For any \(a\ne 0\), let (where a0 is the proof that \(a\ne 0\)) be the x such that \(a={p * x}\) for some p.

To prove our goal, it is sufficient to prove that there exist \(a \in \texttt {scaled\_set X}\) and \(b \in \texttt {scaled\_set Y}\) such that \(\iota (z) = a + b\) (this reasoning step is the purpose of line 6). When \(a=0\) or \(b=0\), we omit easy proofs at lines 8 and 9. Otherwise, we can take x to be and y to be as performed by the four lines from line 10.

We now establish the sufficient condition (from line 14). Since z is in the hull, we have a distribution d and n points \(g_i\) such that . We then decompose \(\iota (z)\) as follows:

$$\iota (z) = \sum _{i< n} d_i(\iota (g_i)) = \underbrace{\sum _{i< n,g_i\in X} d_i(\iota (g_i))}_b + \underbrace{\sum _{i < n, g_i\notin X} d_i(\iota (g_i))}_c.$$

We conclude by observing that b is in scaled_set X and that c is in scaled_set Y because \(\{g_i|g_i\notin X\} \subseteq Y\).

figure p

6 Formalization of Convex Functions

In this section, we first (Sect. 6.1) formalize a generic definition of convex functions based on convex spaces; for that purpose, we introduce in particular ordered convex spaces. To demonstrate this formalization, we then apply it to the proof of the concavity of the logarithm function and to an information-theoretic function (Sect. 6.2).

6.1 Ordered Convex Spaces and Convex Functions

An ordered convex space extends a convex space with a partial order structure:

Definition 10

( in [18]). An ordered convex space is a structure whose signature extends the one of convex spaces as follows:

  • Convex space X.

  • Ordering relation \((\_ \le \_) \subset X \times X\).

  • Reflexivity law: \(x \le x\).

  • Transitivity law: \(x \le y \wedge y \le z \Rightarrow x \le z\).

  • Antisymmetry law: \(x \le y \wedge y \le x \Rightarrow x = y\).

The above definition does not force any interaction between convexity and ordering. It would also be a natural design to include an axiom stating that convex combinations preserve ordering  [21, Sect. 2]. We however do not need such interactions for defining convex functions, which is our purpose here.

Convexity of a function is defined if its codomain is an ordered convex space. In the following, let T be a convex space and U be an ordered convex space.

Definition 11

(convex_function_at  in [18]). A function \(f : T \rightarrow U\) is convex at \(p\in [0,1]\) and \(x,y\in T\) if \(f(x \triangleleft {_{p}} \triangleright y) \le f(x) \triangleleft {_{p}} \triangleright f(y)\).

Definition 12

(convex_function  in [18]). A function \(f : T \rightarrow U\) is convex if it is convex at all \(p\in [0,1]\) and \(x,y\in T\).

The above predicates expect total functions. For partial functions, we resort to convex sets (Definition 8).

Definition 13

(convex_function_in  in [18]). Let D be a convex set in T. A function \(f : T \rightarrow U\) is convex in D if it is convex at any \(p\in [0,1]\) and \(x,y\in D\).

Concave functions are defined similarly since f is concave for the order \(\le \) if it is convex for \(\ge \). When the codomain of f is \(\mathbb {R}\), the prototypical example of an ordered convex space, it is also easy to prove that f is concave if \(-f\) is convex.

6.2 Examples of Convex Functions

As a first example, we prove that the real logarithm function is concave. The concavity of logarithm is frequently used in information theory, for example, properties of data compression depend on it  [4].

The definition of logarithm we use in Coq is the one of the standard library; it has the entire \(\mathbb R\) as its domain by setting \(\log (x)=0\) for \(x\le 0\). The statement of concavity is then restricted to the subset \(\mathbb R_{>0}\).Footnote 2

Lemma 5

(log_concave  in [17, probability/ln_facts.v]). The extended logarithm function

$$ x \mapsto {\left\{ \begin{array}{ll} \log (x) &{} \text {if }x\in \mathbb R_{>0}\\ 0 &{} \text {otherwise} \end{array}\right. } $$

is concave in \(\mathbb R_{>0}\).

The statement in Coq of these lemmas is as follows:

figure r

The predicate concave_function_in has been explained in Sect. 6.1. The object Rpos_interval is the set of positive numbers described as the predicate equipped with the proof that this set is indeed convex. The heart of the proof is the fact that a function whose second derivative is non-negative is convex ( in [18]). Our proof proceeds by using the formalization of real analysis from the Coq standard library; our formalization of convex spaces can thus be seen as an added abstraction layer of convexity to this library.

Our second example of convex function is the divergence (a.k.a. relative entropy or Kullback-Leibler divergence) of two probability distributions: an important information-theoretic function. Let P and Q be two finite distributions (over some finite type A). Their divergence div is defined as follows:

Actually, div P Q is defined only when Q dominates P, i.e., when implies for all a. We call such a pair of probability distributions a dominated pair. Hereafter, we denote div P Q by and the dominance of P by Q by .

We now show that the divergence function is convex over the set of dominated pairs. To formalize this statement using our definitions, we first need to show that dominated pairs form a convex space. To achieve this, it suffices to define the convex combination of the dominated pairs and as (where we use the convex combination of probability distributions). This operator is easily shown to enjoy the properties of convex spaces (Sect. 2). Once this is done, one just needs to uncurry the divergence function to use the convex_function predicate:

figure ac

The proof follows the standard one  [12, Theorem. 2.7.2] and relies on the log-sum inequality formalized in previous work  [6].

In previous work  [5], we applied above results to the proofs of convexity of other information-theoretic functions such as the entropy and the mutual information.

7 Related Work

Conical spaces have been known in the literature to work as a nice-behaving replacement of convex spaces when constructing models of nondeterministic computations. Varacca and Winskel  [31] used convexity when building a categorical monad combining probability and nondeterminism, but they chose to avoid the problem of equational laws in convex spaces by instead working with conical spaces. There is a similar preference in the study of domain-theoretic semantics of nondeterminism, to a conical structure (d-cones  [23]) over the corresponding convex structure (abstract probabilistic domain  [20]). The problem is the same in this case: the difficulty in working with the equational laws of convex spaces  [22, 30].

Flood  [13] proposed to use conical spaces to investigate the properties of convex spaces. He showed that for any convex space, there is an enveloping conical space and the convex space is embedded in it. (A version of the embedding for convex sets into cones in vector spaces was already present in Semadini’s book  [26].) Keimel and Plotkin  [21] extended the idea for their version of ordered convex spaces and applied it in the proof of their key lemma  [21, Lemma 2.8], which is an ordered version of the one proved by Neumann  [25, Lemma 2].

Another aspect of convex spaces is the relationship to probabilistic distributions. From any set, one can freely generate a convex space by formally taking all finite convex combinations of elements of this set. The resulting convex space can be seen as a set of distributions over the original set, since the formal convex combinations are equivalent to distributions over the given points. By this construction, convex spaces serve as a foundation for the algebraic and category-theoretic treatments of probability. This allows for another application of our work to the semantics of probabilistic and nondeterministic programming  [16, 19]. We have also been investigating this topic  [3, 7]. Our most recent result  [2] is based on the properties of convex sets and convex hulls, and deals with derived notions such as convex powersets. Its purpose is the formal study of program semantics from a category-theoretic point of view, rather than the formal study of the mathematical structure of convex spaces itself, which is rather the purpose of this paper.

8 Conclusion

In this paper, we formalized convex and conical spaces and developed their theories. In particular, we formally studied the various presentations of the convex combination operator, be it binary or multiary (Sect. 3). We provide formal proofs of the equivalence between several axiomatizations of both operators, where “proofs” in the literature were often only mere references to Stone’s foundational paper  [27], while it only contains a reduction of the multiary case to the binary one. Based on convex and conical spaces, we also developed a theory of convex functions and of convex hulls. We illustrated these developments with detailed examples from real analysis and information theory.