1 Introduction

Mixture experiments consist of varying the proportions of the components involved in a physico-chemical phenomenon and observing the resulting change in the response. The proportions of the mixture components vary between 0 and 1 and must sum to 1 for each run in the experiment. The experimental region is therefore reduced to a (d-1)-dimensional simplex,

$${S}^{d-1}=\left\{\left({x}_{1},\dots ,{x}_{d}\right)|{x}_{1}+\dots +{x}_{d}=1,{x}_{k}\ge 0 \right\},$$

where \({x}_{k}\) is the proportion of the kth component, \(k=1,\dots ,d\).

The purpose of a design for mixture experiments is to define a set of points in the simplex that captures as much information about the response as possible. Since Scheffé (1958), many authors have investigated designs for mixture experiments. The pioneers (Scheffé 1958; Kiefer 1961; Cornell 1981) defined optimal designs for linear and quadratic mixture models. An alternative, model-free approach was proposed by Wang and Fang (1990) and Fang and Wang (1994), whose goal is to cover the experimental region uniformly. The main idea is to generate a uniform design on the \((d-1)\)-dimensional unit cube, as explained in Hickernell (1998) or in Fang et al. (2005), and then to apply a mapping function that sends the points into the simplex \({S}^{d-1}\). Following this principle, many articles suggested improvements, especially to take into account complex constraints on the components: Fang and Yang (2000), Prescott (2008), Borkowski and Piepel (2009), Ning et al. (2011), and Liu and Liu (2016).

The initial design in the unit cube is uniform in the sense that its points minimize a discrepancy criterion. The discrepancy measures the distance between the cumulative distribution function of the uniform distribution and the empirical cumulative distribution function of the design points. However, uniformity is not guaranteed to be preserved by the mapping function. Some authors have therefore defined criteria to assess the uniformity of designs for mixture experiments. Fang and Wang (1994) proposed the mean square distance (MSD); Borkowski and Piepel (2009) suggested the root mean squared distance, the maximum distance and the average distance; Chuang and Hung (2010) defined the central composite discrepancy. All these criteria require computing the distance between the design points and the points of a much larger uniform set of points. This computational cost limits their usefulness in practice. To avoid this drawback, Ning et al. (2011) generalized the star discrepancy and proposed a new discrepancy, the DM2 discrepancy, to measure the uniformity of designs for mixtures. They also gave a computational formula for the DM2 discrepancy based only on the design points, which is useful in practice, especially within an optimization algorithm that builds a uniform design for mixture experiments.

In the same spirit, we define in this paper a new criterion to measure the distribution of the design points in the simplex \({S}^{d-1}\). The purpose is to obtain uniform designs and, more generally, designs with a Dirichlet distribution. Depending on its parameters, the Dirichlet distribution yields symmetric or asymmetric distributions, and designs with points uniformly spread in the simplex or concentrated in its center. We use the Kullback–Leibler (KL) divergence to measure the difference between the probability density function of the design point distribution and the probability density function of the Dirichlet distribution. The KL divergence has already been used to define space-filling criteria, but for a hypercube experimental domain (Jourdan and Franco 2009, 2010); there, the target distribution was the uniform distribution on the unit hypercube and the criterion reduced to the estimation of the Shannon entropy. In this paper, we adapt the criterion to the Dirichlet distribution. We propose two methods to estimate the KL divergence, a plug-in estimation and a nearest-neighbor estimation, which leads to two criteria for assessing the distribution of the design points.

Applied with the flat Dirichlet distribution, the new criteria lead to designs with a uniform distribution of their points, but these are not uniform designs in the sense defined by Fang and Wang (1994): the new criteria are based on the probability density function, whereas the discrepancy for uniform designs is based on the cumulative distribution function.

In Sect. 2, we define the criterion from the Kullback–Leibler divergence and the Dirichlet distribution. In Sect. 3, we propose two methods to estimate the criterion. In Sect. 4, we carry out a numerical comparison between the new and existing criteria in the case of the uniform distribution. In Sect. 5, we propose two applications, one concerning simplex-lattice designs and the other concerning the marginal distribution of the components.

2 Design points with a Dirichlet distribution

Suppose that the design points \({{\varvec{x}}}_{1},\dots ,{{\varvec{x}}}_{{\varvec{n}}}\), are \(n\) independent observations of the random vector \({\varvec{X}}=\left({X}_{1},\dots ,{X}_{d}\right)\) with absolutely continuous density function \(f\) concentrated on the simplex \({S}^{d-1}\). The aim is to select the design points in such a way as to have the corresponding empirical distribution “close” to the Dirichlet distribution.

The Dirichlet distribution is a family of continuous multivariate probability distributions parameterized by a vector \(\boldsymbol{\alpha }\) of positive reals. The support of the Dirichlet distribution is the (d-1)-simplex \({S}^{d-1}\). Its probability density function is

$$ g\left( {\varvec{x}} \right) = \frac{1}{{B\left( {\varvec{\alpha}} \right)}}\mathop \prod \limits_{k = 1}^{d} \left( {x_{k} } \right)^{{\alpha_{k} - 1}} , $$
(1)

where \({\varvec{x}}\) belongs to the (d-1)-simplex \({S}^{d-1}\), \(\boldsymbol{\alpha }=({\alpha }_{1},\dots ,{\alpha }_{d})\) with \({\alpha }_{i}>0\), and \(B(\boldsymbol{\alpha })\) is the normalizing constant,

$$ B\left( {\varvec{\alpha}} \right) = \frac{{\mathop \prod \nolimits_{k = 1}^{d} \Gamma \left( {\alpha_{k} } \right)}}{{\Gamma \left( {\alpha_{0} } \right)}}, $$

with \({\alpha }_{0}=\sum_{k=1}^{d}{\alpha }_{k}\) and Γ the Gamma function.

Hereafter, we focus on the symmetric Dirichlet distribution, for which all elements of the parameter vector \(\boldsymbol{\alpha }\) have the same value \(\alpha \), called the concentration parameter, and we suppose that \(\alpha \ge 1\). When \(\alpha =1\), the symmetric Dirichlet distribution is the uniform distribution over the (\(d-1\))-simplex \({S}^{d-1}\); it is called the flat Dirichlet distribution.

The aim is to generate \(n\) points in the simplex with a distribution as close as possible to a Dirichlet distribution. In Fig. 1a (starting design), we can see that simple random sampling from the Dirichlet distribution is not sufficient to obtain a good point distribution: the points do not cover the simplex uniformly, some being very close to each other while some areas remain empty.

Fig. 1
Designs for \(d=3\) and \(n=30\). The flat Dirichlet random sampling (a) is a simple random generation of 30 points from a Dirichlet distribution with α = 1. The DM2, MSD, Ckern and Cnn designs (b–i) are the resulting designs of the optimization algorithm
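To make the starting design of Fig. 1a concrete, here is a minimal sketch (in Python with numpy, our choice of language for illustration) of the simple random generation from the flat Dirichlet distribution; the seed is an arbitrary assumption, kept only for reproducibility.

```python
import numpy as np

# Simple random generation of n = 30 points in the 2-simplex (d = 3)
# from the flat Dirichlet distribution (alpha = 1), as in Fig. 1a.
rng = np.random.default_rng(0)                 # arbitrary seed
design = rng.dirichlet(np.ones(3), size=30)    # shape (30, 3)
assert np.allclose(design.sum(axis=1), 1.0)    # each run sums to 1
```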

We define a criterion to measure the “distance” between the point distribution and the Dirichlet distribution. The criterion is then used in an optimization algorithm to build a set of points with the expected distribution.

There are different ways to measure the difference between two distributions. In the case of uniform designs, discrepancies are based on the cumulative distribution function (Fang et al. 2005). In this paper, we use the Kullback–Leibler divergence to evaluate the deviation between two probability density functions \(f\) and \(g\),

$$ I\left( {f,g} \right) = \int {f\left( {\varvec{x}} \right)\log \left( {\frac{{f\left( {\varvec{x}} \right)}}{{g\left( {\varvec{x}} \right)}}} \right)d{\varvec{x}}} . $$

This integral can be written as an expectation with respect to a random vector \({\varvec{X}}\) distributed according to \(f\),

$$ I\left( {f,g} \right) = E\left[ {\log \left( {\frac{{f\left( {\varvec{X}} \right)}}{{g\left( {\varvec{X}} \right)}}} \right)} \right]. $$

We denote

$$ I\left( {f,g} \right) = I_{f} \left( f \right) - I_{f} \left( g \right) $$

where \({I}_{f}\left(f\right)=E\left[\text{log}\left(f\left({\varvec{X}}\right)\right)\right]\) and \({I}_{f}\left(g\right)=E\left[\text{log}\left(g\left({\varvec{X}}\right)\right)\right]\).

Theorem 1.

Let \(g\) be the probability density function of the Dirichlet distribution (1); then the integral \({I}_{f}\left(g\right)\) exists.

The proof of this theorem is given in Appendix A.

Throughout, we suppose that \(f\) is the unknown density function of the design points and that the integral \({I}_{f}\left(f\right)\) exists. This assumption is reasonable since the goal is to obtain a density function \(f\) close to the Dirichlet density function \(g\). We can then use the Kullback–Leibler divergence to evaluate the deviation between the design point distribution and the Dirichlet distribution.

If we consider that the design points \(D=\left\{{{\varvec{x}}}_{1},\dots ,{{\varvec{x}}}_{{\varvec{n}}}\right\}\) are \(n\) i.i.d. realizations of the unknown distribution \(f\), the Monte Carlo method gives an unbiased and consistent estimator,

$$ \hat{I}\left( {f,g} \right) = \hat{I}_{f} \left( f \right) - \hat{I}_{f} \left( g \right) $$
(2)

where \({\widehat{I}}_{f}\left(f\right)=\frac{1}{n}\sum_{i=1}^{n}\text{log}\left(f({{\varvec{x}}}_{{\varvec{i}}})\right)\) and

$$ \hat{I}_{f} \left( g \right) = \frac{{\left( {\alpha - 1} \right)}}{n}\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{k = 1}^{d} \log \left( {x_{ik} } \right) - \log \left( {B\left( {\varvec{\alpha}} \right)} \right) $$

where \({x}_{ik}\ne 0\) is the kth component of the ith design point, \(i=1,\dots ,n\) and \(k=1,\dots ,d\).

The estimator \({\widehat{I}}_{f}\left(f\right)\) is not a computational formula since the density function \(f\) is unknown. There are two common ways to estimate the integral \(I(f,g)\): the plug-in estimate, which consists of replacing the density function \(f\) by its kernel estimate, and the nearest-neighbor estimate. We detail the two approaches in the next section.

Neither estimator is unbiased. However, a bias is not a problem in our application as long as it is fixed for given \(n\) and \(d\): the goal is not to obtain an accurate estimate of the integral but a criterion to compare two sets of points in the optimization algorithm. We say that a design \({D}_{1}\) is better than a design \({D}_{2}\) if

$$ \hat{I}\left( {f_{1} ,g} \right) \le \hat{I}\left( {f_{2} ,g} \right) $$

with \({f}_{1}\) and \({f}_{2}\) the density functions associated with \({D}_{1}\) and \({D}_{2}\), respectively.

The minimization algorithm is an adaptation of the exchange algorithm described in Jin et al. (2005).

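The pseudocode figure is not reproduced here. As a substitute, the following sketch shows one plausible greedy variant of such an exchange algorithm: at each iteration a design point is proposed for replacement by a random Dirichlet candidate, and the exchange is kept only if the criterion decreases. This is an illustrative simplification; Jin et al. (2005) use a more elaborate, annealing-like acceptance rule.

```python
import numpy as np

def exchange_minimize(criterion, start, alpha, n_iter=1000, seed=0):
    """Greedy exchange sketch (illustrative, not the exact algorithm of
    Jin et al. 2005): replace one point at a time by a random Dirichlet
    candidate and keep the exchange if the criterion decreases."""
    rng = np.random.default_rng(seed)
    design = start.copy()
    n, d = design.shape
    best = criterion(design)
    for _ in range(n_iter):
        i = rng.integers(n)                        # point to exchange
        candidate = design.copy()
        candidate[i] = rng.dirichlet(alpha * np.ones(d))
        value = criterion(candidate)
        if value < best:                           # keep improving exchanges
            design, best = candidate, value
    return design, best
```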

3 Estimation of the criterion

In this section we propose two methods to estimate the unknown density function \(f\) in Eq. 2. In each case we explain our choices (kernel, bandwidth, k in the k-nearest neighbor distance) and we give a computational formula for the criterion.

3.1 Plug-in estimate

The unknown density function \(f\) is estimated from the design points \(D=\left\{{{\varvec{x}}}_{1},\dots ,{{\varvec{x}}}_{{\varvec{n}}}\right\}\) by a kernel method (Scott 1992),

$$ \hat{f}\left( {\varvec{x}} \right) = \frac{1}{{n\left| H \right|^{1/2} }}\mathop \sum \limits_{i = 1}^{n} K\left( {{\varvec{H}}^{ - 1/2} \left( {{\varvec{x}} - {\varvec{x}}_{{\varvec{i}}} } \right)} \right), $$

where \(K\) is a multivariate kernel and \({\varvec{H}}\) is the bandwidth matrix (symmetric and positive definite matrix). It is known that the shape of the kernel has a minor influence on the estimation (Silverman 1986). We use a multidimensional Gaussian kernel,

$$ K\left( {\varvec{Z}} \right) = \left( {2\pi } \right)^{ - d/2} e^{{ - \frac{1}{2}\left\| {\varvec{Z}} \right\|^{2} }} . $$

By contrast, the choice of the bandwidth matrix has a great influence on the accuracy of the estimation. We use a diagonal matrix, \({\varvec{H}}={h}^{2}{I}_{d}\), where

$$ h = n^{{ - 1/\left( {d + 4} \right)}} \frac{1}{{\alpha_{0} }}\sqrt {\frac{{\alpha \left( {\alpha_{0} - \alpha } \right)}}{{\left( {\alpha_{0} + 1} \right)}}} . $$

This choice is motivated by Theorem 2.

Theorem 2.

We consider the estimator \({\widehat{I}}_{f}\left(\widehat{f}\right)\) of \({\widehat{I}}_{f}\left(f\right)\). Suppose that \(f\) has continuous first and second order derivatives and that \(\int f\left({\varvec{x}}\right) \log^{2} \left(f\left({\varvec{x}}\right)\right)d{\varvec{x}}\) exists; then the bias is

$$ E\left[ {\hat{I}_{f} \left( f \right) - \hat{I}_{f} \left( {\hat{f}} \right)} \right] = O\left( {n^{ - 1} h^{ - d} } \right) + O\left( {h^{2} } \right). $$

The proof of this theorem is given in Appendix B.

The bias depends on the sample size \(n\), the dimension \(d\), and the bandwidth \(h\). When constructing an optimal design, the size \(n\) and the dimension \(d\) are fixed. The bandwidth still needs to be fixed so that the bias does not vary during the optimization algorithm. Usually the bandwidth matrix is chosen proportional to the covariance matrix of the data, but this choice implies that \({\varvec{H}}\) varies during the optimization algorithm. One idea to fix it is to replace the covariance matrix of the data by the target covariance matrix, i.e. the covariance matrix of the Dirichlet distribution. Unfortunately, this matrix is singular. Therefore, even though the variables are correlated, we simplify the bandwidth matrix into a diagonal matrix with Scott’s rule (1992), \({\varvec{H}}=diag\left({h}_{1}^{2},\dots ,{h}_{d}^{2}\right)\) with \({h}_{k}={n}^{-1/(d+4)}{\widehat{\sigma }}_{k}\), where \({\widehat{\sigma }}_{k}\) is the estimated standard deviation of the \(k\)th component. The estimate \({\widehat{\sigma }}_{k}\) depends on the design points, so \({h}_{k}\), and thus the bias, varies from one iteration to another in the algorithm. In order to fix the bias, we replace the estimate \({\widehat{\sigma }}_{k}\) by a value independent of the design points. Since our goal is to get closer to a Dirichlet distribution, the most natural value for \({\widehat{\sigma }}_{k}\) is the standard deviation of the target distribution,

$$ \hat{\sigma }_{k} = \frac{1}{{\alpha_{0} }}\sqrt {\frac{{\alpha_{k} \left( {\alpha_{0} - \alpha_{k} } \right)}}{{\left( {\alpha_{0} + 1} \right)}}} . $$

Finally, by removing the terms independent of the design points and with \({\alpha }_{k}=\alpha , k=1,\dots ,d\), we obtain a simplified criterion,

$$ C_{kern} \left( D \right) = \mathop \sum \limits_{i = 1}^{n} \left[ {\log \left( {\mathop \sum \limits_{j = 1}^{n} e^{{ - \frac{1}{2}\left\| {\frac{{{\varvec{x}}_{{\varvec{j}}} - {\varvec{x}}_{{\varvec{i}}} }}{h}} \right\|^{2} }} } \right)} \right] - \left( {\alpha - 1} \right)\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{k = 1}^{d} \log (x_{ik} ) $$
(3)

where \(h={n}^{-1/(d+4)}\frac{1}{d}\sqrt{\frac{d-1}{d\alpha +1}}\).
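As an illustration, criterion (3) can be computed directly from the design matrix; a minimal sketch, where the function name and the numpy/scipy dependencies are our own choices:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def c_kern(design, alpha=1.0):
    """Simplified plug-in criterion (3) for an (n, d) design whose
    components are strictly positive (x_ik != 0)."""
    n, d = design.shape
    h = n ** (-1.0 / (d + 4)) / d * np.sqrt((d - 1) / (d * alpha + 1))
    sq = squareform(pdist(design, "sqeuclidean"))   # ||x_j - x_i||^2
    first = np.log(np.exp(-0.5 * sq / h ** 2).sum(axis=1)).sum()
    return first - (alpha - 1) * np.log(design).sum()
```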

3.2 Nearest-neighbor estimate

Wang et al. (2006) and Leonenko et al. (2008) proposed to estimate the Kullback–Leibler divergence with the k-nearest neighbor density estimation.

Let \(\rho \left({\varvec{x}},{\varvec{y}}\right)\) denote the Euclidean distance between two points \({\varvec{x}}\) and \({\varvec{y}}\) of \({\mathbb{R}}^{d}\). We write \({\rho }^{(1)}\left({\varvec{x}},S\right)\le {\rho }^{\left(2\right)}\left({\varvec{x}},S\right)\le \dots \le {\rho }^{(m)}\left({\varvec{x}},S\right)\) for the ordered distances between \({\varvec{x}}\in {\mathbb{R}}^{d}\) and \(S=\left\{{{\varvec{y}}}_{1},\dots ,{{\varvec{y}}}_{{\varvec{m}}}\right\}\), a set of points of \({\mathbb{R}}^{d}\) such that \({\varvec{x}}\notin S\); \({\rho }^{(k)}\left({\varvec{x}},S\right)\) is the k-nearest-neighbor distance from \({\varvec{x}}\) to the points of \(S\). The above authors demonstrated that the following estimate of \({I}_{f}(f)\) from the design points \(D=\left\{{{\varvec{x}}}_{1},\dots ,{{\varvec{x}}}_{{\varvec{n}}}\right\}\) is asymptotically unbiased and consistent,

$$ \hat{I}_{f} \left( {\hat{f}} \right) = - \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \log \left\{ {\left( {n - 1} \right)e^{ - \psi \left( k \right)} V_{d} \left( {\rho^{\left( k \right)} \left( {{\varvec{x}}_{{\varvec{i}}} ,D_{ - i} } \right)} \right)^{d} } \right\} $$

with \(\psi \) the digamma function, \({V}_{d}\) the volume of the unit ball in \({\mathbb{R}}^{d}\) and \({D}_{-i}=D\backslash \left\{{{\varvec{x}}}_{{\varvec{i}}}\right\}\). Note that this expression supposes \({{\varvec{x}}}_{{\varvec{i}}}\ne {{\varvec{x}}}_{{\varvec{j}}}\) for \(i\ne j\). The bias depends on \(n\), \(d\) and \(k\). We need to fix the value of \(k\) so that the bias does not vary during the optimization algorithm; Pronzato (2017) justified restricting the estimation to \(k=1\).

By removing the terms independent of the design points, we obtain the following criterion for a symmetric Dirichlet distribution,

$$ C_{nn} \left( D \right) = - \mathop \sum \limits_{i = 1}^{n} \log \left\{ {\left( {\rho^{\left( 1 \right)} \left( {{\varvec{x}}_{{\varvec{i}}} ,D_{ - i} } \right)} \right)^{d} } \right\} - \left( {\alpha - 1} \right)\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{k = 1}^{d} \log (x_{ik} ) $$
(4)
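Criterion (4) is equally direct to compute; in the sketch below (same assumptions as above), the first neighbor returned by the k-d tree query is the point itself, so the nearest-neighbor distance is the second column:

```python
import numpy as np
from scipy.spatial import cKDTree

def c_nn(design, alpha=1.0):
    """Simplified nearest-neighbor criterion (4); assumes distinct design
    points with strictly positive components."""
    n, d = design.shape
    dist, _ = cKDTree(design).query(design, k=2)
    rho1 = dist[:, 1]                    # rho^(1)(x_i, D_{-i})
    return -d * np.log(rho1).sum() - (alpha - 1) * np.log(design).sum()
```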

Remark 1.

Note that for the flat Dirichlet (uniform) distribution (\(\alpha =1\)), the criteria Ckern and Cnn reduce to their first term, which estimates the Shannon entropy of the random vector \({\varvec{X}}\) (up to the factor \(1/n\)).

Remark 2.

As the points get closer to the edges of the simplex, the second term increases, so the criteria favor points inside the simplex. The higher the \(\alpha \) coefficient, the more the points concentrate in the center of the simplex, which is consistent with the behavior of the Dirichlet distribution.

Remark 3.

Since the criteria are based on the Euclidean distance, they are invariant under permutation of factors or runs, and invariant under rotation of the coordinates.

4 Numerical tests

There is no criterion in the literature (except Ckern and Cnn) to assess whether a sample follows a Dirichlet distribution in the general case. In the case of the flat distribution, one can use existing criteria designed to evaluate the uniform distribution of points in a simplex. Most of these criteria compute the distances between the design points and the points of a much larger number-theoretic set within the simplex. Among these distance-based criteria, we select the mean square distance (MSD) defined by Fang and Wang (1994),

$$ MSD\left( D \right) = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {\mathop {\min }\limits_{1 \le j \le n} d^{2} \left( {x_{j} ,z_{i} } \right)} , $$

where \(d\left(\cdot ,\cdot \right)\) denotes the Euclidean distance, and \({z}_{1}, \dots ,{z}_{N}\) are the \(N\) points of a good lattice point (glp) set given in Fang and Wang (1994), with \(N=610\), 597, 701, 1069 and 2129 for \(d=3\) to 7 respectively, \(N=3997\) for \(d=8\) and 9, and \(N=4661\) for \(d=10\) (the smallest sizes found in Fang and Wang 1994). The computational cost of the distance-based criteria increases rapidly with the dimension \(d\) and the size \(n\) (see Sect. 4.1), which limits their usefulness in practice. As far as we know, only the DM2 criterion defined by Ning et al. (2011) does not involve calculations with a large set of points. It is an adaptation of the star discrepancy to the simplex and is estimated from the design points only,

$$ \begin{aligned} DM2\left( D \right) & = \left( {\frac{{\sqrt d }}{{\left( {d - 1} \right)!}}} \right)^{{\frac{1}{2}}} \left\{ {c_{{n,s}} - \frac{{2\left( {d - 1} \right)!}}{n}\sum\limits_{{i = 1}}^{n} {\sum\limits_{{\left( {\tau _{2} , \ldots ,\tau _{d} } \right) \in \left\{ {0,1} \right\}^{{d - 1}} }} {a_{\tau } } } } \right. \\ & \quad \times \left( {x_{{i1}} } \right)^{{2\left( {d - 1} \right) - \sum\limits_{{j = 2}}^{d} {\tau _{j} } }} \cdot \prod\limits_{{j = 2}}^{d} {\left( {x_{{ij}} } \right)^{{\tau _{j} }} } + \frac{1}{{n^{2} }} \\ & \quad \times \left. {\sum\limits_{{i = 1,k = 1}}^{n} {\left( {\max \left( {1 - \sum\limits_{{j = 2}}^{d} {\max } \left( {z_{{ij}} ,z_{{kj}} } \right),0} \right)} \right)^{{d - 1}} } } \right\}^{{\frac{1}{2}}} \\ \end{aligned} $$

where \({c}_{n,s}={\left(\left(d-1\right)!\right)}^{3}{2}^{d-1}/(2\left(d-1\right)!{\prod }_{k=0}^{d-2}(2d+k-1))\) and \({a}_{\tau }=\left(d-1\right)!/\left(2\left(d-1\right)-{\sum }_{i=2}^{d}{\tau }_{i}\right)!\).
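For comparison, the MSD is straightforward to compute once a large reference set is available; in this hedged sketch the glp set is simply passed in as an array (its construction, given in Fang and Wang 1994, is not reproduced here):

```python
import numpy as np
from scipy.spatial.distance import cdist

def msd(design, reference):
    """Mean square distance: average squared Euclidean distance from each
    reference point z_i (e.g. a glp set) to its closest design point."""
    sq = cdist(reference, design, "sqeuclidean")    # shape (N, n)
    return sq.min(axis=1).mean()
```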

In this section we use the optimization algorithm given in Sect. 2 (with 1000 iterations) to build designs with the four criteria Ckern, Cnn, DM2 and MSD, for different values of \(d\), \(n\) and \(\alpha \). For each configuration, we built several designs to account for the randomness in the initialization of the algorithm. Table 1 shows the correlation between the four criteria computed on 1000 random designs with \(n=10d\). The correlation is fairly weak, especially when the dimension increases, which means that the criteria do not operate in the same way to assess the uniform distribution of the points. In the following sections, we compare the performance and behavior of the criteria.

Table 1 Correlation between the four criteria computed with 1000 random designs with \(n=10d\)

4.1 Design comparison in the case of the flat distribution

In Fig. 1, we use the same starting design (Fig. 1a) in the exchange algorithm in order to visually compare the resulting designs for \(d=3\) and \(n=30\). The starting design is a random set of points generated according to the flat Dirichlet distribution. We observe that some points are very close together, providing redundant information, while some areas in the simplex are not explored by the points. The designs obtained with the criteria DM2, MSD, Ckern and Cnn with \(\alpha =1\) (Fig. 1b, c, d, g) explore the experimental domain more uniformly. Some points are still close together with the DM2 criterion, but there is no longer any empty area. The point distribution of the MSD and Ckern designs with \(\alpha =1\) (Fig. 1c, d) is very regular, like a grid. The Ckern criterion tends to push the points towards the edges of the simplex. Figures 1e, f, h and i illustrate that as \(\alpha \) increases, the points are more concentrated inside the simplex. In this case, the designs are not space-filling, since they do not explore the entire domain, but the points are well distributed: they are not too close together and they explore the concentrated area evenly.

This first visual comparison illustrates that a simple random draw according to the flat Dirichlet distribution is not enough: an optimization algorithm with an appropriate criterion is necessary to construct a design of experiments for mixtures with points evenly spread in the simplex. We have drawn some conclusions about the behavior of the criteria in dimension 3, but a visual comparison is not sufficient in dimensions greater than 3. That is why we introduce a graphical tool (Fig. 2) to compare the inter-site distances of the design points in any dimension. For a design, we compute the nearest-neighbor distance of each point. The x-axis is the average of the nearest-neighbor distances of the design points (µ) and the y-axis is their standard deviation (σ). A good coverage of the experimental region is obtained by a design with points far from each other (high average) and close to a regular grid (small standard deviation), like a scrambled grid. The target area is therefore at the bottom right of this graphic. In Fig. 2, we show designs of dimension \(d=5\) and size \(n=30\) (left), and of dimension \(d=10\) and size \(n=50\) (right). In both cases the Ckern criterion gives the best results, since the points are on average far from their nearest neighbor. The Cnn criterion is not as good in dimension 5, but gives almost the same results in dimension 10. The DM2 and MSD criteria have the same results as simple random designs: the average of the inter-site distance is smaller, with a high standard deviation. This means that some points are close to each other and will provide redundant information (the red points will be explained in Sect. 5.2). The new criteria are better than the existing ones in terms of inter-site distance.

Fig. 2
figure 2

Average (x-axis) and standard deviation (y-axis) of the nearest neighbor distances of the design points, with \(\alpha =1\) for the Ckern and Cnn criteria
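The coordinates used in Fig. 2 can be obtained in a few lines; a minimal sketch reusing the k-d tree query from the Cnn sketch above:

```python
import numpy as np
from scipy.spatial import cKDTree

def nn_summary(design):
    """(mu, sigma) of a design in the Fig. 2 plot: mean and standard
    deviation of the nearest-neighbor distances of its points."""
    dist, _ = cKDTree(design).query(design, k=2)
    return dist[:, 1].mean(), dist[:, 1].std()
```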

The next comparison concerns the computational time. The complexity is:

  • \(O(d\times {n}^{2})\) for Ckern and Cnn criteria,

  • \(O(\left(d-2\right)\times {n}^{2}+n\times {2}^{d-1}\times \left(d-2\right)+2(d-2))\) for DM2 criterion,

  • \(O(d\times n\times N)\) for MSD criterion.

Figure 3 gives the evolution of the complexity as a function of \(n\) for \(d=3\) and \(d=10\). The calculation time for the MSD criterion is high, due to the calculation of the distances between the design points and the points of the larger glp set; it increases dramatically with the size, even when choosing the smallest glp set in Fang and Wang (1994). The DM2 criterion has the lowest complexity for \(d=3\), but its complexity becomes very high for \(d=10\). The significant cost of the two existing criteria explains why there are only five designs in Fig. 2 for \(d=10\).

Fig. 3
figure 3

Evolution of complexity as a function of size n

4.2 Design behavior according to α

Figure 4 represents the average of the DM2 and MSD criterion values of 20 designs with \(d=3\), \(n=30\), as a function of \(\alpha \). The DM2 and MSD criteria measure the uniformity of the designs, so they should reach their minimum value for \(\alpha =1\); this is the case for the Cnn designs. However, the Ckern designs reach their minimal value for \(\alpha =1.5\). This means that the Ckern criterion tends to push the points towards the edges of the simplex, and that \(\alpha \) must be increased to bring them back to the center. This confirms our conclusion from the previous paragraph when visually comparing the 3-dimensional designs in Fig. 1. We also note that there is less variability in the criterion values for the Ckern designs.

Fig. 4
figure 4

Average of the DM2 and MSD criterion values for the sampling of 20 designs with \(d=3\) and \(n=30\) (95% confidence interval in grey)

5 Applications

5.1 Concentrated design

An alternative way to build a uniform design for mixture experiments is the contraction of a simplex-lattice (Scheffé 1958). The points of a simplex-lattice seem to be uniformly distributed on \({S}^{d-1}\), but most of them lie on the boundary (Fig. 5a), and some experiments are reduced to one or two components in the mixture (e.g. the first experiment in the {3,3}-simplex lattice in Table 2 involves the first component \({X}_{1}\) only). Fang and Wang (1994) proposed to keep the simplex-lattice pattern while moving the points towards the centroid of the simplex. An example of a simplex-lattice and the contracted design is given in Table 2 and Fig. 5.

Fig. 5
figure 5

Simplex-lattice (left) and contracted designs with \(a=5\) (right) with \(d=3\) and \(n=10\)

Table 2 {3,3}-Simplex lattice design and its contracted design
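For reference, a {d,q} simplex-lattice consists of all points whose coordinates are multiples of \(1/q\) and sum to 1; a minimal sketch of its construction via stars and bars (the function name is our own):

```python
from itertools import combinations
import numpy as np

def simplex_lattice(d, q):
    """All points of the {d, q} simplex-lattice, i.e. the
    (d+q-1)!/((d-1)! q!) compositions of q into d parts, divided by q."""
    points = []
    for bars in combinations(range(q + d - 1), d - 1):
        prev, parts = -1, []
        for b in bars:                   # gaps between bars = part sizes
            parts.append(b - prev - 1)
            prev = b
        parts.append(q + d - 2 - prev)   # stars after the last bar
        points.append(parts)
    return np.asarray(points) / q

lattice = simplex_lattice(3, 3)          # the 10 points of Table 2
```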

The smaller the contraction constant a, the more the points are concentrated in the center. Fang and Wang (1994) and Ning et al. (2011) used MSD and DM2 criteria to find the best value of \(a\). In the same way, we optimize the Ckern and Cnn criteria to determine \(a\) (Fig. 6).

Fig. 6
figure 6

Ckern, Cnn, DM2 and MSD criteria against the contraction constant a. Best values of a are 4.9 with Ckern, 9.6 with Cnn, 5.3 with DM2 and 4.6 with MSD

The Cnn criterion is optimal for a high value of \(a\) (\(a=9.6\)), which means that the Cnn criterion tends to push the points towards the inside of the simplex. The Ckern criterion finds an optimal value very close to the values obtained by Fang and Wang (1994) and Ning et al. (2011) (\(a\cong 5\)).
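To illustrate the procedure, the sketch below selects \(a\) by a grid search on the Ckern criterion, reusing c_kern and simplex_lattice from the earlier sketches. The contraction mapping used here, a convex combination with the centroid whose weight grows with \(a\), is only an assumed stand-in: the exact mapping of Fang and Wang (1994) is not reproduced in this paper, but this form matches the stated behavior (smaller \(a\), stronger concentration in the center).

```python
import numpy as np

def contract(points, a):
    """Hypothetical contraction toward the centroid (NOT the exact
    Fang-Wang mapping): smaller a pulls the points further in."""
    d = points.shape[1]
    centroid = np.ones(d) / d
    return centroid + (points - centroid) * a / (a + 1.0)

grid = np.linspace(0.5, 15.0, 30)        # arbitrary range of candidates
values = [c_kern(contract(lattice, a)) for a in grid]
best_a = grid[int(np.argmin(values))]    # criterion-optimal a on the grid
```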

5.2 The curse of dimensionality and marginal distribution

As the dimension \(d\) increases, some phenomena appear that cannot be ignored.

The first one is the prohibitive size of the {d,q} simplex-lattice when \(d\) increases, \(n=\left(d+q-1\right)!/\left(\left(d-1\right)!q!\right)\). Some examples are given in Table 3. In dimension 10, the simplex-lattice with \(q=2\) requires \(n=55\) experiments and tests only three levels \(\{0, 0.5, 1\}\) for each component. If we need to test more levels, for example \(\{0, 1/3, 2/3, 1\}\) with \(q=3\), the size increases to \(n=220\) experiments. The idea is to use the optimization algorithm with the previous criteria to select a well-distributed subset of points from the simplex-lattice. The red points in Fig. 2 are the designs obtained by this method with a simplex-lattice with \(q=4\). The inter-site distance of the designs constructed with the MSD criterion increases considerably in dimensions 5 and 10. In dimension 5, the DM2 and Ckern criteria perform less well than their original versions, with DM2 in particular showing strong variability in the inter-site distance. In dimension 10, all criteria increase the inter-site distance, still with high variability for the DM2 criterion. There is no result for Cnn: the nearest-neighbor distance is constant when the experimental domain is restricted to the points of the simplex-lattice, so the optimization algorithm does not converge.

Table 3 Example of sizes for a {d,q} simplex-lattice

The second phenomenon that arises as the dimension increases is that the optimization process tends to push the points towards the edges of the experimental domain. This phenomenon is well known in the construction of space-filling designs in the unit cube. It is reinforced in the case of designs for mixture experiments by the fact that the marginal distributions of the Dirichlet distribution are Beta distributions, \(Beta({\alpha }_{i},{\alpha }_{0}-{\alpha }_{i})\). In the special case of the uniform distribution (\(\alpha =1\)), the marginal distributions are \(Beta(1,d-1)\). As shown in Fig. 7, the skewness of the density function increases with the dimension: when the dimension is large, small proportions are over-represented in the experimental design. To avoid this inconvenience, we can build designs with a Dirichlet distribution, which allows us to control the distribution of small proportions by choosing an appropriate \(\alpha \) value. For example, if \(X\) is a random variable with a Dirichlet distribution, then requiring \(P(X<0.1)=20\%\) implies \(\alpha =2.3\) for \(d=5\). Figure 8 provides the marginal distribution of \({X}_{1}\) for different designs with \(d=5\) and \(n=30\) built with the optimization algorithm. Note that with \(\alpha =2.3\), the frequency of component proportions below 0.1 decreases considerably; the objective is therefore achieved. On the other hand, increasing \(\alpha \) contracts the design symmetrically, so the frequency of large proportions also decreases: there is no longer any proportion greater than 0.5.
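The value \(\alpha =2.3\) can be recovered numerically from the Beta marginal; a minimal sketch using the regularized incomplete beta function (the bracket [1, 20] is an assumption that contains the root):

```python
from scipy.special import betainc
from scipy.optimize import brentq

# Each component of a symmetric Dirichlet in dimension d follows
# Beta(alpha, (d - 1) * alpha); solve P(X < 0.1) = 0.20 for alpha.
d, x0, target = 5, 0.1, 0.20
alpha_star = brentq(lambda a: betainc(a, (d - 1) * a, x0) - target, 1.0, 20.0)
print(round(alpha_star, 1))              # approximately 2.3, as in the text
```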

Fig. 7
figure 7

Density function of Beta(1,d-1)

Fig. 8
figure 8

Marginal distribution of \({X}_{1}\) for designs with \(d=5\) and \(n=30\)

6 Conclusion

In this paper we have proposed two new criteria for evaluating the point distribution of designs for mixture experiments. The Dirichlet distribution makes it possible to build design points with a uniform or contracted distribution. The Kullback–Leibler divergence is used to measure the difference between the Dirichlet and design point distributions. We have used the plug-in estimate with a Gaussian kernel and the nearest-neighbor distance to estimate the Kullback–Leibler divergence. The two criteria are simplified so that they can be used in an optimization process to build designs for mixture experiments with a target Dirichlet distribution.

Numerical tests in dimension 3 show that the new criteria spread the points evenly in the simplex, as well as the existing criteria do. Tests in higher dimensions show that the new criteria give better results: the distance between points is higher than with the existing DM2 and MSD criteria, and with lower variability. Moreover, the calculation time for the existing criteria increases considerably with the dimension, which makes them difficult to use for mixtures with many components. The new criteria therefore seem to be the best choice in this case.

We have proposed two applications in the high-dimensional case. The first comes from the observation that the number of points in a simplex-lattice becomes excessive as the dimension increases, especially if we wish to test many levels for each component. The criteria are then used to select a subset of simplex-lattice points that is well distributed over the experimental domain. The Ckern and MSD criteria give good results in terms of inter-site distance; the advantage of the Ckern criterion lies in its computational speed. The second application comes from the observation that low proportions are over-represented in the design, and that this phenomenon is amplified as the dimension increases. One of the advantages of the two criteria proposed in this paper is that they are based on the Dirichlet distribution, which covers more than just the uniform distribution. We can determine the value of the \(\alpha \) parameter to control the frequency of small proportions in the experimental design. We can also set different values of \(\alpha \) depending on the component and obtain an asymmetrical design, as shown in Fig. 9.

Fig. 9
figure 9

Ckern and Cnn designs with d = 3 and n = 10 for an asymmetric Dirichlet distribution with α = (2,4,8). The starting design points are n i.i.d. random draws from the Dirichlet distribution

The second application is not entirely satisfactory because, by reducing the frequency of small proportions, we also reduce the large proportions that are already under-represented. Having a uniform distribution on the simplex \({S}^{d-1}\) and a symmetric distribution on each axis seem to be two conflicting objectives. A multi-objective optimization algorithm (instead of the exchange algorithm) would make it possible to manage this problem. The first objective function would be one of the two criteria defined in this paper. The second objective function could be defined to measure the difference between the distribution of each component and a univariate symmetric distribution with support [0,1] (e.g. a symmetric triangular or truncated normal distribution). As in this paper, the Kullback–Leibler divergence and its estimates could be used to define a criterion for the second objective function. The Pareto front could then be used to find the best compromise between the two objectives.