1 Introduction

Mixture experiments consist of varying the proportions of the components involved in a physico-chemical phenomenon and observing the resulting change in the response. The proportions of the mixture components vary between 0 and 1 and must sum to 1 for each run in the experiment. The experimental region is therefore reduced to a (d-1)-dimensional simplex,

$${S}^{d-1}=\left\{\left({x}_{1},\dots ,{x}_{d}\right)|{x}_{1}+\dots +{x}_{d}=1,{x}_{k}\ge 0 \right\},$$

where \({x}_{k}\) is the proportion of the kth component, \(k=1,\dots ,d\).

The purpose of a design for mixture experiments is to define a set of points in the simplex that captures as much information about the response as possible. Since Scheffé (1958), many authors have investigated designs for mixture experiments. The pioneers (Scheffé 1958; Kiefer 1961; Cornell 1981) defined optimal designs for linear and quadratic mixture models. An alternative, model-free approach was proposed by Wang and Fang (1990) and Fang and Wang (1994), whose goal is to cover the experimental region uniformly. The main idea is to generate a uniform design on the \((d-1)\)-dimensional unit cube, as explained in Hickernell (1998) or in Fang et al. (2005), and then to apply a mapping function that sends the points into the simplex \({S}^{d-1}\). Following this principle, many articles suggested improvements, especially to take into account complex constraints on the components: Fang and Yang (2000), Prescott (2008), Borkowski and Piepel (2009), Ning et al. (2011), and Liu and Liu (2016).

The initial design in the unit cube is uniform in the sense that its points minimize a discrepancy criterion. The discrepancy measures the distance between the cumulative distribution function of the uniform distribution and the empirical cumulative distribution function of the design points. However, uniformity is not guaranteed to be preserved by the mapping function. Some authors have therefore defined criteria to assess the uniformity of designs for mixture experiments. Fang and Wang (1994) proposed the mean square distance (MSD); Borkowski and Piepel (2009) suggested the root mean squared distance, the maximum distance and the average distance; Chuang and Hung (2010) defined the central composite discrepancy. All these criteria require computing the distance between the design points and the points of a much larger uniform set of points. This computational cost limits their usefulness in practice. To avoid this drawback, Ning et al. (2011) generalized the star discrepancy and proposed a new discrepancy, the DM2 discrepancy, to measure the uniformity of designs for mixtures. They also gave a computational formula for the DM2 discrepancy based only on the design points, which is useful in practice, especially within an optimization algorithm that builds a uniform design for mixture experiments.

In the same spirit, we define in this paper a new criterion to measure the distribution of the design points in the simplex \({S}^{d-1}\). The purpose is to obtain uniform designs and, more generally, designs with a Dirichlet distribution. Depending on its parameters, the Dirichlet distribution yields symmetric or asymmetric distributions, and designs with points uniformly spread in the simplex or concentrated in its center. We use the Kullback–Leibler (KL) divergence to measure the difference between the probability density function of the design point distribution and the probability density function of the Dirichlet distribution. The KL divergence has already been used to define space-filling criteria, but for a hypercube experimental domain (Jourdan and Franco 2009, 2010); there, the target distribution was the uniform distribution on the unit hypercube and the criterion reduced to the estimation of the Shannon entropy. In this paper, we adapt the criterion to the Dirichlet distribution. We propose two methods to estimate the KL divergence, a plug-in estimation and a nearest-neighbor estimation, which leads to two criteria for assessing the distribution of the design points.

Applied with the flat Dirichlet distribution, the new criteria lead to designs with a uniform distribution of their points, but these are not uniform designs in the sense defined by Fang and Wang (1994): the new criteria are based on the probability density function, whereas the discrepancy for uniform designs is based on the cumulative distribution function.

In Sect. 2, we define the criterion from the Kullback–Leibler divergence and the Dirichlet distribution. In Sect. 3, we propose two methods to estimate the criterion. In Sect. 4, we carry out a numerical comparison between the new and existing criteria in the case of the uniform distribution. In Sect. 5, we propose two applications, one concerning simplex-lattice designs and the other concerning the marginal distribution of the components.

2 Design points with a Dirichlet distribution

Suppose that the design points \({{\varvec{x}}}_{1},\dots ,{{\varvec{x}}}_{{\varvec{n}}}\), are \(n\) independent observations of the random vector \({\varvec{X}}=\left({X}_{1},\dots ,{X}_{d}\right)\) with absolutely continuous density function \(f\) concentrated on the simplex \({S}^{d-1}\). The aim is to select the design points in such a way as to have the corresponding empirical distribution “close” to the Dirichlet distribution.

The Dirichlet distribution is a family of continuous multivariate probability distributions parameterized by a vector \(\boldsymbol{\alpha }\) of positive reals. The support of the Dirichlet distribution is the (d-1)-simplex \({S}^{d-1}\). Its probability density function is

$$ g\left( {\varvec{x}} \right) = \frac{1}{{B\left( {\varvec{\alpha}} \right)}}\mathop \prod \limits_{k = 1}^{d} \left( {x_{k} } \right)^{{\alpha_{k} - 1}} , $$
(1)

where \({\varvec{x}}\) belongs to the (d-1)-simplex \({S}^{d-1}\), \(\boldsymbol{\alpha }=({\alpha }_{1},\dots ,{\alpha }_{d})\) with \({\alpha }_{i}>0\), and \(B(\boldsymbol{\alpha })\) is the normalizing constant,

$$ B\left( {\varvec{\alpha}} \right) = \frac{{\mathop \prod \nolimits_{k = 1}^{d} \Gamma \left( {\alpha_{k} } \right)}}{{\Gamma \left( {\alpha_{0} } \right)}}, $$

with \({\alpha }_{0}=\sum_{k=1}^{d}{\alpha }_{k}\) and Γ the Gamma function.

Hereafter, we focus on the symmetric Dirichlet distribution, for which all elements of the parameter vector \(\boldsymbol{\alpha }\) have the same value \(\alpha \), called the concentration parameter, and we suppose that \(\alpha \ge 1\). When \(\alpha =1\), the symmetric Dirichlet distribution is the uniform distribution over the (\(d-1\))-simplex \({S}^{d-1}\); it is called the flat Dirichlet distribution.

The aim is to generate \(n\) points in the simplex with a distribution as close as possible to a Dirichlet distribution. In Fig. 1a (starting design), we can see that simple random sampling from the Dirichlet distribution is not sufficient to obtain a good point distribution: the points do not cover the simplex uniformly, some being very close to each other while some areas remain empty.

Fig. 1
Designs for \(d=3\) and \(n=30\). The flat Dirichlet random sampling (a) is a simple random generation of 30 points from a Dirichlet distribution with α = 1. The DM2, MSD, Ckern and Cnn designs (b–i) are the resulting designs of the optimization algorithm
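To make the starting design of Fig. 1a concrete, here is a minimal sketch (in Python with numpy, our choice of language for illustration) of the simple random generation from the flat Dirichlet distribution; the seed is an arbitrary assumption, kept only for reproducibility.

```python
import numpy as np

# Simple random generation of n = 30 points in the 2-simplex (d = 3)
# from the flat Dirichlet distribution (alpha = 1), as in Fig. 1a.
rng = np.random.default_rng(0)                 # arbitrary seed
design = rng.dirichlet(np.ones(3), size=30)    # shape (30, 3)
assert np.allclose(design.sum(axis=1), 1.0)    # each run sums to 1
```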

We define a criterion to measure the “distance” between the point distribution and the Dirichlet distribution. The criterion is then used in an optimization algorithm to build a set of points with the expected distribution.

There are different ways to measure the difference between two distributions. In the case of uniform designs, discrepancies are based on the cumulative distribution function (Fang et al. 2005). In this paper, we use the Kullback–Leibler divergence to evaluate the deviation between two probability density functions \(f\) and \(g\),

$$ I\left( {f,g} \right) = \int {f\left( {\varvec{x}} \right)\log \left( {\frac{{f\left( {\varvec{x}} \right)}}{{g\left( {\varvec{x}} \right)}}} \right)d{\varvec{x}}} . $$

This integral can be written as an expectation with respect to a random vector \({\varvec{X}}\) distributed according to \(f\),

$$ I\left( {f,g} \right) = E\left[ {\log \left( {\frac{{f\left( {\varvec{X}} \right)}}{{g\left( {\varvec{X}} \right)}}} \right)} \right]. $$

We denote

$$ I\left( {f,g} \right) = I_{f} \left( f \right) - I_{f} \left( g \right) $$

where \({I}_{f}\left(f\right)=E\left[\text{log}\left(f\left({\varvec{X}}\right)\right)\right]\) and \({I}_{f}\left(g\right)=E\left[\text{log}\left(g\left({\varvec{X}}\right)\right)\right]\).

Theorem 1.

Let \(g\) be the probability density function of the Dirichlet distribution (1); then the integral \({I}_{f}\left(g\right)\) exists.

The proof of this theorem is given in Appendix A.

Throughout, we suppose that \(f\) is the unknown density function of the design points and that the integral \({I}_{f}\left(f\right)\) exists. This assumption is reasonable since the goal is to obtain a density function \(f\) close to the Dirichlet density function \(g\). We can then use the Kullback–Leibler divergence to evaluate the deviation between the design point distribution and the Dirichlet distribution.

If we consider that the design points \(D=\left\{{{\varvec{x}}}_{1},\dots ,{{\varvec{x}}}_{{\varvec{n}}}\right\}\) are \(n\) i.i.d. realizations of the unknown distribution \(f\), the Monte Carlo method gives an unbiased and consistent estimator,

$$ \hat{I}\left( {f,g} \right) = \hat{I}_{f} \left( f \right) - \hat{I}_{f} \left( g \right) $$
(2)

where \({\widehat{I}}_{f}\left(f\right)=\frac{1}{n}\sum_{i=1}^{n}\text{log}\left(f({{\varvec{x}}}_{{\varvec{i}}})\right)\) and

$$ \hat{I}_{f} \left( g \right) = \frac{{\left( {\alpha - 1} \right)}}{n}\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{k = 1}^{d} \log \left( {x_{ik} } \right) - \log \left( {B\left( {\varvec{\alpha}} \right)} \right) $$

where \({x}_{ik}\ne 0\) is the kth component of the ith design point, \(i=1,\dots ,n\) and \(k=1,\dots ,d\).

The estimator \({\widehat{I}}_{f}\left(f\right)\) is not a computational formula since the density function \(f\) is unknown. There are two common ways to estimate the integral \(I(f,g)\): the plug-in estimate, which consists of replacing the density function \(f\) by its kernel estimate, and the nearest-neighbor estimate. We detail the two approaches in the next section.

Neither estimator is unbiased. However, a bias is not a problem in our application as long as it is fixed for given \(n\) and \(d\): the goal is not to obtain an accurate estimate of the integral but a criterion to compare two sets of points in the optimization algorithm. We say that a design \({D}_{1}\) is better than a design \({D}_{2}\) if

$$ \hat{I}\left( {f_{1} ,g} \right) \le \hat{I}\left( {f_{2} ,g} \right) $$

with \({f}_{1}\) and \({f}_{2}\) the density functions associated with \({D}_{1}\) and \({D}_{2}\), respectively.

The minimization algorithm is an adaptation of the exchange algorithm described in Jin et al. (2005).

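The pseudocode figure is not reproduced here. As a substitute, the following sketch shows one plausible greedy variant of such an exchange algorithm: at each iteration a design point is proposed for replacement by a random Dirichlet candidate, and the exchange is kept only if the criterion decreases. This is an illustrative simplification; Jin et al. (2005) use a more elaborate, annealing-like acceptance rule.

```python
import numpy as np

def exchange_minimize(criterion, start, alpha, n_iter=1000, seed=0):
    """Greedy exchange sketch (illustrative, not the exact algorithm of
    Jin et al. 2005): replace one point at a time by a random Dirichlet
    candidate and keep the exchange if the criterion decreases."""
    rng = np.random.default_rng(seed)
    design = start.copy()
    n, d = design.shape
    best = criterion(design)
    for _ in range(n_iter):
        i = rng.integers(n)                        # point to exchange
        candidate = design.copy()
        candidate[i] = rng.dirichlet(alpha * np.ones(d))
        value = criterion(candidate)
        if value < best:                           # keep improving exchanges
            design, best = candidate, value
    return design, best
```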

3 Estimation of the criterion

In this section we propose two methods to estimate the unknown density function \(f\) in Eq. 2. In each case we explain our choices (kernel, bandwidth, k in the k-nearest neighbor distance) and we give a computational formula for the criterion.

3.1 Plug-in estimate

The unknown density function \(f\) is estimated from the design points \(D=\left\{{{\varvec{x}}}_{1},\dots ,{{\varvec{x}}}_{{\varvec{n}}}\right\}\) by a kernel method (Scott 1992),

$$ \hat{f}\left( {\varvec{x}} \right) = \frac{1}{{n\left| H \right|^{1/2} }}\mathop \sum \limits_{i = 1}^{n} K\left( {{\varvec{H}}^{ - 1/2} \left( {{\varvec{x}} - {\varvec{x}}_{{\varvec{i}}} } \right)} \right), $$

where \(K\) is a multivariate kernel and \({\varvec{H}}\) is the bandwidth matrix (symmetric and positive definite matrix). It is known that the shape of the kernel has a minor influence on the estimation (Silverman 1986). We use a multidimensional Gaussian kernel,

$$ K\left( {\varvec{Z}} \right) = \left( {2\pi } \right)^{ - d/2} e^{{ - \frac{1}{2}\left\| {\varvec{Z}} \right\|^{2} }} . $$

By contrast, the choice of the bandwidth matrix has a great influence on the accuracy of the estimation. We use a diagonal matrix, \({\varvec{H}}={h}^{2}{I}_{d}\), where

$$ h = n^{{ - 1/\left( {d + 4} \right)}} \frac{1}{{\alpha_{0} }}\sqrt {\frac{{\alpha \left( {\alpha_{0} - \alpha } \right)}}{{\left( {\alpha_{0} + 1} \right)}}} . $$

This choice is motivated by Theorem 2.

Theorem 2.

We consider the estimator \({\widehat{I}}_{f}\left(\widehat{f}\right)\) of \({\widehat{I}}_{f}\left(f\right)\). Suppose that \(f\) has continuous first and second order derivatives and that \(\int f\left({\varvec{x}}\right) \log^{2} \left(f\left({\varvec{x}}\right)\right)d{\varvec{x}}\) exists; then the bias is

$$ E\left[ {\hat{I}_{f} \left( f \right) - \hat{I}_{f} \left( {\hat{f}} \right)} \right] = O\left( {n^{ - 1} h^{ - d} } \right) + O\left( {h^{2} } \right). $$

The proof of this theorem is given in Appendix B.

The bias depends on the sample size \(n\), the dimension \(d\), and the bandwidth \(h\). When constructing an optimal design, the size \(n\) and the dimension \(d\) are fixed. The bandwidth still needs to be fixed so that the bias does not vary during the optimization algorithm. Usually the bandwidth matrix is chosen proportional to the covariance matrix of the data, but this choice implies that \({\varvec{H}}\) varies during the optimization algorithm. One idea to fix it is to replace the covariance matrix of the data by the target covariance matrix, i.e. the covariance matrix of the Dirichlet distribution. Unfortunately, this matrix is singular. Therefore, even though the variables are correlated, we simplify the bandwidth matrix into a diagonal matrix with Scott’s rule (1992), \({\varvec{H}}=diag\left({h}_{1}^{2},\dots ,{h}_{d}^{2}\right)\) with \({h}_{k}={n}^{-1/(d+4)}{\widehat{\sigma }}_{k}\), where \({\widehat{\sigma }}_{k}\) is the estimated standard deviation of the \(k\)th component. The estimate \({\widehat{\sigma }}_{k}\) depends on the design points, so \({h}_{k}\), and thus the bias, varies from one iteration to another in the algorithm. In order to fix the bias, we replace the estimate \({\widehat{\sigma }}_{k}\) by a value independent of the design points. Since our goal is to get closer to a Dirichlet distribution, the most natural value for \({\widehat{\sigma }}_{k}\) is the standard deviation of the target distribution,

$$ \hat{\sigma }_{k} = \frac{1}{{\alpha_{0} }}\sqrt {\frac{{\alpha_{k} \left( {\alpha_{0} - \alpha_{k} } \right)}}{{\left( {\alpha_{0} + 1} \right)}}} . $$

Finally, by removing the terms independent of the design points and with \({\alpha }_{k}=\alpha , k=1,\dots ,d\), we obtain a simplified criterion,

$$ C_{kern} \left( D \right) = \mathop \sum \limits_{i = 1}^{n} \left[ {\log \left( {\mathop \sum \limits_{j = 1}^{n} e^{{ - \frac{1}{2}\left\| {\frac{{{\varvec{x}}_{{\varvec{j}}} - {\varvec{x}}_{{\varvec{i}}} }}{h}} \right\|^{2} }} } \right)} \right] - \left( {\alpha - 1} \right)\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{k = 1}^{d} \log (x_{ik} ) $$
(3)

where \(h={n}^{-1/(d+4)}\frac{1}{d}\sqrt{\frac{d-1}{d\alpha +1}}\).
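As an illustration, criterion (3) can be computed directly from the design matrix; a minimal sketch, where the function name and the numpy/scipy dependencies are our own choices:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def c_kern(design, alpha=1.0):
    """Simplified plug-in criterion (3) for an (n, d) design whose
    components are strictly positive (x_ik != 0)."""
    n, d = design.shape
    h = n ** (-1.0 / (d + 4)) / d * np.sqrt((d - 1) / (d * alpha + 1))
    sq = squareform(pdist(design, "sqeuclidean"))   # ||x_j - x_i||^2
    first = np.log(np.exp(-0.5 * sq / h ** 2).sum(axis=1)).sum()
    return first - (alpha - 1) * np.log(design).sum()
```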

3.2 Nearest-neighbor estimate

Wang et al. (2006) and Leonenko et al. (2008) proposed to estimate the Kullback–Leibler divergence with the k-nearest neighbor density estimation.

Let \(\rho \left({\varvec{x}},{\varvec{y}}\right)\) denote the Euclidean distance between two points \({\varvec{x}}\) and \({\varvec{y}}\) of \({\mathbb{R}}^{d}\). We write \({\rho }^{(1)}\left({\varvec{x}},S\right)\le {\rho }^{\left(2\right)}\left({\varvec{x}},S\right)\le \dots \le {\rho }^{(m)}\left({\varvec{x}},S\right)\) for the ordered distances between \({\varvec{x}}\in {\mathbb{R}}^{d}\) and \(S=\left\{{{\varvec{y}}}_{1},\dots ,{{\varvec{y}}}_{{\varvec{m}}}\right\}\), a set of points of \({\mathbb{R}}^{d}\) such that \({\varvec{x}}\notin S\); \({\rho }^{(k)}\left({\varvec{x}},S\right)\) is the k-nearest-neighbor distance from \({\varvec{x}}\) to the points of \(S\). The above authors demonstrated that the following estimate of \({I}_{f}(f)\) from the design points \(D=\left\{{{\varvec{x}}}_{1},\dots ,{{\varvec{x}}}_{{\varvec{n}}}\right\}\) is asymptotically unbiased and consistent,

$$ \hat{I}_{f} \left( {\hat{f}} \right) = - \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \log \left\{ {\left( {n - 1} \right)e^{ - \psi \left( k \right)} V_{d} \left( {\rho^{\left( k \right)} \left( {{\varvec{x}}_{{\varvec{i}}} ,D_{ - i} } \right)} \right)^{d} } \right\} $$

with \(\psi \) the digamma function, \({V}_{d}\) the volume of the unit ball in \({\mathbb{R}}^{d}\) and \({D}_{-i}=D\backslash \left\{{{\varvec{x}}}_{{\varvec{i}}}\right\}\). Note that this expression supposes \({{\varvec{x}}}_{{\varvec{i}}}\ne {{\varvec{x}}}_{{\varvec{j}}}\) for \(i\ne j\). The bias depends on \(n\), \(d\) and \(k\). We need to fix the value of \(k\) so that the bias does not vary during the optimization algorithm; Pronzato (2017) justified restricting the estimation to \(k=1\).

By removing the terms independent of the design points, we obtain the following criterion for a symmetric Dirichlet distribution,

$$ C_{nn} \left( D \right) = - \mathop \sum \limits_{i = 1}^{n} \log \left\{ {\left( {\rho^{\left( 1 \right)} \left( {{\varvec{x}}_{{\varvec{i}}} ,D_{ - i} } \right)} \right)^{d} } \right\} - \left( {\alpha - 1} \right)\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{k = 1}^{d} \log (x_{ik} ) $$
(4)
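Criterion (4) is equally direct to compute; in the sketch below (same assumptions as above), the first neighbor returned by the k-d tree query is the point itself, so the nearest-neighbor distance is the second column:

```python
import numpy as np
from scipy.spatial import cKDTree

def c_nn(design, alpha=1.0):
    """Simplified nearest-neighbor criterion (4); assumes distinct design
    points with strictly positive components."""
    n, d = design.shape
    dist, _ = cKDTree(design).query(design, k=2)
    rho1 = dist[:, 1]                    # rho^(1)(x_i, D_{-i})
    return -d * np.log(rho1).sum() - (alpha - 1) * np.log(design).sum()
```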

Remark 1.

Note that for the flat Dirichlet (uniform) distribution (\(\alpha =1\)), the criteria Ckern and Cnn reduce to their first term, which estimates the Shannon entropy of the random vector \({\varvec{X}}\) (up to the factor \(1/n\)).

Remark 2.

As the points get closer to the edges of the simplex, the second term increases, so the criteria favor points inside the simplex. The higher the \(\alpha \) coefficient, the more the points concentrate in the center of the simplex, which is consistent with the behavior of the Dirichlet distribution.

Remark 3.

Since the criteria are based on the Euclidean distance, they are invariant under permutation of factors or runs, and invariant under rotation of the coordinates.

4 Numerical tests

There is no criterion in the literature (except Ckern and Cnn) to assess whether a sample follows a Dirichlet distribution in the general case. In the case of the flat distribution, one can use existing criteria designed to evaluate the uniform distribution of points in a simplex. Most of these criteria compute the distances between the design points and the points of a much larger number-theoretic set within the simplex. Among these distance-based criteria, we select the mean square distance (MSD) defined by Fang and Wang (1994),

$$ MSD\left( D \right) = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {\mathop {\min }\limits_{1 \le j \le n} d^{2} \left( {x_{j} ,z_{i} } \right)} , $$

where \(d\left(\cdot ,\cdot \right)\) denotes the Euclidean distance, and \({z}_{1}, \dots ,{z}_{N}\) are the \(N\) points of a good lattice point (glp) set given in Fang and Wang (1994), with \(N=610\), 597, 701, 1069 and 2129 for \(d=3\) to 7 respectively, \(N=3997\) for \(d=8\) and 9, and \(N=4661\) for \(d=10\) (the smallest sizes found in Fang and Wang 1994). The computational cost of the distance-based criteria increases rapidly with the dimension \(d\) and the size \(n\) (see Sect. 4.1), which limits their usefulness in practice. As far as we know, only the DM2 criterion defined by Ning et al. (2011) does not involve calculations with a large set of points. It is an adaptation of the star discrepancy to the simplex and is estimated from the design points only,

$$ \begin{aligned} DM2\left( D \right) & = \left( {\frac{{\sqrt d }}{{\left( {d - 1} \right)!}}} \right)^{{\frac{1}{2}}} \left\{ {c_{{n,s}} - \frac{{2\left( {d - 1} \right)!}}{n}\sum\limits_{{i = 1}}^{n} {\sum\limits_{{\left( {\tau _{2} , \ldots ,\tau _{d} } \right) \in \left\{ {0,1} \right\}^{{d - 1}} }} {a_{\tau } } } } \right. \\ & \quad \times \left( {x_{{i1}} } \right)^{{2\left( {d - 1} \right) - \sum\limits_{{j = 2}}^{d} {\tau _{j} } }} \cdot \prod\limits_{{j = 2}}^{d} {\left( {x_{{ij}} } \right)^{{\tau _{j} }} } + \frac{1}{{n^{2} }} \\ & \quad \times \left. {\sum\limits_{{i = 1,k = 1}}^{n} {\left( {\max \left( {1 - \sum\limits_{{j = 2}}^{d} {\max } \left( {z_{{ij}} ,z_{{kj}} } \right),0} \right)} \right)^{{d - 1}} } } \right\}^{{\frac{1}{2}}} \\ \end{aligned} $$

where \({c}_{n,s}={\left(\left(d-1\right)!\right)}^{3}{2}^{d-1}/(2\left(d-1\right)!{\prod }_{k=0}^{d-2}(2d+k-1))\) and \({a}_{\tau }=\left(d-1\right)!/\left(2\left(d-1\right)-{\sum }_{i=2}^{d}{\tau }_{i}\right)!\).
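For comparison, the MSD is straightforward to compute once a large reference set is available; in this hedged sketch the glp set is simply passed in as an array (its construction, given in Fang and Wang 1994, is not reproduced here):

```python
import numpy as np
from scipy.spatial.distance import cdist

def msd(design, reference):
    """Mean square distance: average squared Euclidean distance from each
    reference point z_i (e.g. a glp set) to its closest design point."""
    sq = cdist(reference, design, "sqeuclidean")    # shape (N, n)
    return sq.min(axis=1).mean()
```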

In this section we use the optimization algorithm given in Sect. 2 (with 1000 iterations) to build designs with the four criteria Ckern, Cnn, DM2 and MSD, for different values of \(d\), \(n\) and \(\alpha \). For each configuration, we built several designs to account for the randomness in the initialization of the algorithm. Table 1 shows the correlation between the four criteria computed on 1000 random designs with \(n=10d\). The correlation is fairly weak, especially when the dimension increases, which means that the criteria do not operate in the same way to assess the uniform distribution of the points. In the following sections, we compare the performance and behavior of the criteria.

Table 1 Correlation between the four criteria computed with 1000 random designs with \(n=10d\)

4.1 Design comparison in the case of the flat distribution

In Fig. 1, we use the same starting design (Fig. 1a) in the exchange algorithm in order to visually compare the resulting designs for \(d=3\) and \(n=30\). The starting design is a random set of points generated according to the flat Dirichlet distribution. We observe that some points are very close together, providing redundant information, while some areas in the simplex are not explored by the points. The designs obtained with the criteria DM2, MSD, Ckern and Cnn with \(\alpha =1\) (Fig. 1b, c, d, g) explore the experimental domain more uniformly. Some points are still close together with the DM2 criterion, but there is no longer any empty area. The point distribution of the MSD and Ckern designs with \(\alpha =1\) (Fig. 1c, d) is very regular, like a grid. The Ckern criterion tends to push the points towards the edges of the simplex. Figures 1e, f, h and i illustrate that as \(\alpha \) increases, the points are more concentrated inside the simplex. In this case, the designs are not space-filling, since they do not explore the entire domain, but the points are well distributed: they are not too close together and they explore the concentrated area evenly.

This first visual comparison illustrates that a simple random draw according to the flat Dirichlet distribution is not enough: an optimization algorithm with an appropriate criterion is necessary to construct a design of experiments for mixtures with points evenly spread in the simplex. We have drawn some conclusions about the behavior of the criteria in dimension 3, but a visual comparison is not sufficient in dimensions greater than 3. That is why we introduce a graphical tool (Fig. 2) to compare the inter-site distances of the design points in any dimension. For a design, we compute the nearest-neighbor distance of each point. The x-axis is the average of the nearest-neighbor distances of the design points (µ) and the y-axis is their standard deviation (σ). A good coverage of the experimental region is obtained by a design with points far from each other (high average) and close to a regular grid (small standard deviation), like a scrambled grid. The target area is therefore at the bottom right of this graphic. In Fig. 2, we show designs of dimension \(d=5\) and size \(n=30\) (left), and of dimension \(d=10\) and size \(n=50\) (right). In both cases the Ckern criterion gives the best results, since the points are on average far from their nearest neighbor. The Cnn criterion is not as good in dimension 5, but gives almost the same results in dimension 10. The DM2 and MSD criteria have the same results as simple random designs: the average of the inter-site distance is smaller, with a high standard deviation. This means that some points are close to each other and will provide redundant information (the red points will be explained in Sect. 5.2). The new criteria are better than the existing ones in terms of inter-site distance.

Fig. 2
figure 2

Average (x-axis) and standard deviation (y-axis) of the nearest neighbor distances of the design points, with \(\alpha =1\) for the Ckern and Cnn criteria
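The coordinates used in Fig. 2 can be obtained in a few lines; a minimal sketch reusing the k-d tree query from the Cnn sketch above:

```python
import numpy as np
from scipy.spatial import cKDTree

def nn_summary(design):
    """(mu, sigma) of a design in the Fig. 2 plot: mean and standard
    deviation of the nearest-neighbor distances of its points."""
    dist, _ = cKDTree(design).query(design, k=2)
    return dist[:, 1].mean(), dist[:, 1].std()
```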

The next comparison concerns the computational time. The complexity is:

  • \(O(d\times {n}^{2})\) for Ckern and Cnn criteria,

  • \(O(\left(d-2\right)\times {n}^{2}+n\times {2}^{d-1}\times \left(d-2\right)+2(d-2))\) for DM2 criterion,

  • \(O(d\times n\times N)\) for MSD criterion.

Figure 3 gives the evolution of the complexity as a function of \(n\) for \(d=3\) and \(d=10\). The calculation time for the MSD criterion is high, due to the calculation of the distances between the design points and the points of the larger glp set; it increases dramatically with the size, even when choosing the smallest glp set in Fang and Wang (1994). The DM2 criterion has the lowest complexity for \(d=3\), but its complexity becomes very high for \(d=10\). The significant cost of the two existing criteria explains why there are only five designs in Fig. 2 for \(d=10\).

Fig. 3
figure 3

Evolution of complexity as a function of size n

4.2 Design behavior according to α

Figure 4 represents the average of the DM2 and MSD criterion values of 20 designs with \(d=3\), \(n=30\), as a function of \(\alpha \). The DM2 and MSD criteria measure the uniformity of the designs, so they should reach their minimum value for \(\alpha =1\); this is the case for the Cnn designs. However, the Ckern designs reach their minimal value for \(\alpha =1.5\). This means that the Ckern criterion tends to push the points towards the edges of the simplex, and that \(\alpha \) must be increased to bring them back to the center. This confirms our conclusion from the previous paragraph when visually comparing the 3-dimensional designs in Fig. 1. We also note that there is less variability in the criterion values for the Ckern designs.

Fig. 4
figure 4

Average of the DM2 and MSD criterion values for the sampling of 20 designs with \(d=3\) and \(n=30\) (95% confidence interval in grey)

5 Applications

5.1 Concentrated design

An alternative way to build a uniform design for mixture experiments is the contraction of a simplex-lattice (Scheffé 1958). The points of a simplex-lattice seem to be uniformly distributed on \({S}^{d-1}\), but most of them lie on the boundary (Fig. 5a), and some experiments are reduced to one or two components in the mixture (e.g. the first experiment in the {3,3}-simplex lattice in Table 2 involves the first component \({X}_{1}\) only). Fang and Wang (1994) proposed to keep the simplex-lattice pattern while moving the points towards the centroid of the simplex. An example of a simplex-lattice and the contracted design is given in Table 2 and Fig. 5.

Fig. 5
figure 5

Simplex-lattice (left) and contracted designs with \(a=5\) (right) with \(d=3\) and \(n=10\)

Table 2 {3,3}-Simplex lattice design and its contracted design
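For reference, a {d,q} simplex-lattice consists of all points whose coordinates are multiples of \(1/q\) and sum to 1; a minimal sketch of its construction via stars and bars (the function name is our own):

```python
from itertools import combinations
import numpy as np

def simplex_lattice(d, q):
    """All points of the {d, q} simplex-lattice, i.e. the
    (d+q-1)!/((d-1)! q!) compositions of q into d parts, divided by q."""
    points = []
    for bars in combinations(range(q + d - 1), d - 1):
        prev, parts = -1, []
        for b in bars:                   # gaps between bars = part sizes
            parts.append(b - prev - 1)
            prev = b
        parts.append(q + d - 2 - prev)   # stars after the last bar
        points.append(parts)
    return np.asarray(points) / q

lattice = simplex_lattice(3, 3)          # the 10 points of Table 2
```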

The smaller the contraction constant a, the more the points are concentrated in the center. Fang and Wang (1994) and Ning et al. (2011) used MSD and DM2 criteria to find the best value of \(a\). In the same way, we optimize the Ckern and Cnn criteria to determine \(a\) (Fig. 6).

Fig. 6
figure 6

Ckern, Cnn, DM2 and MSD criteria against the contraction constant a. Best values of a are 4.9 with Ckern, 9.6 with Cnn, 5.3 with DM2 and 4.6 with MSD

The Cnn criterion is optimal for a high value of \(a\) (\(a=9.6\)), which means that the Cnn criterion tends to push the points towards the inside of the simplex. The Ckern criterion finds an optimal value very close to the values obtained by Fang and Wang (1994) and Ning et al. (2011) (\(a\cong 5\)).
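To illustrate the procedure, the sketch below selects \(a\) by a grid search on the Ckern criterion, reusing c_kern and simplex_lattice from the earlier sketches. The contraction mapping used here, a convex combination with the centroid whose weight grows with \(a\), is only an assumed stand-in: the exact mapping of Fang and Wang (1994) is not reproduced in this paper, but this form matches the stated behavior (smaller \(a\), stronger concentration in the center).

```python
import numpy as np

def contract(points, a):
    """Hypothetical contraction toward the centroid (NOT the exact
    Fang-Wang mapping): smaller a pulls the points further in."""
    d = points.shape[1]
    centroid = np.ones(d) / d
    return centroid + (points - centroid) * a / (a + 1.0)

grid = np.linspace(0.5, 15.0, 30)        # arbitrary range of candidates
values = [c_kern(contract(lattice, a)) for a in grid]
best_a = grid[int(np.argmin(values))]    # criterion-optimal a on the grid
```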

5.2 The curse of dimensionality and marginal distribution

As the dimension \(d\) increases, some phenomena appear that cannot be ignored.

The first one is the prohibitive size of the {d,q} simplex-lattice when \(d\) increases, \(n=\left(d+q-1\right)!/\left(\left(d-1\right)!q!\right)\). Some examples are given in Table 3. In dimension 10, the simplex-lattice with \(q=2\) requires \(n=55\) experiments and tests only three levels \(\{0, 0.5, 1\}\) for each component. If we need to test more levels, for example \(\{0, 1/3, 2/3, 1\}\) with \(q=3\), the size increases to \(n=220\) experiments. The idea is to use the optimization algorithm with the previous criteria to select a well-distributed subset of points from the simplex-lattice. The red points in Fig. 2 are the designs obtained by this method with a simplex-lattice with \(q=4\). The inter-site distance of the designs constructed with the MSD criterion increases considerably in dimensions 5 and 10. In dimension 5, the DM2 and Ckern criteria perform less well than their original versions, with DM2 in particular showing strong variability in the inter-site distance. In dimension 10, all criteria increase the inter-site distance, still with high variability for the DM2 criterion. There is no result for Cnn: the nearest-neighbor distance is constant when the experimental domain is restricted to the points of the simplex-lattice, so the optimization algorithm does not converge.

Table 3 Example of sizes for a {d,q} simplex-lattice

The second phenomenon that arises as the dimension increases is that the optimization process tends to push the points towards the edges of the experimental domain. This phenomenon is well known in the construction of space-filling designs in the unit cube. It is reinforced in the case of designs for mixture experiments by the fact that the marginal distributions of the Dirichlet distribution are Beta distributions, \(Beta({\alpha }_{i},{\alpha }_{0}-{\alpha }_{i})\). In the special case of the uniform distribution (\(\alpha =1\)), the marginal distributions are \(Beta(1,d-1)\). As shown in Fig. 7, the skewness of the density function increases with the dimension: when the dimension is large, small proportions are over-represented in the experimental design. To avoid this inconvenience, we can build designs with a Dirichlet distribution, which allows us to control the distribution of small proportions by choosing an appropriate \(\alpha \) value. For example, if \(X\) is a random variable with a Dirichlet distribution, then requiring \(P(X<0.1)=20\%\) implies \(\alpha =2.3\) for \(d=5\). Figure 8 provides the marginal distribution of \({X}_{1}\) for different designs with \(d=5\) and \(n=30\) built with the optimization algorithm. Note that with \(\alpha =2.3\), the frequency of component proportions below 0.1 decreases considerably; the objective is therefore achieved. On the other hand, increasing \(\alpha \) contracts the design symmetrically, so the frequency of large proportions also decreases: there is no longer any proportion greater than 0.5.
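The value \(\alpha =2.3\) can be recovered numerically from the Beta marginal; a minimal sketch using the regularized incomplete beta function (the bracket [1, 20] is an assumption that contains the root):

```python
from scipy.special import betainc
from scipy.optimize import brentq

# Each component of a symmetric Dirichlet in dimension d follows
# Beta(alpha, (d - 1) * alpha); solve P(X < 0.1) = 0.20 for alpha.
d, x0, target = 5, 0.1, 0.20
alpha_star = brentq(lambda a: betainc(a, (d - 1) * a, x0) - target, 1.0, 20.0)
print(round(alpha_star, 1))              # approximately 2.3, as in the text
```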

Fig. 7
figure 7

Density function of Beta(1,d-1)

Fig. 8
figure 8

Marginal distribution of \({X}_{1}\) for designs with \(d=5\) and \(n=30\)

6 Conclusion

In this paper we have proposed two new criteria for evaluating the point distribution of designs for mixture experiments. The Dirichlet distribution makes it possible to build design points with a uniform or contracted distribution. The Kullback–Leibler divergence is used to measure the difference between the Dirichlet and design point distributions. We have used the plug-in estimate with a Gaussian kernel and the nearest-neighbor distance to estimate the Kullback–Leibler divergence. The two criteria are simplified so that they can be used in an optimization process to build designs for mixture experiments with a target Dirichlet distribution.

Numerical tests in dimension 3 show that the new criteria spread the points evenly in the simplex, as well as the existing criteria do. Tests in higher dimensions show that the new criteria give better results: the distance between points is higher than with the existing DM2 and MSD criteria, and with lower variability. Moreover, the calculation time for the existing criteria increases considerably with the dimension, which makes them difficult to use for mixtures with many components. The new criteria therefore seem to be the best choice in this case.

We have proposed two applications in the high-dimensional case. The first comes from the observation that the number of points in a simplex-lattice becomes excessive as the dimension increases, especially if we wish to test many levels for each component. The criteria are then used to select a subset of simplex-lattice points that is well distributed over the experimental domain. The Ckern and MSD criteria give good results in terms of inter-site distance; the advantage of the Ckern criterion lies in its computational speed. The second application comes from the observation that low proportions are over-represented in the design, and that this phenomenon is amplified as the dimension increases. One of the advantages of the two criteria proposed in this paper is that they are based on the Dirichlet distribution, which covers more than just the uniform distribution. We can determine the value of the \(\alpha \) parameter to control the frequency of small proportions in the experimental design. We can also set different values of \(\alpha \) depending on the component and obtain an asymmetrical design, as shown in Fig. 9.

Fig. 9
figure 9

Ckern and Cnn designs with d = 3 and n = 10 for an asymmetric Dirichlet distribution with α = (2,4,8). The starting design points are n i.i.d. random draws from the Dirichlet distribution

The second application is not entirely satisfactory because, by reducing the frequency of small proportions, we also reduce the large proportions that are already under-represented. Having a uniform distribution on the simplex \({S}^{d-1}\) and a symmetric distribution on each axis seem to be two conflicting objectives. A multi-objective optimization algorithm (instead of the exchange algorithm) would make it possible to manage this problem. The first objective function would be one of the two criteria defined in this paper. The second objective function could be defined to measure the difference between the distribution of each component and a univariate symmetric distribution with support [0,1] (e.g. a symmetric triangular or truncated normal distribution). As in this paper, the Kullback–Leibler divergence and its estimates could be used to define a criterion for the second objective function. The Pareto front could then be used to find the best compromise between the two objectives.