1 Introduction

Our research is motivated by an experiment on the emergence of house flies for studying biological controls of disease-transmitting fly species (Itepan 1995; Zocchi and Atkinson 1999). In the original experiment (Itepan 1995), \(n=3500\) pupae were grouped evenly into seven subsets and exposed in a radiation device tuned to seven different gamma radiation levels \(x_i \in \{80, 100, 120, 140, 160, 180, 200\}\) in units of Gy, respectively. After a certain period of time, each pupa had one of three possible outcomes: unopened, opened but died (before completing emergence), or completed emergence. The total experimental cost in time and expense was closely related to the number of distinct settings of the radiation device. By searching the grid-1 settings in [80, 200] using their lift-one algorithm, Bu et al. (2020) proposed a design on five settings \(\{80, 122, 123, 157, 158\}\) and improved the relative efficiency of the original design by \(20.8\%\) in terms of the D-criterion. More recently, Ai et al. (2023) obtained a design on four settings \(\{0, 101.1, 147.8, 149.3\}\) by employing an algorithm that combines the Fedorov-Wynn (Fedorov and Leonov 2014) and lift-one (Bu et al. 2020) algorithms to search the continuous region [0, 200]. Having noticed that both Bu et al. (2020)’s and Ai et al. (2023)’s designs contain pairs of settings that are close to each other, we propose a new algorithm, called the ForLion algorithm (see Sect. 2), which incorporates a merging step to combine close experimental settings while maintaining high relative efficiency. For this case, our proposed algorithm identifies a D-optimal design on \(\{0, 103.56, 149.26\}\), which may lead to a \(40\%\) or \(25\%\) reduction of the experimental cost compared to Bu et al. (2020)’s or Ai et al. (2023)’s design, respectively.

In this paper, we consider experimental plans under fairly general statistical models with mixed factors. The pre-determined design region \(\mathcal{X} \subset {\mathbb {R}}^d\) with \(d\ge 1\) factors is compact, that is, bounded and closed, for typical applications (see Sect. 2.4 in Fedorov and Leonov (2014)). For many applications, \({{\mathcal {X}}} = \prod _{j=1}^d I_j\), where \(I_j\) is either a finite set of levels for a qualitative or discrete factor, or an interval \([a_j, b_j]\) for a quantitative or continuous factor. To simplify the notation, we assume that the first k factors are continuous, where \(0\le k\le d\), and the last \(d-k\) factors are discrete. Suppose \(m\ge 2\) distinct experimental settings \({{\textbf{x}}}_1, \ldots , {{\textbf{x}}}_m \in \mathcal{X}\), known as the design points, are chosen, and \(n > 0\) experimental units are available for the experiment with \(n_i \ge 0\) subjects allocated to \({\textbf{x}}_i\), such that \(n=\sum _{i=1}^m n_i\). We assume that the responses, which could be vectors, are independent and follow a parametric model \(M({{\textbf{x}}}_i; \varvec{\theta })\) with some unknown parameter(s) \(\varvec{\theta }\in {\mathbb {R}}^p\), \(p\ge 1\). In design theory, \({{\textbf{w}}} = (w_1, \ldots , w_m)^T = (n_1/n, \ldots , n_m/n)^T\), known as the approximate allocation, is often considered instead of the exact allocation \({\textbf{n}} = (n_1, \ldots , n_m)^T\) (see, for example, Kiefer (1974), Sect. 1.27 in Pukelsheim (2006), and Sect. 9.1 in Atkinson et al. (2007)). Under regularity conditions, the corresponding Fisher information matrix is \({{\textbf{F}}} = \sum _{i=1}^m w_i{\textbf{F}}_{{{\textbf{x}}}_i} \in {\mathbb {R}}^{p\times p}\) up to a constant n, where \({{\textbf{F}}}_{{{\textbf{x}}}_i}\) is the Fisher information at \({{\textbf{x}}}_i\). In this paper, the design under consideration takes the form of \(\varvec{\xi }= \{({{\textbf{x}}}_i, w_i), i=1, \ldots , m\}\), where m is a flexible positive integer, \({\textbf{x}}_1, \ldots , {{\textbf{x}}}_m\) are distinct design points from \(\mathcal{X}\), \(0\le w_i \le 1\), and \(\sum _{i=1}^m w_i = 1\). We also let \(\varvec{\Xi }= \{\{({{\textbf{x}}}_i, w_i), i=1, \ldots , m\} \mid m\ge 1; {{\textbf{x}}}_i \in \mathcal{X}, 0\le w_i\le 1, i=1, \ldots , m; \sum _{i=1}^m w_i = 1\}\) denote the collection of all feasible designs.

Under different criteria, such as the D-, A-, or E-criterion (see, for example, Chapter 10 in Atkinson et al. (2007)), many numerical algorithms have been proposed for finding optimal designs. If all factors are discrete, the design region \({{\mathcal {X}}}\) typically contains a finite number of design points, still denoted by m. The design problem is then to optimize the approximate allocation \({{\textbf{w}}} = (w_1, \ldots , w_m)^T\). Commonly used design algorithms include the Fedorov-Wynn (Fedorov 1972; Fedorov and Hackl 1997), multiplicative (Titterington 1976, 1978; Silvey et al. 1978), cocktail (Yu 2011), and lift-one (Yang and Mandal 2015; Yang et al. 2017; Bu et al. 2020) algorithms. In addition, classical optimization techniques such as Nelder-Mead (Nelder and Mead 1965), quasi-Newton (Broyden 1965; Dennis and Moré 1977), conjugate gradient (Hestenes and Stiefel 1952; Fletcher and Reeves 1964), and simulated annealing (Kirkpatrick et al. 1983) may also be used for the same purpose (Nocedal and Wright 2006). A comprehensive numerical study by Yang et al. (2016) (Table 2) showed that the lift-one algorithm outperforms commonly used optimization algorithms in identifying optimal designs, resulting in designs with fewer points.

Furthermore, many deterministic optimization methods may also be used for finding optimal designs under similar circumstances. Among them, polynomial-time (P-time) methods, including linear programming (Harman and Jurík 2008), second-order cone programming (Sagnol 2011), semidefinite programming (Duarte et al. 2018; Duarte and Wong 2015; Venables and Ripley 2002; Wong and Zhou 2023; Ye and Zhou 2013), mixed integer linear programming (Vo-Thanh et al. 2018), mixed integer quadratic programming (Harman and Filová 2014), mixed integer second-order cone programming (Sagnol and Harman 2015), and mixed integer semidefinite programming (Duarte 2023), are advantageous for discrete grids due to their polynomial time complexity and capability of managing millions of constraints efficiently. Notably, methods without polynomial-time guarantees, such as nonlinear programming (Duarte et al. 2022), semi-infinite programming (Duarte and Wong 2014), and mixed integer nonlinear programming (Duarte et al. 2020), have been utilized as well.

When the factors are continuous, the Fedorov-Wynn algorithm can still be used by adding, at each iteration, a new design point that maximizes a sensitivity function on \({{\mathcal {X}}}\) (Fedorov and Leonov 2014). To improve the efficiency, Ai et al. (2023) proposed a new algorithm for D-optimal designs under a continuation-ratio link model with continuous factors, which essentially combines the Fedorov-Wynn algorithm (for adding new design points) with the lift-one algorithm (for optimizing the approximate allocation). Nevertheless, the Fedorov-Wynn step tends to add unnecessary, closely spaced design points (see Sect. 3.2), which may increase the experimental cost. An alternative approach is to discretize the continuous factors and consider only the grid points (Yang et al. 2013), which may be computationally expensive, especially when the number of factors is moderate or large.

Little has been done to construct efficient designs with mixed factors. Lukemire et al. (2019) proposed the d-QPSO algorithm, a modified quantum-behaved particle swarm optimization (PSO) algorithm, for D-optimal designs under generalized linear models with binary responses. Later, Lukemire et al. (2022) extended the PSO algorithm for locally D-optimal designs under the cumulative logit model with ordinal responses. However, like other stochastic optimization algorithms, the PSO-type algorithms cannot guarantee that an optimal solution will ever be found (Kennedy and Eberhart 1995; Poli et al. 2007).

Following Ai et al. (2023) and Lukemire et al. (2019, 2022), we choose the D-criterion, which maximizes the objective function \(f({\varvec{\xi }}) = \left| {\textbf{F}}(\varvec{\xi }) \right| = \left| \sum _{i=1}^m w_i {{\textbf{F}}}_{{{\textbf{x}}}_i}\right| \), \(\varvec{\xi }\in \varvec{\Xi }\). Throughout this paper, we assume \(f({\varvec{\xi }}) > 0\) for some \(\varvec{\xi }\in \varvec{\Xi }\) to avoid trivial optimization problems. Unlike Bu et al. (2020) and Ai et al. (2023), the proposed ForLion algorithm does not need to assume \(\textrm{rank}({{\textbf{F}}}_{{\textbf{x}}}) < p\) for all \({{\textbf{x}}} \in \mathcal{X}\) (see Remark 1 and Example 1). Compared with the PSO-type algorithms for similar purposes (Lukemire et al. 2019, 2022), our ForLion algorithm can improve the relative efficiency of the designs significantly (see Example 3 for an electrostatic discharge experiment discussed by Lukemire et al. (2019) and Sect. S.3 of the Supplementary Material for a surface defects experiment (Phadke 1989; Wu 2008; Lukemire et al. 2022)). Our strategies may be extended to other optimality criteria, such as A-optimality, which minimizes the trace of the inverse of the Fisher information matrix, and E-optimality, which maximizes the smallest eigenvalue of the Fisher information matrix (see, for example, Atkinson et al. (2007)).
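As a concrete illustration, the D-criterion can be evaluated directly once the per-point Fisher information matrices are available; below is a minimal R sketch (R is the language we use for computations later, e.g., optim in Sect. 3.1; the function name and inputs are ours):

```r
# Minimal sketch (names are ours): evaluate f(xi) = |sum_i w_i F_{x_i}|
# given a list of p x p Fisher information matrices and a weight vector w.
d_criterion <- function(F_list, w) {
  Fmat <- Reduce(`+`, Map(`*`, w, F_list))  # F(xi) = sum_i w_i * F_{x_i}
  det(Fmat)                                 # f(xi) = |F(xi)|
}
```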

The remaining parts of this paper are organized as follows. In Sect. 2, we present the ForLion algorithm for general parametric statistical models. In Sect. 3, we derive the theoretical results for multinomial logistic models (MLM) and revisit the motivating example to demonstrate our algorithm’s performance with mixed factors under general parametric models. In Sect. 4, we specialize our algorithm for generalized linear models (GLM) to enhance computational efficiency by using model-specific formulae and iterations. We use simulation studies to show the advantages of our algorithm. We conclude in Sect. 5.

2 ForLion for D-optimal designs with mixed factors

In this section, we propose a new algorithm, called the ForLion (First-order Lift-one) algorithm, for constructing locally D-optimal approximate designs under a general parametric model \(M({{\textbf{x}}}; \varvec{\theta })\) with \({{\textbf{x}}} \in {{\mathcal {X}}} \subset {\mathbb {R}}^d\), \(d\ge 1\) and \(\varvec{\theta }\in {\mathbb {R}}^p\), \(p\ge 1\). As mentioned earlier, the design region \({{\mathcal {X}}} = \prod _{j=1}^d I_j\), where \(I_j = [a_j, b_j]\) for \(1\le j\le k\), \(-\infty< a_j< b_j < \infty \), and \(I_j\) is a finite set of at least two distinct numerical levels for \(j>k\). To simplify the notation, we still denote \(a_j = \min I_j\) and \(b_j = \max I_j\) even if \(I_j\) is a finite set.

In this paper, we assume \(1\le k\le d\). That is, there is at least one continuous factor. For cases with \(k=0\), that is, all factors are discrete, one may use the lift-one algorithm for general parametric models (see Remark 1). The goal in this study is to find a design \(\varvec{\xi }= \{({{\textbf{x}}}_i, w_i), i=1, \ldots , m\} \in \varvec{\Xi }\) maximizing \(f({\varvec{\xi }}) = \left| {{\textbf{F}}}(\varvec{\xi })\right| \), the determinant of \({{\textbf{F}}}(\varvec{\xi })\), where \({\textbf{F}}(\varvec{\xi }) = \sum _{i=1}^m w_i {{\textbf{F}}}_{{{\textbf{x}}}_i} \in {\mathbb {R}}^{p\times p}\). Here \(m\ge 1\) is flexible.

Algorithm 1 ForLion
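In outline, Algorithm 1 proceeds as follows (a sketch reconstructed from the steps referenced in Remarks 1–4 and Sects. 3–4; in particular, Step \(4^\circ \) is our reading of the deleting step):

\(0^\circ \) Choose a merging threshold \(\delta > 0\) and a convergence tolerance \(\epsilon > 0\).

\(1^\circ \) Construct an initial design \({\varvec{\xi }}_0\) whose design points are pairwise at least \(\delta \) apart.

\(2^\circ \) Merging step: combine any two design points within distance \(\delta \) of each other.

\(3^\circ \) Lift-one step: optimize the weights of the current design points.

\(4^\circ \) Deleting step: remove design points whose weights are zero.

\(5^\circ \) Find \({{\textbf{x}}}^*\) maximizing the sensitivity function \(d({{\textbf{x}}}, {\varvec{\xi }}_t)\) over \({{\mathcal {X}}}\).

\(6^\circ \) If \(d({{\textbf{x}}}^*, {\varvec{\xi }}_t) \le p\), go to Step \(7^\circ \); otherwise add \({{\textbf{x}}}^*\) to the design, replace t by \(t+1\), and go back to Step \(2^\circ \).

\(7^\circ \) Report \({\varvec{\xi }}_t\) as the D-optimal design.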

Given a design \(\varvec{\xi }= \{({{\textbf{x}}}_i, w_i), i=1, \ldots , m\}\) reported by the ForLion algorithm (see Algorithm 1), the general equivalence theorem (Kiefer 1974; Pukelsheim 1993; Stufken and Yang 2012; Fedorov and Leonov 2014) guarantees its D-optimality on \({{\mathcal {X}}}\). As a direct conclusion of Theorem 2.2 in Fedorov and Leonov (2014), we have the following theorem under the regularity conditions (see Sect. S.8 in the Supplementary Material, as well as Assumptions (A1), (A2) and (B1)–(B4) in Sect. 2.4 of Fedorov and Leonov (2014)).

Theorem 1

Under regularity conditions, there exists a D-optimal design that contains no more than \(p(p+1)/2\) design points. Furthermore, if \(\varvec{\xi }\) is obtained by Algorithm 1, it must be D-optimal.

We relegate the proof of Theorem 1 and others to Sect. S.9 of the Supplementary Material.

Remark 1

Lift-one step for general parametric models:   For commonly used parametric models, \(\textrm{rank}({{\textbf{F}}}_{\textbf{x}}) < p\) for each \({{\textbf{x}}} \in {{\mathcal {X}}}\). For example, all GLMs satisfy \(\textrm{rank}({{\textbf{F}}}_{{\textbf{x}}}) = 1\) (see Sect. 4). However, there exist special cases in which \(\textrm{rank}({{\textbf{F}}}_{{\textbf{x}}}) = p\) for almost all \({\textbf{x}} \in {{\mathcal {X}}}\) (see Example 1 in Sect. 3.1).

The original lift-one algorithm (see Algorithm 3 in the Supplementary Material of Huang et al. (2023)) requires \(0\le w_i < 1\) for all \(i=1, \ldots , m\), given the current allocation \({{\textbf{w}}} = (w_1, \ldots , w_m)^T\). If \(\textrm{rank}({{\textbf{F}}}_{{{\textbf{x}}}_i}) < p\) for all i, then \(f(\varvec{\xi }) > 0\) implies \(0\le w_i <1\) for all i. In that case, as in the original lift-one algorithm, we define the allocation function as

$$\begin{aligned} {{\textbf{w}}}_i(z) = \left( \frac{1-z}{1-w_i} w_1, \ldots , \frac{1-z}{1-w_i} w_{i-1},\ z,\ \frac{1-z}{1-w_i} w_{i+1}, \ldots , \frac{1-z}{1-w_i} w_m\right) ^T = \frac{1-z}{1-w_i}\, {{\textbf{w}}} + \frac{z-w_i}{1-w_i}\, {{\textbf{e}}}_i \end{aligned}$$

where \({{\textbf{e}}}_i = (0, \ldots , 0, 1, 0, \ldots , 0)^T \in {\mathbb {R}}^m\), whose ith coordinate is 1, and z is a real number in [0, 1], such that \({\textbf{w}}_i(z) = {\textbf{w}}\) at \(z=w_i\) and \({\textbf{w}}_i(z)={\textbf{e}}_i\) at \(z=1\). However, if \(\textrm{rank}({{\textbf{F}}}_{{{\textbf{x}}}_i}) = p\) and \(w_i=1\) for some i, we still have \(f(\varvec{\xi }) > 0\), but the above \({{\textbf{w}}}_i(z)\) is not well defined. In that case, we define the allocation function in the ForLion algorithm as

$$\begin{aligned} {{\textbf{w}}}_i(z) = \left( \frac{1-z}{m-1}, \ldots , \frac{1-z}{m-1},\ z,\ \frac{1-z}{m-1}, \ldots , \frac{1-z}{m-1}\right) ^T = \frac{m(1-z)}{m-1}\, {{\textbf{w}}}_u + \frac{mz-1}{m-1}\, {{\textbf{e}}}_i \end{aligned}$$

where \({{\textbf{w}}}_u = (1/m, \ldots , 1/m)^T \in {\mathbb {R}}^m\) is the uniform allocation. For \(j\ne i\), we define \({{\textbf{w}}}_j(z) = (1-z) {{\textbf{e}}}_i + z {{\textbf{e}}}_j\). The remaining steps are the same as in the original lift-one algorithm. \(\Box \)
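For illustration, both allocation functions above can be coded in a few lines; below is a minimal R sketch (the function name lift_one_alloc is ours), covering the usual case \(w_i < 1\) and the degenerate case \(w_i = 1\):

```r
# Sketch (names are ours): the allocation function w_i(z) of Remark 1.
# w: current allocation; i: lifted index; z: new weight for point i in [0, 1].
lift_one_alloc <- function(w, i, z) {
  m <- length(w)
  if (w[i] < 1) {
    wz <- (1 - z) / (1 - w[i]) * w   # rescale the other weights
  } else {
    wz <- rep((1 - z) / (m - 1), m)  # degenerate case w_i = 1, rank(F_x) = p
  }
  wz[i] <- z
  wz                                 # sums to 1 by construction
}
```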

Remark 2

Convergence in finite iterations:   In practice, we may relax the stopping rule \(d({{\textbf{x}}}^*, {\varvec{\xi }}_t) \le p\) in Step \(6^\circ \) of Algorithm 1 to \(d({{\textbf{x}}}^*, {\varvec{\xi }}_t) \le p + \epsilon \), where \(\epsilon \) could be the same as in Step \(0^\circ \). By Sect. 2.5 in Fedorov and Leonov (2014), \(f((1-\alpha ){\varvec{\xi }}_t + \alpha {{\textbf{x}}}^*) - f({\varvec{\xi }}_t) \approx \alpha (d({\textbf{x}}^*, {\varvec{\xi }}_t) - p)\) for small enough \(\alpha > 0\), where \((1-\alpha ){\varvec{\xi }}_t + \alpha {{\textbf{x}}}^*\) denotes the design \(\{({{\textbf{x}}}_i^{(t)}, (1-\alpha ) w_i^{(t)}), i=1, \ldots , m_t\} \bigcup \{({{\textbf{x}}}^*, \alpha )\}\). Thus, if we find an \({\textbf{x}}^*\) such that \(d({{\textbf{x}}}^*, {\varvec{\xi }}_t) > p + \epsilon \), then there exists an \(\alpha _0 \in (0,1)\) such that \(f((1-\alpha _0){\varvec{\xi }}_t + \alpha _0 {{\textbf{x}}}^*) - f({\varvec{\xi }}_t)> \alpha _0(d({{\textbf{x}}}^*, {\varvec{\xi }}_t) - p)/2 > \alpha _0 \epsilon /2\). For a small enough merging threshold \(\delta \) (see Steps \(0^\circ \) and \(2^\circ \)), we can still guarantee that \(f({\varvec{\xi }}_{t+1}) - f({\varvec{\xi }}_t) > \alpha _0\epsilon /4\) after Step \(2^\circ \). Under regularity conditions, \({{\mathcal {X}}}\) is compact, and \(f(\varvec{\xi })\) is continuous and bounded. Our algorithm is therefore guaranteed to stop in finitely many steps. In practice, due to the lift-one step (Step \(3^\circ \)), \(f({\varvec{\xi }}_t)\) improves rapidly, especially in the first few iterations. For all the examples explored in this paper, our algorithm works efficiently. \(\Box \)

Remark 3

Distance among design points:   In Step \(1^\circ \) of Algorithm 1, an initial design is selected such that \(\Vert {{\textbf{x}}}_i^{(0)} - {{\textbf{x}}}_j^{(0)}\Vert \ge \delta \), and in Step \(2^\circ \), two design points are merged if \(\Vert {{\textbf{x}}}_i^{(t)} - {{\textbf{x}}}_j^{(t)}\Vert < \delta \). The algorithm uses the Euclidean distance as the default metric. Nevertheless, to take the effects of different ranges or units across factors into consideration, one may choose a different distance, for example, a normalized distance such that \(\Vert {{\textbf{x}}}_i - {{\textbf{x}}}_j\Vert ^2 = \sum _{l=1}^d \left( \frac{x_{il}-x_{jl}}{b_l-a_l}\right) ^2\), where \({{\textbf{x}}}_i = (x_{i1}, \ldots , x_{id})^T\) and \({{\textbf{x}}}_j = (x_{j1}, \ldots , x_{jd})^T\). Another useful choice is to define \(\Vert {{\textbf{x}}}_i - {{\textbf{x}}}_j\Vert = \infty \) whenever their discrete factor levels differ, that is, \(x_{il}\ne x_{jl}\) for some \(l>k\). Such a distance never merges two design points with distinct discrete factor levels, which makes a difference when \(\delta \) is chosen to be larger than the smallest difference between discrete factor levels. Note that the choice of distance and \(\delta \) (see Sect. S.7 in the Supplementary Material) will not affect the continuous search for a new design point in Step \(5^\circ \). This differs from the adaptive grid strategies used in the literature (Duarte et al. 2018; Harman et al. 2020; Harman and Rosa 2020) for optimal designs, where the grid can be increasingly reduced in size to locate the support points more accurately. \(\Box \)
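For instance, a merging distance combining the normalization and the no-merge rule for discrete factors might be sketched in R as follows (names are ours; a and b are the factor-wise lower and upper bounds, and k is the number of continuous factors):

```r
# Sketch (names are ours): a merging distance for Step 2 of Algorithm 1.
merge_dist <- function(xi, xj, a, b, k) {
  d <- length(xi)
  if (k < d && any(xi[(k + 1):d] != xj[(k + 1):d]))
    return(Inf)                       # never merge across discrete levels
  sqrt(sum(((xi - xj) / (b - a))^2))  # range-normalized Euclidean distance
}
```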

Remark 4

Global maxima:   According to Theorem 1, when Algorithm 1 converges, that is, \(\max _{{{\textbf{x}}} \in {{\mathcal {X}}}} d({{\textbf{x}}}, {\varvec{\xi }}_t) \le p\), the design \({\varvec{\xi }}_t\) must be D-optimal. Nevertheless, from a practical point of view, two issues may occur. First, the algorithm may fail to find the global maximum in Step \(5^\circ \), which may happen even with the best optimization software (Givens and Hoeting 2013). As a common practice (see, for example, Sect. 3.2 in Givens and Hoeting (2013)), one may randomly generate multiple (such as 3 or 5) starting points when finding \({{\textbf{x}}}_{(1)}^*\) in Step \(5^\circ \) and utilize the best one among them. Second, the global maximum, or the D-optimal design, may not be unique (see, for example, Remark 2 in Yang et al. (2016)). In that case, one may keep a collection of the D-optimal designs that the algorithm finds. Due to the log-concavity of the D-criterion (see, for example, Fedorov (1972)), any convex combination of D-optimal designs is still D-optimal. \(\Box \)

3 D-optimal designs for MLMs

In this section, we consider experiments with categorical responses. Following Bu et al. (2020), given \(n_i > 0\) experimental units assigned to a design setting \({{\textbf{x}}}_i \in {{\mathcal {X}}}\), the summarized responses \({{\textbf{Y}}}_i = (Y_{i1}, \ldots , Y_{iJ})^T\) follow \(\textrm{Multinomial}(n_i; \pi _{i1}, \ldots , \pi _{iJ})\) with categorical probabilities \(\pi _{i1}, \ldots , \pi _{iJ}\), where \(J\ge 2\) is the number of categories, and \(Y_{ij}\) is the number of experimental units with responses in the jth category. Multinomial logistic models (MLM) have been commonly used for modeling categorical responses (Glonek and McCullagh 1995; Zocchi and Atkinson 1999; Bu et al. 2020). A general MLM can be written as

$$\begin{aligned} {{\textbf{C}}}^T\log ({\textbf{L}}{\varvec{\pi }}_i)={\varvec{\eta }}_i={\textbf{X}}_i{\varvec{\theta }}, \qquad i=1,\cdots ,m \end{aligned}$$
(1)

where \({\varvec{\pi }}_i = (\pi _{i1}, \ldots , \pi _{iJ})^T\) with \(\sum _{j=1}^J \pi _{ij}=1\), \({\varvec{\eta }}_i = (\eta _{i1}, \ldots , \eta _{iJ})^T\), \({\textbf{C}}\) is a \((2J-1) \times J\) constant matrix, \({\textbf{X}}_i\) is the \(J \times p\) model matrix at \({{\textbf{x}}}_i\), \(\varvec{\theta }\in {\mathbb {R}}^p\) are the model parameters, and \({\textbf{L}}\) is a \((2J-1) \times J\) constant matrix taking different forms for four special classes of MLMs, namely, baseline-category, cumulative, adjacent-categories, and continuation-ratio logit models (see Bu et al. (2020) for more details). When \(J=2\), all four logit models reduce to logistic regression models for binary responses, which belong to generalized linear models (see Sect. 4).

3.1 Fisher information \({{\textbf{F}}}_{{{\textbf{x}}}}\) and sensitivity function \(d({{\textbf{x}}}, \varvec{\xi })\)

The \(p\times p\) matrix \({{\textbf{F}}}_{{{\textbf{x}}}}\), known as the Fisher information at \({{\textbf{x}}} \in \mathcal{X}\), plays a key role in the ForLion algorithm. We provide its formula in detail in Theorem 2.

Theorem 2

For MLM (1), the Fisher information \({\textbf{F}}_{{\textbf{x}}}\) at \({{\textbf{x}}} \in \mathcal{X}\) (or \(\mathcal{X}_{\varvec{\theta }}\) for cumulative logit models, see Bu et al. (2020)) can be written as a block matrix \(({\textbf{F}}^{{\textbf{x}}}_{st})_{J\times J} \in {\mathbb {R}}^{p\times p}\), where \({{\textbf{F}}}^{{\textbf{x}}}_{st}\), a sub-matrix of \({\textbf{F}}_{{\textbf{x}}}\) with block row index s and column index t in \(\{1, \ldots , J\}\), is given by

$$\begin{aligned} {{\textbf{F}}}^{{\textbf{x}}}_{st} = \begin{cases} u_{st}^{{\textbf{x}}} \cdot {{\textbf{h}}}_s({{\textbf{x}}}) {{\textbf{h}}}_t({{\textbf{x}}})^T & \text{for } 1\le s, t\le J-1,\\ \sum _{j=1}^{J-1} u_{jt}^{{\textbf{x}}} \cdot {{\textbf{h}}}_c({{\textbf{x}}}) {{\textbf{h}}}_t({{\textbf{x}}})^T & \text{for } s=J,\ 1\le t\le J-1,\\ \sum _{j=1}^{J-1} u_{sj}^{{\textbf{x}}} \cdot {{\textbf{h}}}_s({{\textbf{x}}}) {{\textbf{h}}}_c({{\textbf{x}}})^T & \text{for } 1\le s\le J-1,\ t=J,\\ \sum _{i=1}^{J-1} \sum _{j=1}^{J-1} u_{ij}^{{\textbf{x}}} \cdot {{\textbf{h}}}_c({{\textbf{x}}}) {{\textbf{h}}}_c({{\textbf{x}}})^T & \text{for } s=J,\ t=J, \end{cases} \end{aligned}$$

where \({{\textbf{h}}}_j({{\textbf{x}}})\) and \({{\textbf{h}}}_c({{\textbf{x}}})\) are predictors at \({{\textbf{x}}}\), and \(u_{st}^{{\textbf{x}}}\)’s are known functions of \({{\textbf{x}}}\) and \(\varvec{\theta }\) (more details can be found in Appendix A).

In response to Remark 1 in Sect. 2, we provide below a surprising example in which the Fisher information \({{\textbf{F}}}_{{\textbf{x}}}\) at a single point \({{\textbf{x}}}\) is positive definite for almost all \({{\textbf{x}}} \in \mathcal{X}\).

Example 1

Positive definite \({{\textbf{F}}}_{{\textbf{x}}}\):   We consider a special MLM (1) with non-proportional odds (npo) (see Sect. S.8 in the Supplementary Material of Bu et al. (2020) for more details). Suppose \(d=1\), a feasible design point \({{\textbf{x}}} = x \in [a, b] = \mathcal{X}\), \(J\ge 3\), and \({{\textbf{h}}}_1({{\textbf{x}}}) = \cdots = {\textbf{h}}_{J-1}({{\textbf{x}}}) \equiv x\). The model matrix at \({{\textbf{x}}} = x\) is

$$\begin{aligned} {{\textbf{X}}}_x = \left( \begin{array}{cccc} x & 0 & \cdots & 0\\ 0 & x & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x\\ 0 & 0 & \cdots & 0 \end{array}\right) _{J\times (J-1)} \end{aligned}$$

with \(p=J-1\). That is, the model equation (1) in this example is \(\varvec{\eta }_x = {{\textbf{X}}}_x \varvec{\theta } = (\beta _1x, \ldots , \beta _{J-1}x, 0)^T\), where \(\varvec{\theta }= (\beta _1, \ldots , \beta _{J-1})^T \in {\mathbb {R}}^{J-1}\) are the model parameters. Then \({{\textbf{U}}}_x = (u_{st}^x)_{s,t=1,\ldots , J}\) can be calculated from Theorem A.2 in Bu et al. (2020). Note that \(u_{sJ}^x = u_{Js}^x = 0\) for \(s=1, \ldots , J-1\) and \(u_{JJ}^x = 1\). The Fisher information matrix at \({{\textbf{x}}}=x\) is \({{\textbf{F}}}_x = {{\textbf{X}}}_x^T {{\textbf{U}}}_x {{\textbf{X}}}_x = x^2 {{\textbf{V}}}_x\), where \({{\textbf{V}}}_x = (u_{st}^x)_{s,t=1,\ldots , J-1}\). Then \(|{{\textbf{F}}}_x| = x^{2(J-1)} |{{\textbf{V}}}_x|\). According to Equation (S.1) and Lemma S.9 in the Supplementary Material of Bu et al. (2020),

$$\begin{aligned} |{{\textbf{V}}}_x| = \begin{cases} \prod _{j=1}^J \pi _j^x & \text{for baseline-category, adjacent-categories,}\\ & \text{and continuation-ratio logit models,}\\ \dfrac{\left[ \prod _{j=1}^{J-1} \gamma _j^x (1-\gamma _j^x)\right] ^2}{\prod _{j=1}^J \pi _j^x} & \text{for cumulative logit models,} \end{cases} \end{aligned}$$

which is always positive, where \(\gamma _j^x = \sum _{l=1}^j \pi _l^x \in (0,1)\), \(j=1, \ldots , J-1\). In other words, \(\textrm{rank}({\textbf{F}}_x)=p\) in this case, as long as \(x\ne 0\). \(\Box \)

There also exists an example of a special MLM such that \({\textbf{F}}_{{{\textbf{x}}}} = {{\textbf{F}}}_{{{\textbf{x}}}'}\) but \({{\textbf{x}}} \ne {{\textbf{x}}}'\) (see Appendix B).

To search for a new design point \({{\textbf{x}}}^*\) in Step \(5^\circ \) of Algorithm 1, we utilize the R function optim with the option "L-BFGS-B", which allows box constraints. L-BFGS-B is a limited-memory version of the BFGS algorithm, one of several quasi-Newton methods (Byrd et al. 1995). It works fairly well in finding solutions even at the boundaries of the box constraints. We give explicit formulae for computing \(d({\textbf{x}}, \varvec{\xi })\) below and provide the first-order derivative of the sensitivity function for MLM (1) in Appendix C.
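A minimal R sketch of this Step \(5^\circ \) search with random multistarts (see Remark 4) could look as follows; d_fun, a user-supplied sensitivity function, and the other names are ours:

```r
# Sketch (names are ours): maximize d(x, xi) over a box via L-BFGS-B,
# restarting from several random points to guard against local maxima.
find_new_point <- function(d_fun, xi, lower, upper, n_starts = 5) {
  best <- NULL
  for (s in seq_len(n_starts)) {
    x0 <- runif(length(lower), lower, upper)     # random starting point
    fit <- optim(x0, function(x) -d_fun(x, xi),  # optim minimizes, so negate
                 method = "L-BFGS-B", lower = lower, upper = upper)
    if (is.null(best) || fit$value < best$value) best <- fit
  }
  list(x_star = best$par, d_star = -best$value)
}
```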

Theorem 3

Consider MLM (1) with a compact \({\mathcal X}\). A design \(\varvec{\xi }= \{({{\textbf{x}}}_i, w_i), i=1, \ldots , m\}\) with \(f({\varvec{\xi }}) > 0\) is D-optimal if and only if \(\max _{{{\textbf{x}}} \in {{\mathcal {X}}}} d({{\textbf{x}}}, {\varvec{\xi }}) \le p\), where

$$\begin{aligned} d({{\textbf{x}}}, \varvec{\xi }) &= \sum _{j=1}^{J-1} u_{jj}^{{\textbf{x}}} ({{\textbf{h}}}_j^{{\textbf{x}}})^T {{\textbf{C}}}_{jj} {{\textbf{h}}}_j^{{\textbf{x}}} + \sum _{i=1}^{J-1} \sum _{j=1}^{J-1} u_{ij}^{{\textbf{x}}} \cdot ({{\textbf{h}}}_c^{{\textbf{x}}})^T {{\textbf{C}}}_{JJ} {{\textbf{h}}}_c^{{\textbf{x}}} \\ &\quad + 2\sum _{i=1}^{J-2} \sum _{j=i+1}^{J-1} u_{ij}^{{\textbf{x}}} ({{\textbf{h}}}_j^{{\textbf{x}}})^T {{\textbf{C}}}_{ij} {{\textbf{h}}}_i^{{\textbf{x}}} + 2\sum _{i=1}^{J-1} \sum _{j=1}^{J-1} u_{ij}^{{\textbf{x}}} ({{\textbf{h}}}_c^{{\textbf{x}}})^T {{\textbf{C}}}_{iJ} {{\textbf{h}}}_i^{{\textbf{x}}} \end{aligned}$$
(2)

where \({{\textbf{C}}}_{ij} \in {\mathbb {R}}^{p_i\times p_j}\) is a submatrix of the \(p\times p\) matrix

$$\begin{aligned} {{\textbf{F}}}(\varvec{\xi })^{-1} = \left[ \begin{array}{ccc} {{\textbf{C}}}_{11} & \cdots & {{\textbf{C}}}_{1J}\\ \vdots & \ddots & \vdots \\ {{\textbf{C}}}_{J1} & \cdots & {{\textbf{C}}}_{JJ} \end{array}\right] \end{aligned}$$

with \(i,j=1, \ldots , J\), \(p=\sum _{j=1}^J p_j\), and \(p_J=p_c\). \(\Box \)

3.2 Example: emergence of house flies

In this section, we revisit the motivating example at the beginning of Sect. 1. The original design (Itepan 1995) assigned \(n_i=500\) pupae to each of \(m=7\) doses, \(x_i = 80, 100, 120, 140, 160, 180, 200\), respectively, which corresponds to the uniform design \(\varvec{\xi }_u\) in Table 1. Under a continuation-ratio non-proportional odds (npo) model adopted by Zocchi and Atkinson (1999)

$$\begin{aligned} \log \left( \frac{\pi _{i1}}{\pi _{i2} + \pi _{i3}}\right) &= \beta _{11} + \beta _{12} x_i + \beta _{13} x_i^2,\\ \log \left( \frac{\pi _{i2}}{\pi _{i3}}\right) &= \beta _{21} + \beta _{22} x_i, \end{aligned}$$

with fitted parameters \(\varvec{\theta }= (\beta _{11}, \beta _{12}, \beta _{13}, \beta _{21}, \beta _{22})^T = (-1.935, -0.02642, 0.0003174, -9.159, 0.06386)^T\), Bu et al. (2020) obtained D-optimal designs under different grid sizes using their lift-one algorithm proposed for discrete factors. With a grid size of 20, that is, using the design space \(\{80, 100, \ldots , 200\}\) evenly spaced by 20 units, they obtained a design \(\varvec{\xi }_{20}\) containing four design points. With finer grid points on the same interval [80, 200], both their grid-5 design \(\varvec{\xi }_5\) (with design space \(\{80, 85, \ldots , 200\}\)) and grid-1 design \(\varvec{\xi }_1\) (with design space \(\{80, 81, \ldots , 200\}\)) contain five design points (see Table 1). By incorporating the Fedorov-Wynn and lift-one algorithms and continuously searching the extended region [0, 200], Ai et al. (2023) obtained a four-point design \({\varvec{\xi }}_{a}\) (see Example S1 in their Supplementary Material).
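For intuition, the fitted model can be inverted to recover the category probabilities \((\pi _{i1}, \pi _{i2}, \pi _{i3})\) at any dose; a small R sketch (the function name is ours):

```r
# Sketch (names are ours): category probabilities at dose x under the
# fitted continuation-ratio npo model above.
theta <- c(-1.935, -0.02642, 0.0003174, -9.159, 0.06386)
flies_pi <- function(x, th = theta) {
  eta1 <- th[1] + th[2] * x + th[3] * x^2  # log(pi1 / (pi2 + pi3))
  eta2 <- th[4] + th[5] * x                # log(pi2 / pi3)
  pi1 <- exp(eta1) / (1 + exp(eta1))
  pi2 <- (1 - pi1) * exp(eta2) / (1 + exp(eta2))
  c(pi1 = pi1, pi2 = pi2, pi3 = 1 - pi1 - pi2)
}
flies_pi(120)  # probabilities at dose 120 Gy
```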

For \(x \in {{\mathcal {X}}} = [80, 200]\) as in Bu et al. (2020), our ForLion algorithm reports \(\varvec{\xi }_*\) with only three design points (see Table 1, as well as the Supplementary Material, Sect. S.2 for details). Compared with \(\varvec{\xi }_*\), the relative efficiencies of \(\varvec{\xi }_u\), \(\varvec{\xi }_{20}\), \(\varvec{\xi }_5\), and \(\varvec{\xi }_1\), defined as \(\left[ f(\varvec{\xi })/f(\varvec{\xi }_*)\right] ^{1/5}\), are \(82.79\%\), \(99.68\%\), \(99.91\%\), and \(99.997\%\), respectively. The increasing pattern of relative efficiencies, from \(\varvec{\xi }_{20}\) to \(\varvec{\xi }_{5}\) and \(\varvec{\xi }_{1}\), indicates that with finer grid points, the lift-one algorithm can search the design space more thoroughly and find better designs. For \(x \in [0,200]\) as in Ai et al. (2023), our algorithm yields \({\varvec{\xi }}_*'\) with only three points (see Table 1), and the relative efficiency of Ai et al. (2023)’s \({\varvec{\xi }}_a\) with respect to \({\varvec{\xi }}_*'\) is \(99.81\%\). Note that both \(\varvec{\xi }_*\) and \(\varvec{\xi }_*'\) from our algorithm contain only three experimental settings, achieving the minimum m justified by Bu et al. (2020). Given that using the radiation device is expensive and each run of the experiment takes at least hours, our designs can save \(40\%\) or \(25\%\) of the cost and time compared with Bu et al. (2020)’s and Ai et al. (2023)’s designs, respectively.

Table 1 Designs for the emergence of house flies experiment

To further check whether the improvements achieved by our designs are merely due to randomness, we conduct a simulation study by generating 100 bootstrapped replicates from the original data. For each bootstrapped dataset, we obtain the fitted parameters, treat them as the true values, and obtain D-optimal designs by Ai et al. (2023)’s algorithm, Bu et al. (2020)’s grid-1 algorithm, and our ForLion algorithm with \(\delta = 0.1\) and \(\epsilon =10^{-10}\), respectively. As Fig. 1 shows, our ForLion algorithm achieves the most efficient designs with the fewest design points. Indeed, the median number of design points is 5 for Ai’s, 4 for Bu’s, and 3 for ForLion’s. Compared with ForLion’s, the mean relative efficiency is \(99.82\%\) for Ai’s and \(99.99\%\) for Bu’s. As for computational time, the median time cost on a Windows 11 desktop with 32GB RAM and an AMD Ryzen 7 5700G processor is 3.59s for Ai’s, 161.81s for Bu’s, and 44.88s for ForLion’s.

Fig. 1 Boxplots of Ai’s, Bu’s, and ForLion’s designs for 100 bootstrapped data

4 D-optimal designs for GLMs

In this section, we consider experiments with a univariate response Y, which follows a distribution \(f(y;\theta ) = \exp \{y b(\theta ) + c(\theta ) + d(y)\}\) in the exponential family with a single parameter \(\theta \). Examples include binary response \(Y\sim \) Bernoulli\((\theta )\), count response \(Y\sim \) Poisson\((\theta )\), positive response \(Y\sim \) Gamma\((\kappa , \theta )\) with known \(\kappa >0\), and continuous response \(Y\sim N(\theta , \sigma ^2)\) with known \(\sigma ^2 > 0\) (McCullagh and Nelder 1989). Suppose independent responses \(Y_1, \ldots , Y_n\) are collected with corresponding factor level combinations \({{\textbf{x}}}_1, \ldots , {{\textbf{x}}}_n \in {{\mathcal {X}}} \subset {\mathbb {R}}^{d}\), where \({{\textbf{x}}}_i = (x_{i1}, \ldots , x_{id})^T\). Under a generalized linear model (GLM), there exists a link function g, parameters of interest \(\varvec{\beta } = (\beta _1, \beta _2, \ldots , \beta _p)^T\), and the corresponding vector of p known and deterministic predictor functions \({{\textbf{h}}} = (h_1, \ldots , h_p)^T\), such that

$$\begin{aligned} E(Y_i) = \mu _i \text{ and } \eta _i = g(\mu _i)= {\textbf{X}}_i^T\varvec{\beta } \end{aligned}$$
(3)

where \({{\textbf{X}}}_i = {{\textbf{h}}}({{\textbf{x}}}_i) = (h_1({\textbf{x}}_i), \ldots , h_p({{\textbf{x}}}_i))^T\), \(i=1, \ldots , n\). For many applications, \(h_1({{\textbf{x}}}_i)\equiv 1\) represents the intercept of the model.

4.1 ForLion algorithm specialized for GLM

Due to the specific form of GLM’s Fisher information (see Sect. S.4 and (S4.2) in the Supplementary Material), the lift-one algorithm can be extremely efficient by utilizing analytic solutions for each iteration (Yang and Mandal 2015). In this section, we specialize the ForLion algorithm for GLM with explicit formulae in Steps \(3^\circ \), \(5^\circ \), and \(6^\circ \).

For GLM (3), our goal is to find a design \(\varvec{\xi }= \{({{\textbf{x}}}_i, w_i), i=1, \ldots , m\}\) maximizing \(f({\varvec{\xi }}) = |{{\textbf{X}}}_{\varvec{\xi }}^T{\textbf{W}}_{\varvec{\xi }}{{\textbf{X}}}_{\varvec{\xi }}|\), where \({\textbf{X}}_{\varvec{\xi }} = ({{\textbf{h}}}({{\textbf{x}}}_1), \ldots , {\textbf{h}}({{\textbf{x}}}_m))^T \in {\mathbb {R}}^{m\times p}\) with known predictor functions \(h_1, \ldots , h_p\), and \({\textbf{W}}_{\varvec{\xi }} = \textrm{diag}\{w_1 \nu ({\varvec{\beta }}^T {{\textbf{h}}}({{\textbf{x}}}_1)), \ldots , w_m \nu ({\varvec{\beta }}^T {{\textbf{h}}}({{\textbf{x}}}_m))\}\) with known parameters \(\varvec{\beta }= (\beta _1, \ldots , \beta _p)^T\) and a positive differentiable function \(\nu \), where \(\nu (\eta _i) = (\partial \mu _i/\partial \eta _i)^2/\textrm{Var}(Y_{i})\) for \(i=1, \ldots , m\) (see Sects. S.1 and S.4 in the Supplementary Material for examples and more technical details). The sensitivity function \(d({{\textbf{x}}}, {\varvec{\xi }}) = \textrm{tr}({{\textbf{F}}}({\varvec{\xi }})^{-1} {{\textbf{F}}}_{{\textbf{x}}})\) in Step \(5^\circ \) of Algorithm 1 can be written as \(\nu ({\varvec{\beta }}^T {{\textbf{h}}}({{\textbf{x}}})) \cdot {{\textbf{h}}}({{\textbf{x}}})^T ({{\textbf{X}}}_{\varvec{\xi }}^T{\textbf{W}}_{\varvec{\xi }}{{\textbf{X}}}_{\varvec{\xi }})^{-1} {\textbf{h}}({{\textbf{x}}})\). As a direct conclusion of the general equivalence theorem (see Theorem 2.2 in Fedorov and Leonov (2014)), we have the following theorem for GLMs.

Theorem 4

Consider GLM (3) with a compact design region \({\mathcal X}\). A design \(\varvec{\xi }= \{({{\textbf{x}}}_i, w_i), i=1, \ldots , m\}\) with \(f({\varvec{\xi }}) = |{\textbf{X}}_{\varvec{\xi }}^T{{\textbf{W}}}_{\varvec{\xi }}{\textbf{X}}_{\varvec{\xi }}| > 0\) is D-optimal if and only if

$$\begin{aligned} \max _{{{\textbf{x}}} \in {{\mathcal {X}}}} \nu ({\varvec{\beta }}^T {{\textbf{h}}}({{\textbf{x}}})) \cdot {{\textbf{h}}}({{\textbf{x}}})^T ({\textbf{X}}_{\varvec{\xi }}^T{{\textbf{W}}}_{\varvec{\xi }}{\textbf{X}}_{\varvec{\xi }})^{-1} {{\textbf{h}}}({{\textbf{x}}}) \le p \end{aligned}$$
(4)
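For computation, the left-hand side of (4) is just the GLM sensitivity function \(d({{\textbf{x}}}, \varvec{\xi })\); a minimal R sketch (names are ours; h returns the p predictor values \({{\textbf{h}}}({{\textbf{x}}})\) and nu is the GLM-specific \(\nu \) function):

```r
# Sketch (names are ours): d(x, xi) = nu(beta' h(x)) h(x)' (X' W X)^{-1} h(x).
glm_sensitivity <- function(x, X, w, beta, h, nu) {
  eta <- drop(X %*% beta)                 # linear predictors at the design points
  W <- diag(w * nu(eta), nrow = nrow(X))  # W_xi = diag{w_i nu(beta' h(x_i))}
  Finv <- solve(t(X) %*% W %*% X)         # inverse Fisher information, up to n
  hx <- h(x)
  nu(sum(beta * hx)) * drop(t(hx) %*% Finv %*% hx)
}
```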

Given the design \({\varvec{\xi }}_t = \{({{\textbf{x}}}_i^{(t)}, w_i^{(t)}), i=1, \ldots , m_t\}\) at the tth iteration, suppose in Step \(5^\circ \) we find the design point \({{\textbf{x}}}^* \in {{\mathcal {X}}}\) maximizing \(d({{\textbf{x}}}, {\varvec{\xi }}_t) = \nu ({\varvec{\beta }}^T {{\textbf{h}}}({{\textbf{x}}})) \cdot {\textbf{h}}({{\textbf{x}}})^T ({{\textbf{X}}}_{{\varvec{\xi }}_t}^T{\textbf{W}}_{{\varvec{\xi }}_t}{{\textbf{X}}}_{{\varvec{\xi }}_t})^{-1} {{\textbf{h}}}({{\textbf{x}}})\) according to Theorem 4. Recall that \(d({{\textbf{x}}}^*, {\varvec{\xi }}_t) \le p\) in this step implies the optimality of \({\varvec{\xi }}_t\) and the end of the iterations. If \(d({\textbf{x}}^*, {\varvec{\xi }}_t) > p\), \({{\textbf{x}}}^*\) will be added to form the updated design \({\varvec{\xi }}_{t+1}\). For GLMs, instead of letting \({\varvec{\xi }}_{t+1} = {\varvec{\xi }}_t \bigcup \{({{\textbf{x}}}^*, 0)\}\), we recommend \(\{({\textbf{x}}_i^{(t)}, (1-\alpha _t)w_i^{(t)}), i=1, \ldots , m_t\} \bigcup \{({{\textbf{x}}}^*, \alpha _t)\}\), denoted by \((1-\alpha _t) {\varvec{\xi }}_t \bigcup \{({{\textbf{x}}}^*, \alpha _t)\}\) for simplicity, where \(\alpha _t \in [0,1]\) is an initial allocation for the new design point \({\textbf{x}}^*\), determined by Theorem 5.

Theorem 5

Given \({\varvec{\xi }}_t = \{({{\textbf{x}}}_i^{(t)}, w_i^{(t)}), i=1, \ldots , m_t\}\) and \({{\textbf{x}}}^* \in {{\mathcal {X}}}\), if we consider \({\varvec{\xi }}_{t+1}\) in the form of \((1-\alpha ) {\varvec{\xi }}_t \bigcup \{({{\textbf{x}}}^*, \alpha )\}\) with \(\alpha \in [0,1]\), then

$$\begin{aligned} \alpha _t = \begin{cases} \frac{2^p\cdot d_t - (p+1) b_t}{p(2^p\cdot d_t - 2 b_t)} & \text{ if } 2^p\cdot d_t > (p+1) b_t,\\ 0 & \text{ otherwise,} \end{cases} \end{aligned}$$

maximizes \(f({\varvec{\xi }}_{t+1})\) with \(d_t = f(\{({{\textbf{x}}}^{(t)}_1, w^{(t)}_1/2), \ldots , ({{\textbf{x}}}^{(t)}_{m_t}, w^{(t)}_{m_t}/2), ({{\textbf{x}}}^*, 1/2)\})\) and \(b_t = f({\varvec{\xi }}_t)\).

Based on \(\alpha _t\) in Theorem 5, which is obtained essentially via one iteration of the lift-one algorithm (Yang et al. 2016; Yang and Mandal 2015) with \({{\textbf{x}}}^*\) added, we update Step \(6^\circ \) of Algorithm 1 with Step \(6'\) for GLMs, which speeds up the ForLion algorithm significantly.

  • \(6'\) If \(d({{\textbf{x}}}^*, {\varvec{\xi }}_t) \le p\), go to Step \(7^\circ \). Otherwise, we let \({\varvec{\xi }}_{t+1} = (1-\alpha _t){\varvec{\xi }}_t \bigcup \{({{\textbf{x}}}^*, \alpha _t)\}\), replace t by \(t+1\), and go back to Step \(2^\circ \), where \(\alpha _t\) is given by Theorem 5.
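For illustration, the initial allocation \(\alpha _t\) of Theorem 5 requires only one extra criterion evaluation beyond the current value \(b_t\); a small R sketch, assuming a user-supplied evaluator f_det(points, weights) returning \(|{{\textbf{X}}}_{\varvec{\xi }}^T{\textbf{W}}_{\varvec{\xi }}{{\textbf{X}}}_{\varvec{\xi }}|\) (all names are ours):

```r
# Sketch (names are ours): initial weight for a new point x_star (Theorem 5).
# pts: list of current design points; w: their weights; p: number of parameters.
alpha_new_point <- function(f_det, pts, w, x_star, p) {
  b_t <- f_det(pts, w)                                 # f(xi_t)
  d_t <- f_det(c(pts, list(x_star)), c(w / 2, 1 / 2))  # halve weights, add x* at 1/2
  if (2^p * d_t > (p + 1) * b_t) {
    (2^p * d_t - (p + 1) * b_t) / (p * (2^p * d_t - 2 * b_t))
  } else 0
}
```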

The advantages of the lift-one algorithm over commonly used numerical algorithms include simplified computation and exact zero weight for negligible design points. For GLMs, Step \(3^\circ \) of Algorithm 1 should be specialized with analytic iterations as in Yang and Mandal (2015). We provide the explicit formula for the sensitivity function’s first-order derivative in Sect. S.5 of the Supplementary Material for Step \(5^\circ \).

By utilizing the analytical solutions for GLM in Steps \(3^\circ \), \(5^\circ \) and \(6'\), the computation is much faster than the general procedure of the ForLion algorithm (see Example 2 in Sect. 4.3).

4.2 Minimally supported design and initial design

A minimally supported design \(\varvec{\xi }= \{({{\textbf{x}}}_i, w_i), i=1, \ldots , m\}\) achieves the smallest possible m such that \(f({\varvec{\xi }}) > 0\), or equivalently, such that the Fisher information matrix is of full rank. Due to the existence of Example 1, m could be as small as 1 for an MLM. Nevertheless, m must be at least p for GLMs due to the following theorem.

Theorem 6

Consider a design \(\varvec{\xi }= \{({{\textbf{x}}}_i, w_i), i=1, \ldots , m\}\) with m support points, that is, \(w_i > 0\), \(i=1, \ldots , m\), for GLM (3). Then \(f({\varvec{\xi }}) > 0\) only if \({{\textbf{X}}}_{\varvec{\xi }}\) is of full column rank p, that is, \(\textrm{rank}({{\textbf{X}}}_{\varvec{\xi }}) = p\). Therefore, a minimally supported design contains at least p support points. Furthermore, if \(\nu ({\varvec{\beta }}^T {{\textbf{h}}}({{\textbf{x}}})) > 0\) for all \({{\textbf{x}}}\) in the design region \({{\mathcal {X}}}\), then \(f({\varvec{\xi }}) > 0\) if and only if \(\textrm{rank}({\textbf{X}}_{\varvec{\xi }}) = p\).

Theorem 7 shows that a minimally supported D-optimal design under a GLM must be a uniform design.

Theorem 7

Consider a minimally supported design \(\varvec{\xi }= \{({\textbf{x}}_i, w_i), i=1, \ldots , p\}\) for GLM (3) that satisfies \(f({\varvec{\xi }}) > 0\). It is D-optimal only if \(w_i=p^{-1}\), \(i=1, \ldots , p\). That is, it is a uniform allocation of its support points.

Based on Theorem 7, we recommend a minimally supported design as the initial design for the ForLion algorithm under GLMs. The advantage is that once p design points \({\textbf{x}}_1, \ldots , {{\textbf{x}}}_p\) are chosen from \({{\mathcal {X}}}\), such that the model matrix \({{\textbf{X}}}_{\varvec{\xi }} = ({\textbf{h}}({{\textbf{x}}}_1), \ldots , {{\textbf{h}}}({{\textbf{x}}}_p))^T\) is of full rank p, then the design \(\varvec{\xi }= \{({{\textbf{x}}}_i, 1/p), i=1, \ldots , p\}\) is D-optimal given those p design points.

Recall that a typical design space can take the form \({\mathcal X} = \prod _{j=1}^d I_j\), where \(I_j\) is either a finite set of distinct numerical levels or an interval \([a_j, b_j]\), with \(a_j = \min I_j\) and \(b_j = \max I_j\) even if \(I_j\) is a finite set. As one option in Step \(1^\circ \) of Algorithm 1, we suggest choosing p initial design points from \(\prod _{j=1}^d \{a_j, b_j\}\). For typical applications, we may assume that there exist p distinct points in \(\prod _{j=1}^d \{a_j, b_j\}\) such that the corresponding model matrix \({{\textbf{X}}}_{\varvec{\xi }}\) is of full rank, or equivalently, that the \(2^d\times p\) matrix consisting of rows \({{\textbf{h}}}({{\textbf{x}}})^T, {{\textbf{x}}} \in \prod _{j=1}^d \{a_j, b_j\}\) is of full rank p. Herein, we specialize Step \(1^\circ \) of Algorithm 1 for GLMs as follows:

\(1'\):

Construct an initial design \({\varvec{\xi }}_0 = \{({{\textbf{x}}}_i^{(0)}, p^{-1}), i=1, \ldots , p\}\) such that \({{\textbf{x}}}_1^{(0)}, \ldots , {{\textbf{x}}}_p^{(0)} \in \prod _{j=1}^d \{a_j, b_j\}\) and \({{\textbf{X}}}_{\varvec{\xi }_0} = ({{\textbf{h}}}({{\textbf{x}}}_1^{(0)}), \ldots , {{\textbf{h}}}({{\textbf{x}}}_p^{(0)}))^T\) is of full rank p.
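A minimal R sketch of Step \(1'\), greedily selecting vertices of \(\prod _{j=1}^d \{a_j, b_j\}\) until the model matrix reaches full column rank (all names are ours):

```r
# Sketch (names are ours): a minimally supported uniform initial design.
# h: predictor function returning a p-vector; a, b: factor-wise bounds.
initial_design_glm <- function(h, a, b, p) {
  V <- as.matrix(expand.grid(Map(c, a, b)))      # all 2^d vertices of the box
  idx <- integer(0)
  for (i in seq_len(nrow(V))) {
    cand <- c(idx, i)
    M <- t(sapply(cand, function(r) h(V[r, ])))
    if (qr(M)$rank == length(cand)) idx <- cand  # keep vertex if rank grows
    if (length(idx) == p) break
  }
  stopifnot(length(idx) == p)                    # full-rank assumption (Sect. 4.2)
  list(points = V[idx, , drop = FALSE], weights = rep(1 / p, p))
}
```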

4.3 Examples under GLMs

In this section, we use two examples to show the performance of our ForLion algorithm under GLMs, both involving continuous factors only through main effects, which simplifies the notation (see Sect. S.6 in the Supplementary Material).

Example 2

In Example 4.7, Stufken and Yang (2012) considered a logistic model with three continuous factors, \(\textrm{logit}(\mu _i) = \beta _0 + \beta _1 x_{i1} + \beta _2 x_{i2} + \beta _3 x_{i3}\), with \(x_{i1} \in [-2, 2]\), \(x_{i2} \in [-1,1]\), and \(x_{i3} \in (-\infty , \infty )\). Assuming \((\beta _0, \beta _1, \beta _2, \beta _3) = (1, -0.5, 0.5, 1)\), they obtained an 8-point D-optimal design \({\varvec{\xi }}_o\) theoretically. Using a MacOS-based laptop with a 2 GHz Quad-Core CPU and 16GB 3733 MHz memory, our ForLion algorithm specialized for GLMs takes 17 s and our general ForLion algorithm takes 124 s to get the same design \({\varvec{\xi }}_{*}\) (see Table 2). The relative efficiency of \({\varvec{\xi }}_{*}\) compared with \({\varvec{\xi }}_o\) is simply \(100\%\). Note that \({\varvec{\xi }}_o\) was obtained from an analytic approach requiring an unbounded \(x_{i3}\). For bounded \(x_{i3}\), such as \(x_{i3}\in [-1, 1]\), \([-2, 2]\), or \([-3, 3]\), we can still use the ForLion algorithm to obtain the corresponding D-optimal designs, whose relative efficiencies compared with \({\varvec{\xi }}_o\) or \({\varvec{\xi }}_{*}\) are \(85.55\%\), \(99.13\%\), and \(99.99993\%\), respectively. \(\Box \)

Example 3

Lukemire et al. (2019) reconsidered the electrostatic discharge (ESD) experiment described by Whitman et al. (2006) with a binary response and five mixed factors. The first four factors LotA, LotB, ESD, Pulse take values in \(\{-1, 1\}\), and the fifth factor Voltage \(\in [25, 45]\) is continuous. Using their d-QPSO algorithm, Lukemire et al. (2019) obtained a 13-point design \({\varvec{\xi }}_o\) for the model \(\textrm{logit} (\mu ) = \beta _0 + \beta _1 \texttt {Lot A} + \beta _2 \texttt {Lot B} + \beta _3 \texttt {ESD} + \beta _4 \texttt {Pulse} + \beta _5 \texttt {Voltage} + \beta _{34} (\texttt{ESD}\times \texttt {Pulse})\) with assumed parameter values \(\varvec{\beta }= (-7.5, 1.50, -0.2, -0.15, 0.25, 0.35, 0.4)^T\). It takes 88 seconds using the same laptop as in Example 2 for our (GLM) ForLion algorithm to find a slightly better design \({\varvec{\xi }}_{*}\) consisting of 14 points (see Table S2 in the Supplementary Material), with relative efficiency slightly above \(100\%\).

Table 2 Designs obtained for Example 2

To make a thorough comparison, we randomly generate 100 sets of parameters \(\varvec{\beta }\) from independent uniform distributions: U(1.0, 2.0) for LotA, \(U(-0.3, -0.1)\) for LotB, \(U(-0.3, 0.0)\) for ESD, U(0.1, 0.4) for Pulse, U(0.25, 0.45) for Voltage, U(0.35, 0.45) for ESD\(\times \)Pulse, and \(U(-8.0, -7.0)\) for the intercept. For each simulated \(\varvec{\beta }\), we treat it as the true parameter values and obtain D-optimal designs using the d-QPSO, ForLion, and Fedorov-Wynn-liftone (that is, the ForLion algorithm without the merging step, similar in spirit to Ai et al. (2023)’s) algorithms. For the ForLion algorithm, we use \(\delta =0.03\) and \(\epsilon =10^{-8}\) (see Sect. S.7 in the Supplementary Material for more discussion on choosing \(\delta \)). For the d-QPSO algorithm, following Lukemire et al. (2019), we use 5 swarms with 30 particles each, and the algorithm searches for designs with up to 18 support points with a maximum of 4,000 iterations. Figure 2 shows the numbers of design points of the three algorithms and the relative efficiencies. The median number of design points is 13 for d-QPSO’s, 39 for Fedorov-Wynn-liftone’s, and 13 for ForLion’s. The mean relative efficiencies compared with our ForLion D-optimal designs, defined as \([f(\cdot )/f({\varvec{\xi }}_{\textrm{ForLion}})]^{1/7}\), are \(86.95\%\) for d-QPSO’s and \(100\%\) for Fedorov-Wynn-liftone’s. The median running time on the same desktop as in Sect. 3.2 is 10.94s for d-QPSO’s, 129.74s for Fedorov-Wynn-liftone’s, and 71.21s for ForLion’s. \(\Box \)

Fig. 2 Boxplots of 100 simulations for d-QPSO, Fedorov-Wynn-liftone, and ForLion algorithms for the electrostatic discharge experiment (Example 3)

5 Conclusion and discussion

In this paper, we develop the ForLion algorithm to find locally D-optimal approximate designs under fairly general parametric models with mixed factors.

Compared with Bu et al. (2020)’s and Ai et al. (2023)’s algorithms, our ForLion algorithm can reduce the number of distinct experimental settings by \(25\%\) or more on average while keeping the highest possible efficiency (see Sect. 3.2). In general, for experiments such as the emergence of house flies (see Sect. 3.2), the total experimental cost relies not only on the number of experimental units but also on the number of distinct experimental settings (or runs). For such experiments, an experimental plan with fewer distinct settings may allow the experimenter to support more experimental units under the same budget limit and complete the experiment with less time cost. Under these circumstances, our ForLion algorithm may be extended to a modified design problem that maximizes \(n^p|{{\textbf{F}}}(\varvec{\xi })|\) under a budget constraint \(mC_r + nC_u \le C_0\), which incorporates the experimental run cost \(C_r\) and the experimental unit cost \(C_u\), and perhaps a time constraint \(m\le M_0\) as well.

Compared with PSO-type algorithms, the ForLion algorithm can improve the relative efficiency by \(17.5\%\) or more on average while achieving a low number of distinct experimental settings (see Example 3). Our ForLion algorithm may be extended to other optimality criteria by adjusting the corresponding objective function \(f(\varvec{\xi })\) and sensitivity function \(d({\textbf{x}}, \varvec{\xi })\), along with a lift-one algorithm modified accordingly to align with those criteria.

In Step \(5^\circ \) of Algorithm 1, the search for a new design point \({{\textbf{x}}}^*\) involves solving for the continuous design variables at each level combination of the discrete variables. When the number of discrete variables increases, the number of scenarios grows exponentially, which may cause computational inefficiency. In this case, one possible solution is to treat some of the discrete variables as continuous variables first, run the ForLion algorithm to obtain a design with continuous levels of those discrete factors, and then modify the design points by rounding the continuous levels of the discrete factors to their nearest feasible levels. For a binary factor, one may simply use [0, 1] or \([-1,1]\) as a continuous region of possible levels. Note that there are also experiments with discrete factors involving three or more categorical levels. For example, the factor Cleaning Method in an experiment on a polysilicon deposition process for manufacturing very large scale integrated circuits has three levels, namely, None, CM\(_2\), and CM\(_3\) (Phadke 1989). For those scenarios, one may first transform such a discrete factor, say, the Cleaning Method, to two dummy variables (that is, the indicator variables for CM\(_2\) and CM\(_3\), respectively) taking values in \(\{0,1\}\), then run the ForLion algorithm by treating them as two continuous variables taking values in [0, 1], as sketched below.
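As a small illustration of this dummy-variable relaxation (the level codings and function names are ours):

```r
# Sketch (names are ours): relax a 3-level factor into two dummies on [0, 1],
# then round a relaxed point back to the nearest feasible level.
levels_cm <- rbind(None = c(0, 0), CM2 = c(1, 0), CM3 = c(0, 1))
round_back <- function(z) {  # z = relaxed (dummy CM2, dummy CM3) in [0, 1]^2
  rownames(levels_cm)[which.min(rowSums((levels_cm - rep(z, each = 3))^2))]
}
round_back(c(0.9, 0.2))  # -> "CM2"
```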

6 Supplementary information

We provide explicit formulae of \(u_{st}^{{\textbf{x}}}\) in the Fisher information \({\textbf{F}}_{{\textbf{x}}}\) for MLM (1) in Appendix A, a special example of an MLM such that \({{\textbf{F}}}_{{{\textbf{x}}}} = {\textbf{F}}_{{{\textbf{x}}}'}\) with \({{\textbf{x}}} \ne {{\textbf{x}}}'\) in Appendix B, and the first-order derivative of \(d({\textbf{x}}, \varvec{\xi })\) for MLM (1) in Appendix C.

6.1 Supplementary material

Contents of the Supplementary Material are listed below:

S.1 Commonly used GLMs: a list of commonly used GLMs, corresponding link functions, \(\nu \) functions, and their first-order derivatives;

S.2 Technical details of the house flies example: technical details of applying the ForLion algorithm to the emergence of house flies example;

S.3 Example: minimizing surface defects: an example with a cumulative logit po model that shows the advantages of the ForLion algorithm;

S.4 Fisher information matrix for GLMs: formulae for computing the Fisher information matrix for GLMs;

S.5 First-order derivative of the sensitivity function for GLMs: formulae of \(\partial d({{\textbf{x}}}, \varvec{\xi })/\partial x_i\) for GLMs;

S.6 GLMs with main-effects continuous factors: details of GLMs with main-effects continuous factors;

S.7 Electrostatic discharge example supplementary: the optimal design table for the electrostatic discharge example and a simulation study on the effects of the merging threshold \(\delta \);

S.8 Assumptions needed for Theorem 1;

S.9 Proofs: proofs for theorems in this paper.