Bayesian A-Optimal Design of Experiment with Quantitative and Qualitative Responses

Kang, Lulu; Huang, Xiao

doi:10.1007/s42519-019-0063-6

Bayesian A-Optimal Design of Experiment with Quantitative and Qualitative Responses

Original Article
Published: 10 October 2019

Volume 13, article number 64, (2019)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal of Statistical Theory and Practice Aims and scope Submit manuscript

Bayesian A-Optimal Design of Experiment with Quantitative and Qualitative Responses

Download PDF

258 Accesses
1 Citation
Explore all metrics

Abstract

We consider the problem of A-optimal design of experiment under a Bayesian probabilistic model with both categorical and continuous response variables. The utility function of the local design problem is derived by applying Bayesian experimental design framework. We also develop an efficient optimization algorithm to obtain the local optimal design by combining the particle swarm optimization and the blocked coordinate descent methods. In addition, we discuss two different ways of constructing the global optimal design based on the algorithm for local optimal design. Simulation studies are presented to illustrate the efficiency of our approach.

A Laplace-based algorithm for Bayesian adaptive design

Article 05 April 2020

Particle swarm based algorithms for finding locally and Bayesian D-optimal designs

Article Open access 08 April 2019

A Review of Bayesian Optimal Experimental Design on Different Models

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

An engineering system with both quantitative and qualitative (QQ) responses is widely encountered in many cases. For example, [1, 2] have studied an example of the lapping process in a semiconductor manufacturing system, in which the wafers are being produced and processed through a series of stages. The total thickness variation (TTV) measurement of the wafer is a quantitative response to characterize the range of the wafer thickness. The conformity of site total indicator readings (STIR) is a qualitative response of binary values, which is used to indicate whether the STIR is larger than the preset tolerance or not. The two response variables combined provide a more reliable source of information on the quality control of the lapping process than any single variable would do. The Bayesian QQ model, proposed in [2], combines the linear regression model of the continuous response and the generalized linear model of the binary response jointly from a Bayesian perspective. It has proved to be effective in estimation and prediction for such systems. This Bayesian framework is reviewed in Sect. 2.1.

A design of experiment approach is needed to collect data in the most efficient fashion for the systems involving both quantitative and qualitative responses. We plan to tackle this design issue in this article. As experiments can serve various purposes, this problem should be discussed for different design criteria. For example, when the design purpose is to maximize the information gain, the D-optimal criterion should be adopted to maximize the entropy function of the design matrix. The research on constructing the D-optimal design under the Bayesian QQ model is developed in [3], and a point-exchange algorithm is developed to maximize the derived D-optimal utility function. On the other hand, if the design purpose is to obtain an accurate point estimate of a certain linear combination of the unknown parameters, the A-optimal criterion is a more proper choice. In this paper, we focus on the A-optimal design problem under the same Bayesian QQ model and adapt the classic Bayesian experimental design framework to derive the A-optimal design criterion. As we shall see, the A-optimality leads to an optimization problem that has a completely different structure from that in [3], and the traditional point-exchange algorithms are no longer efficient. As a result, new algorithms are required.

In this paper, we develop an algorithm that not only works for the above A-optimal design efficiently but can be applied to other similar design problems. In the optimal design for generalized linear models, usually, the final optimal design criterion involves the integration of a utility function. As a result, most such optimal criteria are not analytically tractable. For this reason, the common solution in the literature is to derive the optimal utility function for a fixed set of model parameters and optimize it to construct a so-called “local optimal design,” and the “global optimal design” for all the possible settings of the model parameters are obtained by Monte Carlo integration of the local optimal designs. Therefore, our main theoretical and algorithmic effort will be put on constructing the local optimal design, the idea of which can be summarized by the following three considerations:

1.
The special format of the objective function allows us to transform the original exact design problem to a continuous design problem. Accordingly, the goal is to find the optimal frequency vector, or the probability distribution, associated with the candidate design points. Then, the exact design can be sampled from the candidate points according to the continuous optimal design. This transformation opens a door for those algorithms for the continuous optimization problems, making the optimal design easier to obtain.
2.
Due to the multimodality of the objective function, spatial exploration is more important than local direct search. Thus, derivative-based and local search optimization methods are less efficient here. We propose applying the particle swarm optimization (PSO), which is an efficient population search method first proposed in [4], to find the optimal solution.
3.
Due to the high dimensionality of decision variables (i.e., the frequency of the candidate design points) in the objective function, we propose combining the PSO algorithm to cyclic blocked coordinate descent method. That is, instead of applying PSO to the entire set of decision variables, we iteratively apply PSO to update one subset of variables at a time, keeping the rest at the current best solutions, and cyclically repeat this process. This allows us to decompose the original optimization problem into a few subproblems of a much lower dimension. As shown in the simulation, the solution that reaches convergence is more stable despite the stochastic nature of PSO.

We note some recently developed works from the literature that can be related to ours. For example, [5] applies PSO for solving a D-optimal continuous design problem under the generalized linear model setting. They tested their method on some previous examples in the literature and reported that PSO did attain the optimality that was found using traditional experimental design methods. Similarly, [6, 7] also applied PSO to continuous optimal design problems. In all these works, either the statistical models or the design criteria differ from ours. The examples of applying PSO to experimental design problems also include [8, 9], but there they focused on discrete design problems that are solved by space-filling techniques. Moreover, to the best of our knowledge, our work is the first one that combines PSO with blocked coordinate descent for dimension reduction and better convergence results.

The rest of this paper is organized as follows. Section 2 lays the theoretical foundation, which contains a review of Bayesian QQ model and the Bayesian experimental design framework, then the objective function for the local A-optimal design is derived. The algorithms of finding the local and global optimal design are described in Sect. 3. The performances of the algorithms are tested for design problems of different scales are shown in Sect. 4. Section 5 concludes this article with a summary.

2 A-Optimal Design Problem Under Bayesian QQ Model

The goal of this section is to derive the objective function for the A-optimal design under the Bayesian QQ model, which is reviewed in Sect. 2.1. The general framework of Bayesian experimental design is reviewed in Sect. 2.2. The derivation of A-optimality is in Sect. 2.3, and an approximation of the exact A-optimality is proposed to simplify computation.

2.1 A Review of Bayesian QQ Model

The theoretical framework of Bayesian QQ model is proposed in [2]. We briefly review it here. Let Y denote the quantitative/continuous response, Z denote the qualitative/binary response, and ${\varvec{x}}=(x_1,\ldots ,x_p)^T$ denote the input variables. Denote the data as $({\varvec{x}}_i,y_i,z_i), i=1,\ldots ,n$, where ${\varvec{x}}_i\in {\mathbb {R}}^p, y_i\in {\mathbb {R}}, z_i\in \{0,1\}$. Also, ${\varvec{X}}=({\varvec{x}}_1^T,\ldots ,{\varvec{x}}_n^T)^T$ is the $n\times p$ design matrix, ${\varvec{y}}=(y_1,\ldots ,y_n)^T$ and ${\varvec{z}}=(z_1,\ldots ,z_n)^T$ are the quantitative and qualitative response vectors, respectively.

To study the dependence of (Y, Z) on ${\varvec{x}}$, we first build a logistic regression model for the distribution of Z, then a linear regression model for Y|Z. Let ${\varvec{f}}({\varvec{x}})=(1,f_1(x),\ldots ,f_q(x))$ be the functional basis that contains the intercept and q modeling terms including the main, interaction and quadratic effects of the input variables, and so on. The distribution of Z is modeled as following,

$$\begin{aligned} Z = \left\{ \begin{array}{ll} {1,} &\text {with probability } \pi ({\varvec{x}}) \\ {0,} &{\text{with probability} } 1-\pi ({\varvec{x}}) \end{array} \right. , \text {with } \pi ({\varvec{x}},{\varvec{\eta }})=\frac{\exp ({\varvec{f}}({\varvec{x}})'{\varvec{\eta }})}{1+\exp ({\varvec{f}}({\varvec{x}})'{\varvec{\eta }})}. \end{aligned}$$

(1)

The conditional distribution Y|Z is modeled as following

$$\begin{aligned} Y|(Z=z)\sim N(z{\varvec{f}}({\varvec{x}})'{\varvec{\beta }}^{(1)}+(1-z){\varvec{f}}({\varvec{x}})'{\varvec{\beta }}^{(2)},\sigma ^2), \end{aligned}$$

(2)

which indicates that $Y|(Z=1)\sim N({\varvec{f}}({\varvec{x}})'{\varvec{\beta }}^{(1)},\sigma ^2)$ and $Y|(Z=0)\sim N({\varvec{f}}({\varvec{x}})'{\varvec{\beta }}^{(2)},\sigma ^2)$. Based on the above modeling approach, the sampling distribution of the data is expressed as

$$\begin{aligned} {\varvec{y}}|{\varvec{z}},{\varvec{\beta }}^{(1)},{\varvec{\beta }}^{(2)},{\varvec{X}}\sim N({\varvec{V}}_1 {\varvec{F}}{\varvec{\beta }}^{(1)}+ {\varvec{V}}_2{\varvec{F}}{\varvec{\beta }}^{(2)},\sigma ^2{\varvec{I}}_n), \end{aligned}$$

(3)

$$\begin{aligned} p({\varvec{z}}|{\varvec{\eta }},{\varvec{X}})\propto \exp \left\{ \sum _{i=1}^n[z_i{\varvec{f}}({\varvec{x}}_i)'{\varvec{\eta }}-\log (1+e^{{\varvec{f}}({\varvec{x}}_i)'{\varvec{\eta }}})]\right\} , \end{aligned}$$

(4)

where ${\varvec{F}}$ is the $n\times (q+1)$ model matrix with ith row as ${\varvec{f}}({\varvec{x}}_i)'$, ${\varvec{V}}_1=\text{ diag }\{z_1,\ldots ,z_n\}$, and ${\varvec{V}}_2={\varvec{I}}_n-{\varvec{V}}_1$.

Following [10], we specify the following priors which have incorporated the effect hierachical principle

$$\begin{aligned} p({\varvec{\eta }})&=N(0,\tau ^2{\varvec{R}}_0), \\ p({\varvec{\beta }}^{(i)})&=N(0,\tau ^2{\varvec{R}}_i),\quad \text { for }i=1,2, \end{aligned}$$

where ${\varvec{R}}_i,i=0,1,2$ are the prior correlation matrices. As in the classic Bayesian approaches, the prior is chosen based on historical data, if they are available, or domain knowledge. Here we choose the diagonal matrices, and their entries are defined following the hierarchical ordering principle [11]. It states that lower-order effects are more likely to be statistically significant than the higher-order effects. Reflected in the prior distribution, the intercept’s prior is $N(0, \tau ^2)$, a linear effect’s prior is $N(0, r_i\tau ^2)$, and a quadratic or an interaction effect’s prior is $N(0, r_i^2\tau ^2)$. We let $r_i\in (0, 1)$, and thus, the prior variances of the effects are decreasing geometrically as their orders increases. Therefore, ${\varvec{R}}_i=\text{ diag }\{1,r_i,\ldots , r_i, r_i^2,\ldots , r_i^2\}$, and there are $m_1$ entries of $r_i$ corresponding the $m_1$ linear effects and $m_2$ entries of $r_i^2$ for the $m_2$ quadratic and interaction effects. If the variance decreases too fast, users can choose other order, for example, $r_i^{1.5}$ for the second-order effects. We have chosen so because it has been used in the literature such as in [2, 12, 13].

Based on the above priors and sampling distributions, the posterior distributions of ${\varvec{\beta }}^{(i)}$ are

$$\begin{aligned} p({\varvec{\beta }}^{(i)}|{\varvec{X}},{\varvec{y}},{\varvec{z}})=N({\varvec{H}}_i^{-1}{\varvec{F}}'{\varvec{V}}_i({\varvec{y}}-\mu _0{\varvec{1}}),\sigma ^2{\varvec{H}}_i^{-1}), \end{aligned}$$

(5)

where ${\varvec{H}}_i= {\varvec{F}}'{\varvec{V}}_i {\varvec{F}}+\rho {\varvec{R}}_i^{-1}$ and $\rho =\frac{\sigma ^2}{\tau ^2}$. The posterior distribution of ${\varvec{\eta }}$, after normal approximation, is:

$$\begin{aligned} p({\varvec{\eta }}|{\varvec{X}},{\varvec{z}})\approx N\left( \hat{{\varvec{\eta }}},\left( \frac{1}{\tau ^2}{\varvec{R}}_0^{-1}+{\varvec{F}}'{\varvec{W}}{\varvec{F}}\right) ^{-1}\right) , \end{aligned}$$

(6)

where $\hat{{\varvec{\eta }}}$ is the posterior mode, and ${\varvec{W}}=\text{ diag }\{\pi ({\varvec{x}}_1,\hat{{\varvec{\eta }}})(1-\pi ({\varvec{x}}_1,\hat{{\varvec{\eta }}})),\ldots , \pi ({\varvec{x}}_n,\hat{{\varvec{\eta }}})(1-\pi ({\varvec{x}}_n,\hat{{\varvec{\eta }}}))\}$. The predictive posterior distribution at new query points is available too, but it is not needed for the A-optimal design.

2.2 A Review of Parametric Bayesian Design Approach

The proposal of the general Bayesian experimental design approach dated back to [14], and [15] provided an excellent review of the topic. In recent years, there has been a significant advancement in the literature on Bayesian optimal experimental design. For example, [16] provided an updated review on the algorithms for Bayesian optimal design with modern computational capacity, [17] proposed an approximate coordinate-exchange algorithm for the Bayesian optimal design criterion, an analytically intractable expected utility function, and [18] showed how randomized quasi-Monte Carlo can significantly reduce the variance of the estimated expected utility function. On the application aspect, [19] developed Bayesian optimal design for generalized linear models, [20] extended the Bayesian A- and D-optimal experimental design criteria to the infinite dimension for the inverse problems in a Hilbert space, and [21] used Bayesian optimal design for the inference and prediction of nonlinear systems.

We review the theoretical framework here as a preparation for deriving the A-optimal objective function. Consider the general problem of designing and analyzing an experiment for collecting data and performing the estimation. Typically, such a process contains three stages: (1) select design sites based on some criterion and construct a design matrix ${\varvec{X}}$; (2) perform the designed experiment and collect the measurement data ${\varvec{y}}$; (3) use ${\varvec{X}}$ and ${\varvec{y}}$ to estimate parameters ${\varvec{\theta }}$ according to the model assumptions. If the model is valid, then the accuracy of the estimation of ${\varvec{\theta }}$ strongly depends on the quality of the design matrix ${\varvec{X}}$, and the latter is what we are trying to optimize.

A utility function is a function of the above three parts, $\phi ({\varvec{X}}, {\varvec{y}}, {\varvec{\theta }})$. It is a measure of the quality of the design matrix ${\varvec{X}}$ in achieving the estimation goal. For different purposes, it takes different forms associated with different design criteria. Two popular instances of utility functions are:

D-Optimal Design: $\phi ({\varvec{X}},{\varvec{y}},{\varvec{\theta }})=\log (p({\varvec{\theta }}|{\varvec{X}},{\varvec{y}}))$. This utility function measures the entropy of the posterior distribution. The goal is to collect data that contains as much information as possible.
A-Optimal Design: $\phi ({\varvec{X}},{\varvec{y}},{\varvec{\theta }})=({\varvec{\theta }}-\hat{{\varvec{\theta }}})'{\varvec{A}}({\varvec{\theta }}-\hat{{\varvec{\theta }}})$, where ${\varvec{A}}$ is diagonal matrix with nonnegative diagonal elements and $\hat{{\varvec{\theta }}}$ is usually taken as the posterior mean or mode. The A-optimality can be considered as a special case of the linear-optimality.^{Footnote 1} The goal, in this case, is to obtain a more accurate estimation of a certain linear combination of ${\varvec{\theta }}$. (The linear combination decides matrix ${\varvec{A}}$.)

The idea of Bayesian experimental design approach is to optimize the following objective function, which is the average of $\phi ({\varvec{X}}, {\varvec{y}}, {\varvec{\theta }})$ over ${\varvec{y}}$ and ${\varvec{\theta }}$ in their joint probability space:

$$\begin{aligned} \Phi ({\varvec{X}})=\int \int \phi ({\varvec{X}}, {\varvec{y}}, {\varvec{\theta }})p({\varvec{y}}|{\varvec{X}},{\varvec{\theta }})p({\varvec{\theta }}) \text {d}{\varvec{y}} \text {d}{\varvec{\theta }}. \end{aligned}$$

(7)

Due to their different definitions, the D-optimal design is obtained by maximizing $\Phi ({\varvec{X}})$, whereas the A-optimal design is obtained by minimizing $\Phi ({\varvec{X}})$.

If the underlying setting is the normal linear model ${\varvec{y}}={\varvec{X}}{\varvec{\theta }}+{\varvec{\epsilon }}$, where ${\varvec{\epsilon }}\sim N(0,\sigma ^2 {\varvec{I}}_n)$, it can be shown that $\Phi ({\varvec{X}})$ defined in (7) have analytical forms for both D-optimal and A-optimal criteria as long as the prior of ${\varvec{\theta }}$ is conjugate. For nonlinear models, such as the generalized linear models, the biggest challenge is to compute the integration to ${\varvec{\theta }}$. The usual approach adopted in this case is to first derive the integration with respect to ${\varvec{y}}$ for fixed values of ${\varvec{\theta }}$,

$$\begin{aligned} \Phi ({\varvec{X}}|{\varvec{\theta }})=\int \phi ({\varvec{X}}, {\varvec{y}}, {\varvec{\theta }})p({\varvec{y}}|{\varvec{X}}, {\varvec{\theta }}) \text {d}{\varvec{y}}, \end{aligned}$$

leading to “local” optimal designs, then find the approximately “global” optimal design for all values of ${\varvec{\theta }}$ by averaging over all local designs for different ${\varvec{\theta }}$ values sampled from prior $p(\theta )$. We provide more details on possible approaches to constructing the global optimal design from local optimal designs in Sect. 3.3.

One usual way of constructing the design matrix ${\varvec{X}}$ by optimizing (7) is to select rows from a prespecified matrix ${\varvec{X}}^c$ of candidate design points. Denote the size of ${\varvec{X}}$ to be $n\times p$ and of ${\varvec{X}}^c$ to be $N\times p$, where $N\gg n$. Selecting n rows of ${\varvec{X}}$ from the N rows of ${\varvec{X}}^c$ is known as the exact optimal design and is an NP-hard combinatorics optimization problem. Thus it would be desirable if this discrete optimization problem can be transformed into a continuous one. Following [23], we are able to do so by taking the continuous optimal design approach if $\Phi ({\varvec{X}})$ in (7) turns out to take the following row-additive format:

$$\begin{aligned} \Phi ({\varvec{X}}) = \Psi \left\{ \sum _{i=1}^n \omega ({\varvec{x}}_i)\right\} , \end{aligned}$$

(8)

where ${\varvec{x}}_i$ is the ith row of ${\varvec{X}}$, $\Psi (\cdot )$ and $\omega (\cdot )$ are certain functions. Then, the objective function in (7) and (8) can be modified to involve all N rows of ${\varvec{X}}^c$ as follows

$$\begin{aligned} \Phi ({\varvec{p}}) = \Psi \left\{ \sum _{i=1}^N p_i\omega ({\varvec{x}}_i)\right\} , \end{aligned}$$

(9)

where ${\varvec{p}}=(p_1,\ldots ,p_N)'$ is the probability distribution for all the candidate points such that $\sum _{i=1}^N p_i=1$. The problem is transformed into a continuous optimization problem of finding the optimal ${\varvec{p}}^*$. Given ${\varvec{p}}^*$, one can sample n rows from the candidate points. This procedure is much easier to work with than the original exact optimal design formulation. Moreover, according to [23], usually the elements of the optimal ${\varvec{p}}^*$ is sparse enough, and we will not lose much design efficiency by this transformation to continuous design.

2.3 The Objective Function for the Local Optimal Design

Under the Bayesian QQ model, combining with the notation in the last section, the vector of unknown parameters is ${\varvec{\theta }}=[{\varvec{\eta }}',({\varvec{\beta }}^{(1)})',({\varvec{\beta }}^{(2)})']'$, and the response data contains ${\varvec{y}}$ and ${\varvec{z}}$. For A-optimal criterion, the utility function mentioned in Sect. 2.1 takes the following form

$$\begin{aligned} \phi ({\varvec{\beta }}^{(1)},{\varvec{\beta }}^{(2)},{\varvec{\eta }},{\varvec{X}},{\varvec{y}},{\varvec{z}})= \left[ \begin{matrix} {\varvec{\eta }}-\hat{{\varvec{\eta }}}\\ {\varvec{\beta }}^{(1)}-\hat{{\varvec{\beta }}}^{(1)}\\ {\varvec{\beta }}^{(2)}-\hat{{\varvec{\beta }}}^{(2)} \end{matrix} \right] ' \left[ \begin{matrix} A_0 &\quad &\quad \\ &\quad A_1 &\quad \\ &\quad &\quad A_2 \end{matrix} \right] \left[ \begin{matrix} {\varvec{\eta }}-\hat{{\varvec{\eta }}}\\ {\varvec{\beta }}^{(1)}-\hat{{\varvec{\beta }}}^{(1)}\\ {\varvec{\beta }}^{(2)}-\hat{{\varvec{\beta }}}^{(2)} \end{matrix} \right] , \end{aligned}$$

and it is equal to

$$\begin{aligned} \phi ({\varvec{\beta }}^{(1)},{\varvec{\beta }}^{(2)},{\varvec{\eta }},{\varvec{X}},{\varvec{y}},{\varvec{z}})=({\varvec{\eta }}-\hat{{\varvec{\eta }}})'{\varvec{A}}_0({\varvec{\eta }}-\hat{{\varvec{\eta }}})+\sum _{i=1}^2 ({\varvec{\beta }}^{(i)}-\hat{{\varvec{\beta }}}^{(i)})'{\varvec{A}}_i({\varvec{\beta }}^{(i)}-\hat{{\varvec{\beta }}}^{(i)}). \end{aligned}$$

(10)

From the theoretical result given in Sect. 2.2, our objective function is

$$\begin{aligned} \Phi ({\varvec{X}})= & \int \phi ({\varvec{\beta }}^{(1)},{\varvec{\beta }}^{(2)},{\varvec{\eta }},{\varvec{X}},{\varvec{y}},{\varvec{z}})p({\varvec{y}},{\varvec{z}}|{\varvec{X}}, {\varvec{\eta }}, {\varvec{\beta }}^{(1)}, {\varvec{\beta }}^{(2)})\nonumber \\&\times p({\varvec{\beta }}^{(1)},{\varvec{\beta }}^{(2)},{\varvec{\eta }}) \text {d}{\varvec{y}} \text {d}{\varvec{z}} \text {d}{\varvec{\beta }}^{(1)}\text {d}{\varvec{\beta }}^{(2)}\text {d}{\varvec{\eta }}. \end{aligned}$$

(11)

Note that ${\varvec{\beta }}^{(1)}$ and ${\varvec{\beta }}^{(2)}$ have conjugate prior distributions with respect to the normal distribution of ${\varvec{y}}$, so the integration in (11) with respect to them can be computed explicitly. However, the integration with respect to ${\varvec{\eta }}$ is not analytically tractable. So we first integrate (11) with respect to ${\varvec{\beta }}^{(1)}$ and ${\varvec{\beta }}^{(2)}$ and leave the part in (11) that cannot be integrated with respect to ${\varvec{\eta }}$ as a function of ${\varvec{\eta }}$. The design matrix ${\varvec{X}}$ constructed in this way is globally optimal with respect to ${\varvec{\beta }}^{(1)}$ and ${\varvec{\beta }}^{(2)}$, but only locally optimal with respect to ${\varvec{\eta }}$. So strictly speaking, it is a “semi-local” optimal design, as only part of the model parameters are fixed. However, we will still refer to it as the local optimal design for convenience.

To simplify the objective function (11), we introduce the following notation.

${\varvec{W}}_1({\varvec{\eta }})=\text{ diag }\{\pi ({\varvec{x}}_1,{\varvec{\eta }}),\ldots ,\pi ({\varvec{x}}_n,{\varvec{\eta }})\}$ and ${\varvec{W}}_2({\varvec{\eta }})={\varvec{I}}_n-{\varvec{W}}_1({\varvec{\eta }})$. For fixed ${\varvec{\eta }}$, we omit ${\varvec{\eta }}$ and simply use ${\varvec{W}}_1$ and ${\varvec{W}}_2$. Let ${\varvec{W}}_0({\varvec{\eta }})=\sigma ^2 {\varvec{W}}_1{\varvec{W}}_2$. Here ${\varvec{I}}_n$ denotes the $n\times n$ identity matrix.
${\varvec{S}}_k={\varvec{f}}({\varvec{x}}_k){\varvec{f}}({\varvec{x}}_k)^T$, for $k=1,\ldots ,n$.
For any vector ${\varvec{b}}\in {\mathbb {R}}^n$, and for $i=1,2$, we define the vector-to-matrix function ${\varvec{C}}_i({\varvec{b}}) = \sum _{k=1}^n b_k{\varvec{S}}_k+\rho {\varvec{R}}_i^{-1}.$

Theorem 1

Given the posterior distributions in Sect. 2.2 and the above notations, the A-optimal criterion function of the local optimal design with respect to a fixed value of ${\varvec{\eta }}$ is

$$\begin{aligned} \Phi ({\varvec{X}}|{\varvec{\eta }})=Q_0+Q_1+Q_2+\Delta _1+\Delta _2+E[o(||{\varvec{z}}-{\varvec{\pi }}||^2)], \end{aligned}$$

(12)

where

$Q_i=\sigma ^2 \text{ tr }[{\varvec{A}}_i({\varvec{F}}'{\varvec{W}}_i{\varvec{F}}+\rho {\varvec{R}}_i^{-1})^{-1}], \text { for } i=0,1,2$,
$\Delta _i^2=\sigma ^2\sum _{k=1}^n\pi _k(1-\pi _k)\text{ tr }[{\varvec{A}}_i{\varvec{C}}_i^{-1}({\varvec{\pi }}){\varvec{S}}_k{\varvec{C}}_i^{-1}({\varvec{\pi }}){\varvec{S}}_k{\varvec{C}}_i^{-1}({\varvec{\pi }})], \text { for } i=1,2$,

The proof of Theorem 1 is given in “Appendix.” In practice, $E[o(||{\varvec{z}}-{\varvec{\pi }}||^2)]$ is small and negligible in computation, as demonstrated in simulations in “Appendix B.” So the sum $Q_0+Q_1+Q_2+\Delta _1+\Delta _2$ in (12) comprises an analytical utility function that well approximates the exact one. Therefore, we use the approximate as the objective function for local optimal design for given ${\varvec{\eta }}$,

$$\begin{aligned} \Phi ({\varvec{X}}|{\varvec{\eta }})\approx Q_0+Q_1+Q_2+\Delta _1+\Delta _2. \end{aligned}$$

Next, we transform the exact optimal design into a continuous optimal design problem. Denote ${\varvec{G}}_i({\varvec{X}})=\sqrt{{\varvec{W}}_i}{\varvec{F}}$ for $i=1, 2$, where $\sqrt{{\varvec{W}}_i}$ is the diagonal matrix whose diagonal elements are square roots of those in ${\varvec{W}}_i$. The kth row in ${\varvec{G}}_i({\varvec{X}})$ is $\sqrt{({\varvec{W}}_i)_{kk}}{\varvec{f}}'({\varvec{x}}_k)$, which we denote by ${\varvec{g}}_i'({\varvec{x}}_k)$. Thus, each component of (12) can be rewritten as:

$$\begin{aligned} Q_i({\varvec{X}}|{\varvec{\eta }})=\, & \sigma ^2\text{ tr }\{{\varvec{A}}_i[{\varvec{G}}_i'({\varvec{X}}){\varvec{G}}_i({\varvec{X}})+\rho {\varvec{R}}_i^{-1}]^{-1}\}\nonumber \\=\, & \sigma ^2\text{ tr }\left\{ {\varvec{A}}_i\left[ \sum _{k=1}^n {\varvec{g}}_i({\varvec{x}}_k){\varvec{g}}_i'({\varvec{x}}_k)+\rho {\varvec{R}}_i^{-1}\right] ^{-1}\right\} . \end{aligned}$$

(13)

Since $Q_i({\varvec{X}}|{\varvec{\eta }})$ has the row-additive format as suggested by (8), a continuous design formulation can be reached by rewriting the (13) as follows

$$\begin{aligned} Q_i({\varvec{p}}|{\varvec{\eta }})=\sigma ^2\text{ tr }\left\{ {\varvec{A}}_i\left[ \sum _{k=1}^N p_i{\varvec{g}}_i({\varvec{x}}_k){\varvec{g}}_i'({\varvec{x}}_k)+\rho {\varvec{R}}_i^{-1}\right] ^{-1}\right\} . \end{aligned}$$

(14)

Here N denotes the number of candidate design points as in Sect. 2.2. Similarly, one can easily find that $\Delta _1$ and $\Delta _2$ also take the row-additive format and write $\Delta _i$ in terms of ${\varvec{p}}$. Thus, the objective function can be rewritten as

$$\begin{aligned} \Phi ({\varvec{p}}|{\varvec{\eta }})&=\sum _{i=0}^2Q_i({\varvec{p}}|{\varvec{\eta }})+\sum _{i=1}^2 \Delta _i({\varvec{p}}|{\varvec{\eta }})+E[o(||{\varvec{z}}-{\varvec{\pi }}||^2)]\nonumber \\&\approx \sum _{i=0}^2Q_i({\varvec{p}}|{\varvec{\eta }})+\sum _{i=1}^2 \Delta _i({\varvec{p}}|{\varvec{\eta }}). \end{aligned}$$

(15)

As explained in Sect. 2.1, once ${\varvec{p}}^*$ is found by minimizing (15), the local exact optimal design can be constructed by randomly sampling n design points from the candidate points according to ${\varvec{p}}^*$. The global optimal design can be constructed based on the construction of the local optimal design. We propose new optimization algorithms for both local and global optimal designs in Sect. 3.

3 Optimization Search

The utility function (15) has a complex form. We need to consider the exploration-vs-exploitation trade-off when developing the optimization algorithm. Due to the potential multimodality of the objective function from its complexity, space exploration algorithms should be more effective than local search and gradient-based algorithms. For that reason, we choose the particle swarm optimization (PSO) to minimize (15). This algorithm will be reviewed in Sect. 3.1. When N is large, to tackle with the difficulty of high dimensionality, the idea of combining PSO with blocked coordinate descent method is proposed, which allows us to decompose a high-dimensional problem into the dimensional one. This approach is described in Sect. 3.2. Both Sects. 3.1 and 3.2 deal with the local optimal design for a fixed value of ${\varvec{\eta }}$. We discuss how to find the global optimal design for the entire domain of ${\varvec{\eta }}$ in Sect. 3.3.

3.1 Particle Swarm Optimization

The idea of PSO is summarized here. Suppose one is interested in finding the minimum of a real-valued function $\Phi ({\varvec{p}}|{\varvec{\eta }})$. At first, M feasible solutions in the domain of $\Phi (\cdot |{\varvec{\eta }})$, denoted by ${\varvec{p}}_1,\ldots ,{\varvec{p}}_M$, are randomly selected as the starting points, which are called particles. For each $i=1,\ldots ,M$, let ${\varvec{p}}_i(0)={\varvec{p}}_i$, and ${\varvec{v}}_i(0)=0$. The iterations are performed for each $i=1,\ldots ,M$ based on the following scheme

$$\begin{aligned} {\varvec{p}}_i(t+1)={\varvec{p}}_i(t)+{\varvec{v}}_i(t+1),\end{aligned}$$

(16)

$$\begin{aligned} \text {where } {\varvec{v}}_i(t+1) = \alpha _1I\cdot {\varvec{v}}_i(t) + \alpha _2 C\cdot ({\varvec{p}}_i(t)-{\varvec{p}}_i^*(t)) + \alpha _3S\cdot ({\varvec{p}}_i(t)-{\varvec{g}}^*(t)). \end{aligned}$$

(17)

The notation in (16) and (17) are explained below.

${\varvec{p}}_i(t)$ is the value of particle i at iteration t.
${\varvec{v}}_i(t)$ is the rate of change (or speed) of particle i at iteration t.
${\varvec{p}}_i^*(t)$ is the optimal value that particle i has achieved before iteration t, which is the historical minimum of particle i.
${\varvec{g}}^*(t)$ is the optimal value, or historical minimum of all particles up to iteration t.
I, C, S are three constants, and the letters are initials for inertia, cognitive and society.
$\alpha _1, \alpha _2, \alpha _3$ are random variables uniformly distributed on [0, 1].

The updating scheme in (17) indicates that the iteration step and the direction of particle i, which are contained in ${\varvec{v}}_i(t+1)$, depend on 3 components: (1) the inertia factor, which is the current speed rate; (2) the cognitive factor, which represents the memory of the particle; (3) the society factor, which represents the information shared by all the particles.

Since many particles are searching for the optimal solution together, and the searching path of every particle relies on the paths of others, PSO not only allows exploration over the space but also simultaneously exploits many positions with the search paths. It can avoid being trapped at a local optimum, and the convergence is much faster than the simple random search. This explains why this method is powerful in finding the global optimum. Applying PSO to finding the optimal solution of (14), we obtain Algorithm 1.

Algorithm 1

Particle swarm optimization for finding the optimal ${\varvec{p}}$

Initialization.

Start from M initial guesses of particles ${\varvec{p}}_1,\ldots ,{\varvec{p}}_M$.

For each $i=1,\ldots ,M$, let ${\varvec{p}}_i(0)={\varvec{p}}_i, {\varvec{v}}_i(0)={\varvec{0}}$.

Evaluate $\Phi ({\varvec{p}}_i(0)|{\varvec{\eta }})$ and let ${\varvec{p}}_i^*(0)={\varvec{p}}_i(0)$ for each $i=1,\ldots ,M$.

Identify ${\varvec{g}}_i^*(0)= \arg \min _{{\varvec{p}}_i(0)}\Phi ({\varvec{p}}|{\varvec{\eta }})$.
Iteration. At each iteration step $t=1,2,\ldots$,
- Evaluate the current objective value $\Phi ({\varvec{p}}_i(t)|{\varvec{\eta }})$ for each particle $i=1,\ldots ,M$.
- For each particle ${\varvec{p}}_i$, compare its current objective value with previous ones. Let ${\varvec{p}}_i^*(t)=\arg \min _{\{{\varvec{p}}_i(t),{\varvec{p}}_i^*(t-1)\}}\Phi ({\varvec{p}}|{\varvec{\eta }})$.
- Obtain the current global best particle that reaches $\min \{\Phi ({\varvec{p}}_1|{\varvec{\eta }}),\ldots ,\Phi ({\varvec{p}}_M|{\varvec{\eta }})\}$, and denote it by ${\varvec{p}}^*(t)$. Let ${\varvec{g}}^*(t)=\arg \min _{\{{\varvec{g}}^*(t-1),{\varvec{p}}^*(t)\}}\Phi ({\varvec{p}}|{\varvec{\eta }})$.
- Compute ${\varvec{v}}_i(t+1)$ for each $i=1,\ldots ,M$ according to (17).
- Compute ${\varvec{p}}_i(t+1)$ for each $i=1,\ldots ,M$ according to (16).
Repeat the above until convergence or the maximum number of iterations is reached.
Finalization.

Denote T to be the last iteration step. Return ${\varvec{g}}^*(T)$ as the optimal solution and $\Phi ({\varvec{g}}^*(T)|{\varvec{\eta }})$ as the optimal objective value.

3.2 Combine with Blocked Coordinate Descent

When N, the length of ${\varvec{p}}$, is large, Algorithm 1 still works but becomes problematic. Due to the high dimensionality of the decision variable, PSO needs a large number of particles M to sufficiently explore the solution space, and the number of particles M needs to grow exponentially with respect to the dimensionality. The computation required can easily become prohibitive if the candidate design points are many. For that reason, we propose Algorithm 2 to combine particle swarm optimization with blocked coordinate descent, to reduce the dimension of search space at each iteration.

Algorithm 2

PSO combined with blocked coordinate descent

Initialization.

Start from an initial guess ${\varvec{p}}$.

Divide its N coordinates into B blocks: ${\varvec{p}}=({\varvec{p}}_1',{\varvec{p}}_2',\ldots ,{\varvec{p}}_B')'$.
Iteration.
- Obtain M particles by replacing ${\varvec{p}}_1$ with M different random guesses, but with ${\varvec{p}}_2, \ldots , {\varvec{p}}_B$ kept the same.
- Perform Algorithm 1 to find the optimal ${\varvec{p}}_1^*$.
- Now let ${\varvec{p}}=({\varvec{p}}_1^{*'}, {\varvec{p}}_2',\ldots ,{\varvec{p}}_B')$. Repeat the previous two steps for ${\varvec{p}}_2, {\varvec{p}}_3, \ldots , {\varvec{p}}_B$.
Repeat the above until convergence or the maximum number of iterations is reached.
Finalization.

Return the final updated ${\varvec{p}}$ as the optimal solution and compute $\Phi ({\varvec{p}}|{\varvec{\eta }})$ as the optimal objective value.

Note that when $B=N$, the above algorithm is the ordinary (instead of blocked) coordinate descent method. By combining PSO and BCD, Algorithm 2 not only resolves the issue of high dimensionality but also leads to better performance measured by optimal value reached and solution stability, as can be seen in the simulation examples in the next section. Intuitively, while PSO emphasizes more on spatial exploration, BCD focuses more on the local search of the area. They two combined together provide a better exploration-versus-exploitation trade-off.

3.3 Finding the Global Optimum on the Entire Domain of ${\varvec{\eta }}$

In this subsection, we discuss the way to find a global optimal ${\varvec{p}}$ for the entire range of ${\varvec{\eta }}$. Two approaches will be considered: (1) the average approach; (2) the minimax approach.

The idea of the average approach is simple. To start with, one first simulates a sample ${\varvec{\eta }}_1,\ldots ,{\varvec{\eta }}_K$ according to the prior distribution of ${\varvec{\eta }}$, then find the local optimal ${\varvec{p}}^{(k)}={\varvec{p}}({\varvec{\eta }}_k)$ for each ${\varvec{\eta }}_k$. The global optimal ${\varvec{p}}^*$ is simply the sample mean of all local optimal design frequency vectors. This is summarized in Algorithm 3.

Algorithm 3

Find the averaged global optimal design

Initialization

Simulate ${\varvec{\eta }}_1,{\varvec{\eta }}_2,\ldots ,{\varvec{\eta }}_K$ according to the prior distribution ${\varvec{\eta }}\sim N(0,\tau ^2R_0)$.
Iteration

For each ${\varvec{\eta }}_k,k=1,\ldots ,K$, find the corresponding local optimal ${\varvec{p}}^{(k)}$ with Algorithm 1 when the dimension of ${\varvec{p}}$ is low, or with Algorithm 2 when the dimension of ${\varvec{p}}$ is high.
Finalization

Return ${\varvec{p}}^*=\frac{1}{K}\sum _{k=1}^K {\varvec{p}}^{(k)}$ as the averaged global optimal solution.

The idea of the minimax approach is proposed by [6], and the goal here is to find a design that protects against the worst-case scenario. Different from the previous approach, the objective function to evaluate the design here is $\max _{{\varvec{\eta }}}\Phi ({\varvec{p}}|{\varvec{\eta }})$, where $\Phi ({\varvec{p}}|{\varvec{\eta }})$ is defined by (15). The global optimal ${\varvec{p}}^*$ that determines the design will be found by

$$\begin{aligned} {\varvec{p}}^* = \min _{{\varvec{p}}}\max _{{\varvec{\eta }}}\Phi ({\varvec{p}}|{\varvec{\eta }}). \end{aligned}$$

(18)

The design ${\varvec{p}}^*$ is “minimax” design or “optimal worst-case design.” The performance of any design vector ${\varvec{p}}$ is evaluated by its worst-case scenario, namely the situation resulted by the value of ${\varvec{\eta }}$ that leads to the lowest quality of that design. Then the global optimal design ${\varvec{p}}^*$ is the design that has the best “lowest quality,” found by optimizing over all these worst-case scenarios. The minimax problem (18) can be solved iteratively over ${\varvec{p}}$ and ${\varvec{\eta }}$ consecutively until convergence, and we summarize it in Algorithm 4.

Algorithm 4

Find the minimax global optimal design

Initialization

Start by setting ${\varvec{p}}(0)$ as some random vector with proper dimension.
Iteration At each iteration step $t=1,2$
- Solve
  $$\begin{aligned} {\varvec{\eta }}(t)=\arg \max _{{\varvec{\eta }}}\Phi ({\varvec{p}}(t-1)|{\varvec{\eta }}) \end{aligned}$$
  (19)
- Find ${\varvec{p}}(t)=\arg \min _{p}\Phi ({\varvec{p}}|{\varvec{\eta }}(t))$. This is a local optimal design problem and can be solved by either Algorithm 1 or 2.
Repeat the above until convergence or maximum number of iterations is reached.
Finalization

Denote T the last iteration step. Return ${\varvec{p}}(T)$.

The maximization involved in (19) can be solved by standard gradient-based methods or by applying PSO again as [6] suggested. However, our experience indicates that a random search or grid search with a reasonably large number of random samples or grid points is usually sufficient.

4 Simulation Study

We first consider the case when N is small so that a comparison between Algorithms 1 and 2 can be made clear, and this is illustrated in Sect. 4.1. A high-dimensional design problem is considered in Sect. 4.2 for illustrating the effectiveness of our methods. Finally, using the parametric setting in Sect. 4.1 and the two approaches described in Sect. 3.3, an example of constructing the global optimal design is provided in Sect. 4.3.

4.1 Local Optimal Design: an Example with Small Candidate Size

We consider the case when there are only two two-level factors, with levels $x_1\in \{-0.5,0.5\}$ and $x_2\in \{-\sqrt{2},\sqrt{2}\}$, respectively, and one three-level factor with levels $x_3\in \{-1,0,1\}$. The candidate matrix ${\varvec{X}}^c$ contains full factorial combinations of the three factors, and thus, $N=2^2\times 3^1=12$. Here we are only concerned with the continuous design ${\varvec{p}}$ rather than the exact design. We assume the linear model contains all the linear and interaction terms $\{1,x_1,x_2,x_3,x_1x_2,x_2x_3,x_1x_3\}$. The parameters are set to be $\rho =1.2, \sigma ^2=2, r_0=0.3,r_1=0.5,r_2=0.4$. The matrices that define the A-optimal design criterion are ${\varvec{A}}_0={\varvec{A}}_1={\varvec{A}}_2=\text{ diag }\{1,0.5,1,0.5,1,0.5,1\}$. We find the local optimal design with ${\varvec{\eta }}$ fixed at $[1,1,2,3,4,5,6]'$.

We first test Algorithm 1, in which $M=1000$ particles and 50 swarms are used. We set $I=0.5, C=1, S=0.5$ in Algorithm 1, as both literature and our experience suggest that this setting of PSO parameters yields good performance in many circumstances. The searching path is shown in Fig. 1a. The iteration converges after 20 swarms.

Next, we try Algorithm 2 with the block size set to be 12. This means the ordinary coordinate descent algorithm is used. Since there is only one unknown to be solved for each PSO search according to Algorithm 2, we use only 100 particles. We allocate 50 swarms for the of each coordinate, thus one iteration of an entire ${\varvec{p}}$ takes $50\times 12=600$ steps. The searching paths for the first 4 iterations are plotted in Fig. 1b. One can easily see that the search converges only after one iteration.

Note that the optimal solution found by Algorithm 2 is smaller than Algorithm 1 shown in Fig. 1a. This is because more local search is done by separately exploiting each coordinate. More computation time could be required. But the increased computation due to switching from Algorithm 1 to 2 might not be too significant, since the original optimization problem is decomposed into a few (in this case 12) lower-dimensional problems, and the number of particles required in PSO by Algorithm 2 is significantly smaller than Algorithm 1. In our Matlab implementation of this example on a laptop with 16 GB RAM and Intel Core i7e CPU, Algorithm 1 with 1000 particles and 50 swarms takes 45.51 s, while Algorithm 2 with 100 particles and 50 swarms takes 59.58 s.

To test whether the solutions found by Algorithms 1 and 2 are stable, both simulations are repeated for 100 times. The 12 dimensions of the solutions found are shown in the boxplot in Fig. 2. We found that the boxes in Fig. 2b is much narrower than those in Fig. 2a, indicating that solutions found by Algorithm 2 are more stable than Algorithm 1. This makes sense because PSO is used to look for an optimal solution in a 1-dim space in Algorithm 2 one coordinate at a time, whereas PSO is used to find an optimal solution in a 12-dim space. To sum up, even for this small example, we can see that Algorithm 2 performs better in terms of both objective value and solution stability than Algorithm 1.

4.2 Local Optimal Design: an Example with Large Candidate Size

To show how Algorithms 1 and 2 perform for high-dimensional cases, a design problem with larger candidate set size is considered. In this example, we assume that there are four two-level factors, with levels $x_1\in \{-0.5,0.5\}$ and $x_2\in \{-\sqrt{2},\sqrt{2}\}$, $x_3\in \{-1,1\}$, $x_4\in \{0,1\}$, respectively, and two three-level factors with levels $x_5\in \{-1,0,1\}$, $x_6\in \{1,\sqrt{2},2\}$. The number of candidate design points of all factorial combinations is $N=2^4\times 3^2=144$. The linear regression model contains the terms $\{1,x_1,x_2,x_3,x_4,x_5,x_6,x_1x_3,x_1x_4,x_2x_5,x_2x_6\}$. The parameters are set to be the same values as in Sect. 4.1, $\rho =1.2, \sigma ^2=2, r_0=0.3,r_1=0.5,r_2=0.4$. The matrices ${\varvec{A}}_0={\varvec{A}}_1={\varvec{A}}_2=\text{ diag }\{1,0.5,1,0.5,1,0.5,1,0.5,1,0.5,1\}$. We seek the local optimal design with ${\varvec{\eta }}$ fixed at $[1,1,2,3,4,5,2.5,4.5,3.8,1.9,2.1]'$.

We consider three different implementations for this example: (a) PSO search only, using Algorithm 1; (b) blocked coordinate descent, using Algorithm 2 with number of blocks to be $B=12$, in which case there are $144/B=12$ coordinates to be solved by each PSO search; (c) ordinary coordinate descent using Algorithm 2 with $B=144$, in which case there is only one coordinate to be solved by each PSO search. For Case (a), we use 10,000 particles and 50 swarms in implementing PSO. For Case (b), we use 1000 particles and 50 swarms when applying PSO to each block of 12 coordinates. For Case (c), since there is only one coordinate in each PSO search, we use only 100 particles and 20 swarms.

The searching paths for the above three approaches are shown in Fig. 3. The optimal values achieved in Fig. 3b, c are smaller than that in Fig. 3a, indicating again that combining PSO and BCD together yields better performance than relying solely on PSO. Furthermore, the optimal value reached in Fig. 3c is smaller than that in Fig. 3b, which suggests that smaller block size leads to better performance in Algorithm 2. As for the computation time of the three cases, using the same laptop mentioned in Sect. 4.1, Cases (a), (b) and (c) take respectively 7286 s, 10747 s and 3695 s. Therefore, Case (c) yields the best result but takes the least amount of time, again showing the advantage of decomposing the original high-dimensional optimization problem into low-dimensional ones.

All of the three implementations are repeated 30 times. The boxplots of the local optimal designs are shown in Fig. 4, from which we can see that the most stable implementation is ordinary coordinate descent as shown in Fig. 4c, and the most unstable one is pure PSO search as shown in Fig. 4a. The stability of blocked coordinate descent is in between as shown in Fig. 4b.

4.3 An Example for the Global Optimal Design

In this example, we use the same settings as in Sect. 4.1 and find the global optimal design using Algorithms 3 and 4. We set $\tau ^2=\frac{\sigma ^2}{\rho }=1.67$, $r_0=0.3$. Recall the prior distribution of ${\varvec{\eta }}$ is $N({\varvec{0}},\tau ^2 R_0)$, where $R_0=\text{ diag }\{1, r_0,r_0,r_0,r_0^2,r_0^2,r_0^2\}$.

To find the averaged global optimal design using Algorithm 3, we generate a sample of ${\varvec{\eta }}$ with size 1000 from the prior distribution. For each ${\varvec{\eta }}$ in this sample, a local optimal ${\varvec{p}}$ is found. The distributions of each of the 12 dimensions of these local optima are plotted in a violin plot as shown in Fig. 5. The averaged global optimal ${\varvec{p}}^*$ obtained by Algorithm 3 is the mean of these local optimal, whose coordinates are denoted by the black dots in this figure. In addition, the red dots denote the coordinates of the minimax global optimal ${\varvec{p}}^*$ by Algorithm 4. For the optimization in (19), we used random search optimization with a sample of ${\varvec{\eta }}$ with size 1000 to in each iteration, and the random samples of ${\varvec{\eta }}$ are again sampled from the above prior distribution. We observe that the two global optimal designs are very different, which is expected.

5 Summary and Discussion

In this article, we have derived the Bayesian A-optimal design for the Bayesian QQ model which is used to model data with both qualitative and quantitative response variables. We have also developed an efficient algorithm for the local optimal design, which combines the particle swarm optimization and blocked coordinate descent methods. Two different global optimal designs are discussed based on the local optimal design. The design algorithm can be generalized to solve other design problems that have complex design criterion and high-dimensional decision variables. Other design algorithms, such as cocktail algorithm [24] and the general and efficient algorithm proposed by [25], might be adapted to obtain the local optimal design. We plan to pursue in this direction for more generalized Bayesian QQ model in the future.

Notes

Linear-optimal (or L-optimal) criterion is defined by a linear function of the information matrix. For linear regression model with the model matrix ${\varvec{F}}$, the information matrix ${\varvec{M}}=({\varvec{F}}'{\varvec{F}})^{-1}$, and the L-optimal criterion is $L({\varvec{M}})$, and $L(\cdot )$ is a linear function. A-optimality is a special case of L-optimality because $L({\varvec{M}})=\text{ tr }({\varvec{A}}\times {\varvec{M}})$ is a linear function of ${\varvec{M}}$. More details can be found in [22].
We have used the following formulas of matrix-to-scalar derivative in our calculations: $\frac{\partial \text{ tr }(U)}{\partial x}=\text{ tr }(\frac{\partial U}{\partial x})$,$\frac{\partial UV}{\partial x}=U\frac{\partial V}{\partial x}+\frac{\partial U}{\partial x}V$, $\frac{\partial U^{-1}}{\partial x}=-U^{-1}\frac{\partial U}{\partial x}U^{-1}$.

References

Deng X, Jin R (2015) QQ models: joint modeling for quantitative and qualitative quality responses in manufacturing systems. Technometrics 57(3):320–331
Article MathSciNet Google Scholar
Kang L, Kang X, Deng X, Jin R (2018) A Bayesian hierarchical model for quantitative and qualitative responses. J Qual Technol 50(3):290–308
Article Google Scholar
Kang L (2016) Bayesian d-optimal design of experiments with continuous and binary responses. In: International conference on design of experiments (ICODOE-2016)
Eberhart RC, Kennedy J et al (1995) A new optimizer using particle swarm theory. In: Proceedings of the sixth international symposium on micro machine and human science, vol 1, pp 39–43. New York, NY
Lukemire J, Mandal A, Wong WK (2016) Using particle swarm optimization to search for locally $d$-optimal designs for mixed factor experiments with binary response. arXiv preprintarXiv:1602.02187
Chen R-B, Chang S-P, Wang W, Tung H-C, Wong WK (2015) Minimax optimal designs via particle swarm optimization methods. Stat Comput 25(5):975–988
Article MathSciNet Google Scholar
Wong WK, Chen R-B, Huang C-C, Wang W (2015) A modified particle swarm optimization technique for finding optimal designs for mixture models. PLoS ONE 10(6):e0124720
Article Google Scholar
Chen R-B, Hsu Y-W, Hung Y, Wang W (2014) Discrete particle swarm optimization for constructing uniform design on irregular regions. Comput Stat Data Anal 72:282–297
Article MathSciNet Google Scholar
Leatherman E, Dean A, Santner T (2014) Computer experiment designs via particle swarm optimization. In: Topics in statistical simulation, pp 309–317. Springer
Kang L, Joseph VR (2012) Bayesian optimal single arrays for robust parameter design. Technometrics
Wu CFJ, Hamada MS (2011) Experiments: planning, analysis, and optimization, vol 552. Wiley, Hoboken
MATH Google Scholar
Ai M, Kang L, Joseph VR (2009) Bayesian optimal blocking of factorial designs. J Stat Plan Inference 139(9):3319–3328
Article MathSciNet Google Scholar
Joseph VR (2006) A Bayesian approach to the design and analysis of fractionated experiments. Technometrics 48(2):219–229
Article MathSciNet Google Scholar
Lindley DV (1972) Bayesian statistics: a review, vol. 2. SIAM, Philadelphia
Chaloner K, Verdinelli I (1995) Bayesian experimental design: a review. Stat Sci 10(3):273–304
Article MathSciNet Google Scholar
Ryan EG, Drovandi CC, McGree JM, Pettitt AN (2016) A review of modern computational algorithms for Bayesian optimal design. Int Stat Rev 84(1):128–154
Article MathSciNet Google Scholar
Overstall AM, Woods DC (2017) Bayesian design of experiments using approximate coordinate exchange. Technometrics 59(4):458–470
Article MathSciNet Google Scholar
Drovandi CC, Tran M-N et al (2018) Improving the efficiency of fully Bayesian optimal design of experiments using randomised quasi-monte carlo. Bayesian Anal 13(1):139–162
Article MathSciNet Google Scholar
Woods DC, Overstall AM, Adamou M, Waite TW (2017) Bayesian design of experiments for generalized linear models and dimensional analysis with industrial and scientific application. Qual Eng 29(1):91–103
Google Scholar
Alexanderian A, Gloor PJ, Ghattas O et al (2016) On Bayesian a- and d-optimal experimental designs in infinite dimensions. Bayesian Anal 11(3):671–695
Article MathSciNet Google Scholar
Huan X, Marzouk YM (2013) Simulation-based optimal Bayesian experimental design for nonlinear systems. J Comput Phys 232(1):288–317
Article MathSciNet Google Scholar
Fedorov VV (1972) Theory of optimal experiments (translated by W. J. Studden and E. M. Klimko, eds.). Academic Press, New York
Kiefer J (1985) Collected papers, vol 3. Surendra Kumar
Yu Y (2011) D-optimal designs via a cocktail algorithm. Stat Comput 21(4):475–481
Article MathSciNet Google Scholar
Yang M, Biedermann S, Tang E (2013) On optimal designs for nonlinear models: a general and efficient algorithm. J Am Stat Assoc 108(504):1411–1420
Article MathSciNet Google Scholar

Download references

Acknowledgements

This research was supported by U.S. National Science Foundation Grants CMMI-1435902.

Author information

Authors and Affiliations

Department of Applied Mathematics, Illinois Institute of Technology, Chicago, IL, 60616, USA
Lulu Kang & Xiao Huang

Authors

Lulu Kang
View author publications
You can also search for this author in PubMed Google Scholar
Xiao Huang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lulu Kang.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Part of special issue guest edited by Pritam Ranjan and Min Yang—Algorithms, Analysis and Advanced Methodologies in the Design of Experiments.

Appendices

1.1 A Proof of Theorem 1

For the A-optimal design, the design criterion is

$$\begin{aligned} \Phi ({\varvec{X}})&=\int \phi ({\varvec{\beta }}^{(1)},{\varvec{\beta }}^{(2)},{\varvec{\eta }},{\varvec{X}},{\varvec{y}},{\varvec{z}})p({\varvec{y}},{\varvec{z}}|{\varvec{X}},{\varvec{\beta }}^{(1)},{\varvec{\beta }}^{(2)},{\varvec{\eta }})\\&\quad \times \, p({\varvec{\beta }}^{(1)}, {\varvec{\beta }}^{(2)},{\varvec{\eta }})\text {d}{\varvec{\beta }}^{(1)}\text {d}{\varvec{\beta }}^{(2)}\text {d}{\varvec{\eta }} \text {d}{\varvec{y}} \text {d}{\varvec{z}}\\&=\int ({\varvec{\eta }}-\hat{{\varvec{\eta }}})'{\varvec{A}}_0({\varvec{\eta }}-\hat{{\varvec{\eta }}}) p({\varvec{z}}, {\varvec{\eta }}|{\varvec{X}})\text {d}{\varvec{z}} \text {d}{\varvec{\eta }}\\&\quad +\sum _{i=1}^2 \int ({\varvec{\beta }}^{(i)}-\hat{{\varvec{\beta }}}^{(i)})'{\varvec{A}}_i({\varvec{\beta }}^{(i)}-\hat{{\varvec{\beta }}}^{(i)}) p({\varvec{\beta }}^{(i)}|{\varvec{X}},{\varvec{y}},{\varvec{z}})\\&\quad \times \, p({\varvec{y}}|{\varvec{X}},{\varvec{z}}) p({\varvec{z}}, {\varvec{\eta }}|{\varvec{X}})\text {d}{\varvec{\beta }}^{(i)}\text {d}{\varvec{y}} \text {d}{\varvec{z}} \text {d}{\varvec{\eta }} \\&=\int \text{ tr }[{\varvec{A}}_0\cdot \text{ var }({\varvec{\eta }}|{\varvec{X}},{\varvec{z}})] p({\varvec{z}}|{\varvec{X}})\text {d}{\varvec{z}} \\&\quad + \sum _{i=1}^2\int \text{ tr }[{\varvec{A}}_i\cdot \text{ var }({\varvec{\beta }}^{(i)}|{\varvec{X}},{\varvec{y}},{\varvec{z}})]p({\varvec{y}}|{\varvec{X}},{\varvec{z}}) p({\varvec{\eta }}|{\varvec{X}},{\varvec{z}}) p({\varvec{z}}|{\varvec{X}}) \text {d}{\varvec{\eta }} \text {d}{\varvec{y}} \text {d}{\varvec{z}}. \end{aligned}$$

It can be easily seen that

$$\begin{aligned} \text{ var }({\varvec{\eta }}|{\varvec{X}},{\varvec{z}})&=\left\{ \sum _{i=1}^n \pi ({\varvec{x}}_i, {\varvec{\eta }})(1-\pi ({\varvec{x}}_i, {\varvec{\eta }})){\varvec{f}}({\varvec{x}}_i){\varvec{f}}({\varvec{x}}_i)'+\tau ^{-2}{\varvec{R}}_0^{-1}\right\} ^{-1}\\&=\left\{ {\varvec{F}}'{\varvec{W}}_1{\varvec{W}}_2{\varvec{F}}+\tau ^{-2}{\varvec{R}}_0^{-1}\right\} ^{-1}=\sigma ^2\left\{ {\varvec{F}}'{\varvec{W}}_0{\varvec{F}}+\rho {\varvec{R}}_0^{-1}\right\} ^{-1},\\&\quad \times \,\int \text{ tr }[{\varvec{A}}_0\cdot \text{ var }({\varvec{\eta }}|{\varvec{X}},{\varvec{z}})]p({\varvec{z}}|{\varvec{X}})\text {d}{\varvec{z}} \\&=\int \text{ tr }[{\varvec{A}}_0\cdot \sigma ^2\left\{ {\varvec{F}}'{\varvec{W}}_0{\varvec{F}}+\rho {\varvec{R}}_0^{-1}\right\} ^{-1}] p({\varvec{z}}|{\varvec{X}}, {\varvec{\eta }})p({\varvec{\eta }}) \text {d}{\varvec{z}} \text {d}{\varvec{\eta }}\\&=\sigma ^2\int \text{ tr }[{\varvec{A}}_0\cdot \left\{ {\varvec{F}}'{\varvec{W}}_0{\varvec{F}}+\rho {\varvec{R}}_0^{-1}\right\} ^{-1}] p({\varvec{\eta }}) \text {d}{\varvec{\eta }}=Q_0. \end{aligned}$$

Also, in the last equation of $\Phi ({\varvec{X}})$, the integration with respect to ${\varvec{\eta }}$ in the last two terms cannot be computed explicitly. So for the local optimal design, we omit the integration with respect to ${\varvec{\eta }}$ and instead let the objective function depend on ${\varvec{\eta }}$.

Thus the objective function for the local optimal design is

$$\begin{aligned} \Phi ({\varvec{X}}|{\varvec{\eta }}) = Q_0+\sum _{i=1}^2E_{{\varvec{z}}}\{\text{tr }[{\varvec{A}}_i \cdot \text{ var }({\varvec{\beta }}^{(i)}|{\varvec{X}},{\varvec{y}},{\varvec{z}})]\}. \end{aligned}$$

(20)

According to the conditional posterior distribution of ${\varvec{\beta }}_i^{(i)}$,

$$\begin{aligned} \text{ var }({\varvec{\beta }}^{(i)}|{\varvec{X}},{\varvec{y}},{\varvec{z}})=\sigma ^2({\varvec{F}}'{\varvec{V}}_1{\varvec{F}}+\rho {\varvec{R}}_i^{-1})^{-1}. \end{aligned}$$

In Lemma 1, we prove the following

$$\begin{aligned}&\sigma ^2 E_{{\varvec{z}}}\{\text{tr }[{\varvec{A}}_i({\varvec{F}}'{\varvec{V}}_i{\varvec{F}}+\rho {\varvec{R}}_i^{-1})^{-1}] \} \nonumber \\&\quad =\sigma ^2 \text{ tr }[{\varvec{A}}_i({\varvec{F}}'{\varvec{W}}_i{\varvec{F}}+\rho {\varvec{R}}_i^{-1})^{-1}]+\Delta _i^2+E[o(||{\varvec{z}}-{\varvec{\pi }}||^2)]. \end{aligned}$$

(21)

Here ${\varvec{V}}_1=\text{ diag }\{{\varvec{z}}\}$, ${\varvec{V}}_2={\varvec{I}}_n-{\varvec{V}}_1$, ${\varvec{W}}_1=\text{ diag }\{{\varvec{\pi }}\}$,${\varvec{W}}_2={\varvec{I}}_n-{\varvec{W}}_1$, and

$$\begin{aligned} \Delta _i^2\triangleq \sigma ^2\sum _{k=1}^n\pi _k(1-\pi _k)\text{ tr }[{\varvec{A}}_i{\varvec{C}}_i^{-1}({\varvec{\pi }}){\varvec{S}}_k{\varvec{C}}_i^{-1}({\varvec{\pi }}){\varvec{S}}_k{\varvec{C}}_i^{-1}({\varvec{\pi }})]. \end{aligned}$$

With this result, we can reach the result in Theorem 1.

Lemma 1

Define function $Q_i({\varvec{b}}) = \sigma ^2\text{ tr }[{\varvec{A}}_i{\varvec{C}}_i^{-1}({\varvec{b}})]$. Then (21) can be rewritten as

$$\begin{aligned} E_{{\varvec{z}}}Q_i({\varvec{z}})=Q_i({\varvec{\pi }})+\Delta _i^2+E[o(||{\varvec{z}}-{\varvec{\pi }}||^2)] \end{aligned}$$

(22)

Proof

Performing second-order Taylor expansion to $Q_1({\varvec{z}})$ at ${\varvec{z}}={\varvec{\pi }}$, we have

$$\begin{aligned} Q_i({\varvec{z}})&=Q_i({\varvec{\pi }})+\sum _{k=1}^n\left. \frac{\partial Q_i({\varvec{z}})}{\partial z_k}\right| _{{\varvec{z}}={\varvec{\pi }}} (z_k-\pi _k) \nonumber \\&\quad +\frac{1}{2}\sum _{j=1}^n\sum _{k=1}^n \left. \frac{\partial ^2 Q_i({\varvec{z}})}{\partial z_j\partial z_k}\right| _{{\varvec{z}}={\varvec{\pi }}}(z_j-\pi _j)(z_k-\pi _k)+o(||{\varvec{z}}-{\varvec{\pi }}||^2). \end{aligned}$$

(23)

Taking expectation to (23) yields

$$\begin{aligned} E_{{\varvec{z}}}Q_i({\varvec{z}})=Q_i({\varvec{\pi }})+\frac{1}{2}\sum _{k=1}^n \pi _k(1-\pi _k) \left. \frac{\partial ^2 Q_i({\varvec{z}})}{\partial z_k^2}\right| _{{\varvec{z}}={\varvec{\pi }}} +E[o(||{\varvec{z}}-{\varvec{\pi }}||^2)]. \end{aligned}$$

(24)

It remains to compute $\frac{\partial ^2 Q_i({\varvec{z}})}{\partial z_k^2}$. We have^{Footnote 2}

$$\begin{aligned} \frac{\partial Q_i({\varvec{z}})}{\partial z_k}=\,&\sigma ^2\text{ tr }\left[ {\varvec{A}}_i\frac{\partial {\varvec{C}}_i^{-1}({\varvec{z}})}{\partial z_k}\right] \\ =&-\sigma ^2\text{ tr }\left[ {\varvec{A}}_i{\varvec{C}}_i^{-1}({\varvec{z}})\frac{\partial {\varvec{C}}_i({\varvec{z}})}{\partial z_k}{\varvec{C}}_i^{-1}({\varvec{z}})\right] \\ =&-\sigma ^2\text{ tr }[{\varvec{A}}_i{\varvec{C}}_i^{-1}({\varvec{z}}){\varvec{S}}_k{\varvec{C}}_i^{-1}({\varvec{z}})], \end{aligned}$$

and

$$\begin{aligned} \frac{\partial Q_i^2({\varvec{z}})}{\partial z_k^2}=&-\sigma ^2\text{ tr }\left[ {\varvec{A}}_i\frac{\partial ({\varvec{C}}_i^{-1}({\varvec{z}}){\varvec{S}}_k{\varvec{C}}_i^{-1}({\varvec{z}}))}{\partial z_k}\right] \\ =&-\sigma ^2\text{ tr }\left[ {\varvec{A}}_i\frac{\partial {\varvec{C}}_i^{-1}({\varvec{z}})}{\partial z_k}{\varvec{S}}_k{\varvec{C}}_i^{-1}({\varvec{z}})+{\varvec{A}}_i{\varvec{C}}_i^{-1}({\varvec{z}}){\varvec{S}}_k\frac{\partial {\varvec{C}}_i^{-1}({\varvec{z}})}{\partial z_k}\right] \\ =\,&2\sigma ^2\text{ tr }[{\varvec{A}}_i{\varvec{C}}_i^{-1}({\varvec{\pi }}){\varvec{S}}_k{\varvec{C}}_i^{-1}({\varvec{\pi }}){\varvec{S}}_k{\varvec{C}}_i^{-1}({\varvec{\pi }})]. \end{aligned}$$

Substituting this into (24) yields (22) and (21). This completes the proof. $\square$

B Practical Justification of the Negligibility of $E[o(||{\varvec{z}}-{\varvec{\pi }}||^2)]$

In this section, we show that $E[o(||{\varvec{z}}-{\varvec{\pi }}||^2)]$ is negligible in general computation setting, and thus, ignoring $E[o(||{\varvec{z}}-{\varvec{\pi }}||^2)]$ in (12) still comprises a good approximation of the exact objective function.

First, we generate $M=1000$ random samples of ${\varvec{z}}_m$’s from the distribution (4) for a given ${\varvec{\eta }}$ value and compute $\text{ tr }[{\varvec{A}}_i\cdot \text{ var }({\varvec{\beta }}^{(i)}|{\varvec{X}},{\varvec{y}},{\varvec{z}}_m)]$. Then, we replace $E_{{\varvec{z}}}\{\text{tr }[A\cdot \text{ var }({\varvec{\beta }}^{(i)}|{\varvec{X}},{\varvec{y}},{\varvec{z}})]\}$ by its sample mean $\frac{1}{M}\sum _{m=1}^M\text{ tr }[{\varvec{A}}_i\cdot \text{ var }({\varvec{\beta }}^{(i)}|{\varvec{X}},{\varvec{y}},{\varvec{z}}_m)]$ in (20), and thus we estimate the exact objective function by

$$\begin{aligned} Q_{MC}=Q_0+\sum _{i=1}^2 \frac{1}{M}\sum _{m=1}^M\text{ tr }[{\varvec{A}}_i\cdot \text{ var }({\varvec{\beta }}^{(i)}|{\varvec{X}},{\varvec{y}},{\varvec{z}}_m)], \end{aligned}$$

where $Q_0$ only depends on ${\varvec{\eta }}$ but not ${\varvec{z}}$. We denote Q as the approximated objective value computed by dropping $E[o(||{\varvec{z}}-{\varvec{\pi }}||^2)]$ from (20), i.e.,

$$\begin{aligned} Q=Q_0+Q_1+Q_2+\Delta _1+\Delta _2, \end{aligned}$$

and none of these terms in Q depends on ${\varvec{z}}$

To set up a simulation study to compare Q and $Q_{MC}$, we take the same candidate points as the simulation in Sect. 4.1, but we use different parametric settings. For each run of the simulation, we sample $r_0,r_1,r_2$ uniformly from (0, 1), $\rho$ uniformly from (0, 10), and ${\varvec{\eta }}=\eta \cdot {\mathbf {1}}_7$ where $\eta$ is uniformly sampled from (0, 20). We fix $\sigma ^2$ to be 1 and the matrix ${\varvec{A}}$ is fixed as in Sect. 4.1 since they have only a scaling effect on the objective function. We repeat 1000 such runs, and for each run, we compute the relative error $100\%(\frac{Q-Q_{MC}}{Q_{MC}})$. Figure 6 shows the histogram of 1000 values of this relative error in percentage. We can see that most of its values are centered around 0 and more than $93\%$ of them are within the range $(-1\%,+1\%)$. Therefore, we conclude that Q provides a good approximation of $Q_{MC}$. Since their difference is a realization of $E[o(||{\varvec{z}}-{\varvec{\pi }}||^2)]$, this indicates that $o(||{\varvec{z}}-{\varvec{\pi }}||^2)$ is practically negligible on average.

To get a better understanding of how close Q and $Q_{MC}$ are, we fix $r_0=0.3$, $r_1=0.4$, $r_2=0.5$ and $\rho =5$. The values of $\sigma ^2$ and ${\varvec{A}}$ are set the same as above. We only vary $\eta$ as in ${\varvec{\eta }}=\eta \cdot {\mathbf {1}}_7$ from very small to relative large, since $\eta$ controls how close $V_i$ and $W_i$ are. The values of Q and $Q_{MC}$ computed based on different $\eta$ values are listed in Table 1, from which we can see that Q well approximates $Q_{MC}$ regardless of $\eta$.

Table 1 A few examples of Q and $Q_{MC}$ for different $\eta$ values

Full size table

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kang, L., Huang, X. Bayesian A-Optimal Design of Experiment with Quantitative and Qualitative Responses. J Stat Theory Pract 13, 64 (2019). https://doi.org/10.1007/s42519-019-0063-6

Download citation

Published: 10 October 2019
DOI: https://doi.org/10.1007/s42519-019-0063-6

Bayesian A-Optimal Design of Experiment with Quantitative and Qualitative Responses

Abstract

Similar content being viewed by others

A Laplace-based algorithm for Bayesian adaptive design

Particle swarm based algorithms for finding locally and Bayesian D-optimal designs

A Review of Bayesian Optimal Experimental Design on Different Models

Explore related subjects

1 Introduction

2 A-Optimal Design Problem Under Bayesian QQ Model

2.1 A Review of Bayesian QQ Model

2.2 A Review of Parametric Bayesian Design Approach

2.3 The Objective Function for the Local Optimal Design

Theorem 1

3 Optimization Search

3.1 Particle Swarm Optimization

Algorithm 1

3.2 Combine with Blocked Coordinate Descent

Algorithm 2

3.3 Finding the Global Optimum on the Entire Domain of \({\varvec{\eta }}\)

Algorithm 3

Algorithm 4

4 Simulation Study

4.1 Local Optimal Design: an Example with Small Candidate Size

4.2 Local Optimal Design: an Example with Large Candidate Size

4.3 An Example for the Global Optimal Design

5 Summary and Discussion

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Appendices

Appendices

1.1 A Proof of Theorem 1

Lemma 1

Proof

B Practical Justification of the Negligibility of \(E[o(||{\varvec{z}}-{\varvec{\pi }}||^2)]\)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation