1 Introduction

Bayesian optimization (BO) [1,2,3,4] is a popular surrogate-based, data-efficient technique for solving complex and time-consuming optimization problems [5, 6]. It is particularly useful when data is limited or expensive to acquire. This paper presents a case study in which both cheap and expensive objective and constraint functions are considered in the design of electric motors.

In the case of electric motor design, many geometric and electromagnetic parameters can affect the motor’s performance. BO can help to efficiently identify the optimal combination of these parameters to achieve the desired performance indicators (such as high efficiency and high torque density, as required for use in electric vehicles).

Electric motor design optimization is highly relevant in practice, as electric motors consume about \(40\%\) of the generated energy worldwide [7]. Usually, the optimization is done using a Genetic Algorithm (GA), with candidate designs evaluated through Finite Element Methods (FEM), requiring large numbers of evaluations [8]. Such FEM evaluations can take hours to days, depending on the geometries of the motor under study; consequently, this design optimization problem could benefit substantially from an efficient optimization method [7, 9, 10]. Our approach uses a surrogate model to approximate the expensive FEM evaluations [8, 11, 12]. Objectives and constraints that can be calculated cheaply (i.e., without the need for FEM, such as the total mass of the material) do not require a surrogate model, though; they can be quantified or approximated using deterministic closed-form formulas. Our approach distinguishes between the cheap and expensive functions in the optimization procedure; we show that this yields substantial improvements in data efficiency compared to traditional BO methods.

BO has two core components: a surrogate model and an acquisition function. The surrogate model is used to approximate the expensive output functions (either objectives or constraints) cheaply. The surrogate model is commonly a Gaussian Process (GP) [13, 14] or another statistical model with uncertainty quantification capability, such as Polynomial Chaos Expansion [15, 16], Neural Networks [17, 18], or Tree Parzen Estimators [19]. Based on the model prediction and the uncertainty quantification, an acquisition function is defined to sequentially search for the optimal design by balancing exploration and exploitation. A large body of BO research addresses different complexities in the optimization setting, such as batch optimization [20, 21], multi-fidelity optimization [22], constrained optimization [23, 24], and multiple objectives [25, 26]. A review paper discussing problem settings in BO is presented in [27].

In Multi-Objective Bayesian Global Optimization (MOBGO), cheap and expensive objectives are commonly treated in the same manner, i.e., modeled using surrogate models. Some attempts to exploit cheap-expensive properties are presented in recent literature: Allmendinger et al. [28] extend the genetic algorithm approach to deal with cheap objectives by using a fast-first and interleaving method. Wang et al. [29,30,31] study the relationship between cheap-expensive objectives and search bias in evolutionary algorithm settings. Loka et al. [32] propose a hypervolume-based BO approach considering a mix of cheap and expensive objectives, but only for an unconstrained bi-objective setting.

In this study, a two-stage constrained MOBGO algorithm is presented to optimize a Permanent Magnet Synchronous Motor (PMSM) design. The algorithm explicitly distinguishes between the cheap and expensive output functions. The first stage is a constrained active learning (AL) step used to improve the accuracy of the expensive constraint predictions. This is especially useful when these constraint functions are highly irregular (showing many local optima) and/or when the feasible region of the solution space is very small (implying that the initial design may contain only a few or even no feasible designs). The second stage is the optimization stage, which uses the proposed hypervolume-based cheap-expensive constrained acquisition function (CEHVI-C). This function extends the work of Yang et al. [33] by incorporating the cheap objectives directly in the hypervolume calculation. The cheap constraints are accounted for in the optimizer of the BO acquisition function. We show that the resulting algorithm can attain competitive solutions faster than the traditional BO method.

The key contributions of this paper are the following:

  • The proposed approach builds on a flexible way to quantify hypervolume, exploiting the distinction between cheap and expensive objectives. This improves the computational effort for calculating this metric. Moreover, contrary to the work in [33], it is applicable to any arbitrary box decomposition approach. Additionally, the algorithm handles expensive and cheap constraints in a clearly distinct way (accounting for the former in the probability of feasibility and for the latter in the optimization of the acquisition function). As shown, this results in an algorithm that is data-efficient and yields high-quality solutions.

  • Using the proposed approach, we show that the PMSM design problem can be solved in a data-efficient manner, which outperforms the common approaches used to solve this problem in the literature. This is in itself an insightful result, as the FEM calculations are very expensive.

The rest of this paper is organized as follows: Sect. 2 describes the PMSM under study. Section 3 explains the basics of multi-objective optimization in general, along with the corresponding notation and terminology. Section 4 presents the proposed algorithm for constrained multi-objective problems with cheap and expensive outcomes. In Sect. 5, the experimental setup and results are discussed. Finally, Sect. 6 presents the conclusions of this paper.

2 Permanent magnet synchronous motor optimization

A permanent magnet synchronous motor (PMSM) has several advantages compared to other types of electric motors, such as higher power density and higher efficiency. Consequently, this type of motor is preferred in settings where power density and efficiency are critical, such as in automotive applications. The downside is that this motor uses rare-earth magnets, which are not only very expensive but also harmful to the environment because of recycling problems [34, 35] and the impact of the mining activities [36].

Figure 1 shows a schematic drawing of the motor with the relevant geometrical design parameters, which are further detailed in Table 1. The magnets (referred to as rotor poles, pm) are shown on the rotor in red and black (reflecting their different polarities).

Fig. 1
figure 1

Motor geometry with geometrical annotations

Table 1 Geometrical design parameters of the PMSM design optimization problem

The PMSM design problem has three objectives (see Table 2) and six constraints (see Table 3).

The optimization of these parameters is nontrivial, though, as they tend to have conflicting impacts on the objectives: increasing the axial length of the motor (parameter (8) in Fig. 1), for instance, increases the average torque but simultaneously increases the total mass (and hence, cost). Consequently, this design optimization problem is a constrained multi-objective optimization problem. Some of the performance indicators in Tables 2 and 3 are cheap, meaning that they can be calculated by means of closed-form formulas. The expensive indicators are evaluated using Finite Element Methods (FEM) [37]. More details on the performance indicators can be found in Appendix 1.

Table 2 Objectives for the optimization of a PMSM
Table 3 Constraints for the optimization of a PMSM

The PMSM problem is similar to the one proposed in [38, 39]. Some work has been done to perform multi-objective optimization on a similar motor design, but it relies on many FEM evaluations [39,40,41] and handcrafted optimization steps [39, 42].

This study focuses on developing a data-driven approach that also minimizes the number of expensive evaluations, so that it can be applied to different problems and settings.

3 Constrained MOO: problem formulation

The goal of a constrained Multi-Objective Optimization (MOO) method is to optimize a set of objective functions \(\varvec{f}(\textbf{x}) = [f_1(\textbf{x}),f_2(\textbf{x}),\dots ,f_M(\textbf{x})] \in \mathbb {R}^M\), while satisfying a set of constraints \(\varvec{g}(\textbf{x}) = [g_1(\textbf{x}), g_2(\textbf{x}), \dots , g_V(\textbf{x})] \in \mathbb {R}^V\), with \(\varvec{g}(\textbf{x}) \le 0\):

$$\begin{aligned}&\min _{\textbf{x}}\left( f_1(\textbf{x}), \ldots , f_M(\textbf{x})\right) \nonumber \\&\text {s.t. } g_v(\textbf{x}) \le 0, \quad v=1, \ldots , V \end{aligned}$$
(1)

where \(M\ge 2\) is the number of objectives, \(V \ge 1\) is the number of constraints, and \(\textbf{x} \in \mathcal {X} \subset \mathbb {R}^{d}\). The set \(\mathcal {X}\) is d-dimensional and bounded. Without loss of generality, this paper assumes that the objectives need to be minimized (except when explicitly stated otherwise). In MOO, there typically is no single optimal solution \(\textbf{x}^*\) that minimizes all objectives simultaneously while satisfying all constraints; instead, there is a set of optimal solutions, referred to as the Pareto set. Mathematically, the Pareto set for an unconstrained optimization problem is defined as:

$$\begin{aligned} \varvec{P}=\left\{ \textbf{x} \in \mathcal {X} \mid \not \exists \mathbf {x^{\prime }} \in \mathcal {X}: \mathbf {x^{\prime }} \prec \textbf{x}\right\} \end{aligned}$$
(2)

where the notation \(\textbf{x}_{b} \prec \textbf{x}_{a}\) means that \(\textbf{x}_{b}\) dominates \(\textbf{x}_{a}\). In a minimization problem with M objectives, \(\textbf{x}_{b} \prec \textbf{x}_{a}\) if and only if \(f_{m}\left( \textbf{x}_{b}\right) \le f_{m}\left( \textbf{x}_{a}\right) , \forall m \in \{1,\dots , M\}\) and \(\exists m \in \{1,\dots , M\}\) such that \(f_{m}\left( \textbf{x}_{b}\right) <f_{m}\left( \textbf{x}_{a}\right)\). Informally, \(\textbf{x}_{b}\) dominates \(\textbf{x}_{a}\) if and only if it is better in at least one objective, while not being worse in any of the other objectives. As evident from Eq. (2), \(\varvec{P}\) is defined in the input space; the image of the Pareto set in the objective space is referred to as the Pareto front: \(\mathcal {P}=\left\{ \varvec{f}(\textbf{x}) \in \mathbb {R}^{M} \mid \textbf{x} \in \varvec{P}\right\}\). In constrained problems, only feasible points \(\textbf{x}\) can be part of the Pareto set. We thus define the Pareto feasible set as:
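The dominance relation and the induced non-dominated set translate directly into code; a minimal sketch for minimization (the helper names are ours, not part of the paper):

```python
import numpy as np

def dominates(fb, fa):
    """True if objective vector fb dominates fa under minimization:
    no worse in every objective and strictly better in at least one."""
    fb, fa = np.asarray(fb), np.asarray(fa)
    return bool(np.all(fb <= fa) and np.any(fb < fa))

def pareto_front(F):
    """Non-dominated rows of an (n, M) matrix of objective vectors."""
    F = np.asarray(F)
    keep = [i for i in range(len(F))
            if not any(dominates(F[j], F[i]) for j in range(len(F)) if j != i)]
    return F[keep]
```

For example, in the set \(\{(1,4),(2,2),(3,1),(3,3)\}\), only \((3,3)\) is dominated (by \((2,2)\)), so the other three points form the front.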

$$\begin{aligned} \varvec{P}_{feas}=\left\{ \textbf{x} \in \mathcal {X} \mid \not \exists \mathbf {x^{\prime }} \in \mathcal {X}: \mathbf {x^{\prime }} \prec \textbf{x}, \ \varvec{g}(\textbf{x}) \le 0, \ \varvec{g}(\mathbf {x^{\prime }}) \le 0 \right\} \end{aligned}$$
(3)

For ease of notation, we denote \(\varvec{P}:= \varvec{P}_{feas}\) and \(\mathcal {P}:= \mathcal {P}_{feas}\) (the image of \(\varvec{P}_{feas}\) in the objective space) for every constrained problem in this paper.

In this work, Bayesian Optimization (BO) is used to find the Pareto set in a data-efficient manner (i.e., using the smallest possible number of function evaluations). Bayesian optimization has two main components: (1) the surrogate model, which approximates the expensive output functions, and (2) the acquisition function, which guides the BO procedure by sequentially selecting additional input points to evaluate. BO automatically balances exploration and exploitation [13, 14, 43]. The Gaussian Process (GP) is the most popular type of surrogate model used in BO; the technical details of the model can be found in Appendix 2. The proposed acquisition function is a key component and is discussed in the following section.
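As a self-contained illustration of these two components, the following sketch runs a generic single-objective BO loop with a GP surrogate and the expected improvement acquisition function. It is not the paper's multi-objective algorithm; the toy objective, kernel, and budget are our own choices:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.5 * x            # toy "expensive" objective
X = rng.uniform(0, 4, size=(5, 1))               # initial design
y = f(X).ravel()

for _ in range(15):                              # sequential BO loop
    # (1) surrogate model: GP fitted to all evaluations so far
    gp = GaussianProcessRegressor(Matern(nu=2.5), normalize_y=True).fit(X, y)
    cand = np.linspace(0, 4, 400).reshape(-1, 1)
    mu, sd = gp.predict(cand, return_std=True)
    # (2) acquisition function: expected improvement (minimization)
    z = (y.min() - mu) / np.maximum(sd, 1e-12)
    ei = (y.min() - mu) * norm.cdf(z) + sd * norm.pdf(z)
    x_next = cand[np.argmax(ei)]                 # balances exploration/exploitation
    X, y = np.vstack([X, x_next]), np.append(y, f(x_next))
```

Each iteration refits the surrogate and queries the point with the largest expected improvement, which is the basic loop the proposed method extends to the multi-objective, constrained setting.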

4 MOBGO algorithm for cheap-expensive objectives and constraints

Previous research on MOBGO algorithms commonly uses an acquisition function based on the hypervolume metric to search for the Pareto optimal points [25, 26, 33, 44]. Very recently, this type of acquisition function has been applied in mixed-variable settings [45], parallel evaluation settings [46], and for high-dimensional problems [47]. Yet, none of these previous works exploit potential differences in the latencies (i.e., the evaluation times) of the different objective functions. In real-life problems, it often occurs that the output functions (objectives and/or constraints) are a mix of cheap and expensive functions.

To the best of our knowledge, the only papers available so far on this topic are [28, 48] (which focus on exploiting latency differences in evolutionary algorithms), and [32] (which presents a BO algorithm limited to bi-objective unconstrained MOO problems). Recently, Buckingham et al. [49] proposed a scalarization-based multi-objective BO approach for a similar problem. However, their method assumes that the cheap objective does not have a closed-form formula.

We propose a two-stage optimization approach (as in [50]), which is depicted in Fig. 2. The first stage is optional and is referred to as the Active Learning (AL) stage. It aims to improve the accuracy of the GPs for hard-to-model constraints (if any). In the AL phase, the initial surrogates for these constraints are estimated based on a set of initial design points, which are evaluated using the expensive FEM models. The most common choice in the BO literature is a maximin Latin Hypercube design [51], which ensures that the resulting set is space-filling. As the aim of the AL phase is to improve the accuracy of these constraint models, additional points are queried using the Feasible Predictive Variance acquisition function, which is discussed in detail in Sect. 4.1. The AL stage ends when the specified AL budget is depleted and at least one feasible point is present in the observations. In some cases, the feasible area of the problem is very small, which may force the analyst to keep querying points until both conditions are fulfilled.

The second stage is the Bayesian optimization stage. If it was preceded by the AL stage, the resulting observations are used as starting points, on which the surrogate models are again estimated. If the AL phase was skipped, the starting points are generated through a space-filling design (usually a maximin Latin Hypercube design, for the reasons stated above), and they are first evaluated with the expensive FEM models to estimate the initial surrogate models. The additional points to evaluate are then selected using the proposed cheap-expensive hypervolume-based expected improvement acquisition function, which is discussed in Sect. 4.2. The algorithm ends when the total budget has been depleted, and the points that have been evaluated as feasible and Pareto-optimal are put forward as the points on the front.

As explained below, the acquisition functions of both stages account for the estimated Probability of Feasibility (PF) of the point with respect to the expensive constraints, to avoid spending the budget on evaluating points that are likely infeasible. The feasibility with respect to the cheap constraints is handled inside the optimization procedure that maximizes the acquisition functions, as discussed in Sect. 4.2.2.

Fig. 2
figure 2

Two-stage optimization scheme. The first stage (AL) is optional; we use it in our experiments for the PMSM case. The data set resulting from the active learning steps in (a) is then used as the starting points in (b). If the AL phase is skipped, the starting points in (b) are generated by a Latin Hypercube design

4.1 Stage 1: active learning (AL)

This stage aims to improve the accuracy of the GP models of the expensive constraints by focusing solely on exploring the region where the model exhibits high uncertainty. This is especially useful when the expensive constraint functions are non-smooth (i.e., highly irregular, showing many local optima, which makes the function hard to model) and/or when the feasible region of the solution space is very small (implying that the initial design may contain only a few or even no feasible designs). In such cases, the information gained during the AL stage results in significant efficiency gains in the optimization stage.

The acquisition function used is the Feasible Predictive Variance (FPV), which is defined as:

$$\begin{aligned} {FPV}({\textbf{x}_*}) = \prod _{v=1}^{V} \sigma _{v}^2({\textbf{x}_*}) \cdot PF_v({\textbf{x}_*}), \end{aligned}$$
(4)

where \(\sigma _{v}^2\) is the predictive variance of the hard-to-model constraint v at \(\textbf{x}_*\), and \(PF_{v}\) refers to the Probability of Feasibility (see e.g. [23]) of \(\textbf{x}_*\) for constraint v:

$$\begin{aligned} PF_v({\textbf{x}_*})&:={\text {Pr}}[\tilde{g_v}(\textbf{x}_*) \le 0] \end{aligned}$$
(5)
$$\begin{aligned}&=\int _{-\infty }^{0} p\left( \tilde{g_v}({\textbf{x}_*}) \mid {\textbf{x}_*}, \mathcal {T}_{N}\right) d \tilde{g_v}({\textbf{x}_*}), \end{aligned}$$
(6)

where \(\tilde{g_v}({\textbf{x}_*})\) refers to the Gaussian process outcome for constraint v at \({\textbf{x}_*}\), and \(\mathcal {T}_{N}\) denotes the data set already available for constraint v.
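Under a GP posterior, the integral in Eq. 6 reduces to a normal CDF evaluation, so Eq. 4 has a closed form. A minimal sketch (the function name and interface are ours), assuming each constraint GP returns its posterior mean and standard deviation at \(\textbf{x}_*\):

```python
import numpy as np
from scipy.stats import norm

def feasible_predictive_variance(means, stds):
    """FPV of Eq. 4: product over the V constraints of the predictive variance
    times the probability of feasibility Pr[g_v(x*) <= 0] (Eqs. 5-6)."""
    means = np.asarray(means, float)
    stds = np.asarray(stds, float)
    pf = norm.cdf(-means / stds)      # closed form of the integral in Eq. 6
    return float(np.prod(stds ** 2 * pf))
```

Maximizing this quantity favors points where the constraint models are both uncertain and likely feasible.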

In the PMSM case study, we use this stage specifically for the average torque constraint (see Table 3) as this constraint is hard to model, and, moreover, it restricts the number of feasible solutions more severely than the other constraints.

4.2 Stage 2: Bayesian optimization

After finding enough feasible solutions, new points are selected by maximizing the Cheap-Expensive Expected Hypervolume Improvement with Constraints (CEHVI-C) acquisition function:

$$\begin{aligned} {\text {CEHVI-C}} = {\text {CEHVI}}(\varvec{\mu }, \varvec{\sigma }, \mathcal {P}, \textbf{r}) \cdot \prod _{v=1}^{V} PF_v({\textbf{x}_*}) \end{aligned}$$
(7)

CEHVI-C multiplies the proposed CEHVI acquisition function (see Sect. 4.2.1) with the probability of feasibility of all expensive constraints \(g_v\) (\(v=1,\dots ,V\)). Assuming that all these constraints are independent, this reduces to a multiplication of the individual \(PF_v({\textbf{x}_*})\). Note that Eq. 7 also implicitly assumes conditional independence between the objective and constraint functions [23]. Both assumptions are standard in constrained Bayesian optimization.

The cheap constraints are not directly incorporated in Eq. 7, since this would likely introduce severe non-smooth behavior in the response surface of the acquisition function. Instead, the cheap constraints are accounted for in the optimization procedure implemented to maximize the acquisition function, which is further detailed in Sect. 4.2.2.
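The composition in Eq. 7 is a plain product; a sketch under the same independence assumptions (the function name and arguments are placeholders, not the paper's implementation):

```python
import numpy as np
from scipy.stats import norm

def cehvi_c(cehvi_value, constraint_means, constraint_stds):
    """Eq. 7: scale a (CE)HVI score by the joint probability of feasibility
    of the expensive constraints, assuming independent GP posteriors."""
    m = np.asarray(constraint_means, float)
    s = np.asarray(constraint_stds, float)
    return float(cehvi_value * np.prod(norm.cdf(-m / s)))
```

A candidate whose expensive constraints are almost certainly violated thus receives a near-zero acquisition value, regardless of its hypervolume improvement.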

4.2.1 Cheap-expensive expected hypervolume improvement

We extend the EHVI formulation presented in [33] such that (1) it can efficiently handle a mix of cheap and expensive objectives, and (2) it can be used independently of the hyperbox decomposition method chosen to implement the calculations.

In hypervolume-based MOBGO, improvement is quantified by the Lebesgue measure. Let us first define the hypervolume indicator (HVI) \(\mathcal {H}\) [52, 53]. Given a Pareto front \(\mathcal {P}\) and a reference point \(\textbf{r}\), the hypervolume indicator \(\mathcal {H}\) of \(\mathcal {P}\) w.r.t. \(\textbf{r}\) is defined as follows:

$$\begin{aligned} \mathcal {H}(\mathcal {P},\textbf{r})=\lambda _M\left( \cup _{\textbf{y} \in \mathcal {P}}[\textbf{r}, \textbf{y}]\right) , \end{aligned}$$
(8)

where \(\lambda _M\) is the Lebesgue measure in \(\mathbb {R}^M\) of the region that dominates \(\textbf{r}\) and is dominated by \(\mathcal {P}\); e.g., in \(\mathbb {R}^2\), \(\lambda _2\) is the area of the dominated region, while in \(\mathbb {R}^3\), \(\lambda _3\) is its volume.

Using this definition, we can define the hypervolume improvement (also referred to as exclusive hypervolume) generated by a new point \(\textbf{y}_*\) as:

$$\begin{aligned} \mathcal {H}_{exc}(\textbf{y}_*, \mathcal {P}, \textbf{r})=\mathcal {H}(\mathcal {P} \cup \{\textbf{y}_*\}, \textbf{r})-\mathcal {H}(\mathcal {P},\textbf{r}). \end{aligned}$$
(9)
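For the bi-objective case, Eqs. 8 and 9 can be computed directly by a sweep over the sorted front; a small sketch for minimization (our own helpers, independent of any box decomposition):

```python
def hv_2d(front, ref):
    """Hypervolume (Eq. 8) of a 2-objective minimization front w.r.t. ref.
    Only points that dominate the reference point contribute."""
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv, y_prev = 0.0, ref[1]
    for x, y in pts:                 # sweep in increasing f1
        if y < y_prev:               # skip dominated points
            hv += (ref[0] - x) * (y_prev - y)
            y_prev = y
    return hv

def hv_improvement(y_new, front, ref):
    """Exclusive hypervolume of a candidate objective vector (Eq. 9)."""
    return hv_2d(list(front) + [tuple(y_new)], ref) - hv_2d(front, ref)
```

For example, adding \((2,2)\) to the front \(\{(1,3),(3,1)\}\) with reference point \((4,4)\) yields an exclusive hypervolume of 1, while a dominated candidate contributes 0.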

Based on the definition of hypervolume improvement in Eq. 9, we can define the Expected HyperVolume Improvement (EHVI) at an arbitrary new design point \(\mathbf {x_{\star }}\) as:

$$\begin{aligned} {\text {EHVI}}(\varvec{\mu }, \varvec{\sigma }, \mathcal {P}, \textbf{r})&=\int _{\mathbb {R}^M} \mathcal {H}_{exc}(\textbf{y}, \mathcal {P}, \textbf{r}) \cdot \varvec{\xi }_{\sigma , \mu }(\textbf{y}) d \textbf{y}, \end{aligned}$$
(10)

where \(\textbf{y}\) corresponds to a (random) M-variate objective vector, while \(\varvec{\xi }_{\sigma , \mu }(\textbf{y})\) denotes the value of the M-variate independent normal density function in this vector (given the predictive mean vector \(\varvec{\mu }\) \(\in \mathbb {R}^M\) at \(\mathbf {x_{\star }}\), and the predictive variance vector \(\varvec{\sigma ^2}\) \(\in \mathbb {R}^M_+\) at \(\mathbf {x_{\star }}\)). For ease of notation, let \({\text {EHVI}}(\varvec{\mu }, \varvec{\sigma }, \mathcal {P}, \textbf{r}):={\text {EHVI}}(\varvec{\mu }, \varvec{\sigma })\).

Let us define a set \(\Delta (\textbf{y}, \mathcal {P}, \textbf{r})\) which contains (given a Pareto front \(\mathcal {P}\), the reference point \(\textbf{r}\) and the output vector \(\textbf{y}\)) all the output vectors that currently do not belong to the dominated set, but that would be added to it when the vector \(\textbf{y}\) were added to the front [33, 54]:

$$\begin{aligned} \Delta (\textbf{y}, \mathcal {P}, \textbf{r})&=\left\{ \textbf{z} \in \mathbb {R}^M \mid \textbf{y} \prec \textbf{z},\ \textbf{z} \prec \textbf{r} \text{ and } \not \exists \textbf{q} \in \mathcal {P}: \textbf{q} \prec \textbf{z}\right\} , \end{aligned}$$
(11)

For notational simplicity, let \(\Delta (\textbf{y}, \mathcal {P}, \textbf{r}):= \Delta (\textbf{y})\). The EHVI in Eq. 10 can then be rewritten as:

$$\begin{aligned} {\text {EHVI}}(\varvec{\mu }, \varvec{\sigma })&=\sum _{i=1}^{N_M}\left( \int _{-\infty }^{y_1=u_1^{(i)}} \cdots \int _{-\infty }^{y_M=u_M^{(i)}} \right) \lambda _M\left[ S^{(i)}_M \cap \Delta (\textbf{y})\right] \cdot \xi _{\mu , \sigma }(\textbf{y}) d \textbf{y}, \end{aligned}$$
(12)

where \(\lambda _M\) refers to the M-dimensional Lebesgue measure, \(S^{(i)}_M\) refers to hyperbox i (see Eq. B20 in Appendix 2), and \(N_M\) denotes the total number of hyperboxes in the decomposition. Note that Eq. 12 allows for piece-wise integration, given the summation over the different hyperboxes.

Dividing each integration slice \(\int _{-\infty }^{y_m=u_m^{(i)}}\) into \((\int _{-\infty }^{y_m=l_m^{(i)}}+\int _{l_m^{(i)}}^{y_m=u_m^{(i)}})\), we obtain:

$$\begin{aligned} {\text {EHVI}}(\varvec{\mu }, \varvec{\sigma })&= \sum _{i=1}^{N_M}\left( \left( \int _{- \infty }^{y_1=l_1^{(i)}}+ \int _{y_1=l_1^{(i)}}^{y_1=u_1^{(i)}}\right) \right. \cdots \left. \left( \int _{- \infty }^{y_{M}=l_{M}^{(i)}}+ \int _{y_{M}=l_{M}^{(i)}}^{y_{M}=u_{M}^{(i)}}\right) \right) \nonumber \\&\qquad \lambda _M\left[ S_M^{(i)} \cap \Delta \left( \textbf{y}\right) \right] \cdot \varvec{\xi }_{\mu , \sigma }(\textbf{y}) d \textbf{y}, \end{aligned}$$
(13)

As evident, each of the individual terms of this sum consists of the multiplication of M factors, each of which contains the sum of 2 integrals. Since integration is a linear mapping, we can expand each individual term in Eq. 13, resulting in a summation of \(2^M\) terms, each consisting of an M-dimensional integral.

Let us finally define \(\textbf{C}^{(j)_2}\) as the M-bit binary representation of the index j, which identifies one of these M-dimensional integrals. The kth element, \(C_k^{(j)_2}\), equals 0 if the kth integral has finite bounds, and 1 if its lower bound is \(-\infty\).

Using the results from [26, 33], the EHVI can then be calculated exactly as follows:

$$\begin{aligned} {\text {EHVI}}(\varvec{\mu }, \varvec{\sigma }) =\sum _{i=1}^{N_{M}}\bigg (\sum _{j=0}^{2^{M}-1}\Big (\prod _{k=1}^{M}\omega _e(i, k, C_{k}^{(j)_{2}}) \Big ) \bigg ), \end{aligned}$$
(14)

with

$$\begin{aligned} \omega _e\left( i, k, C_{k}^{(j)_{2}}\right) := {\left\{ \begin{array}{ll}\Psi \left( l_{k}^{(i)}, u_{k}^{(i)},\mu _{k}, \sigma _{k}\right) &{} \text{ if } C_{k}^{(j)_{2}}=0 \\ \vartheta \left( l_{k}^{(i)}, u_{k}^{(i)}, \sigma _{k}, \mu _{k}\right) &{} \text{ if } C_{k}^{(j)_{2}}=1,\end{array}\right. } \end{aligned}$$
(15)

where i refers to the index number of the hyperbox, k to the index number of the objective function, and \(C_k^{(j)_2}\) is the binary representation of the kth objective. Note that Eq. 14 implicitly uses the independence assumption between the different objectives, to replace the M-dimensional integrals by multiplication of M single-dimensional integrals. As evident from Eq. 15, the calculation of these single integrals can be done exactly, but depends on whether the integral has finite bounds (\(C_{k}^{(j)_{2}}=0\)) or an infinite lower bound (\(C_{k}^{(j)_{2}}=1\)). More specifically, for \(C_{k}^{(j)_{2}}=1\), we have:

$$\begin{aligned} \vartheta \left( l_{k}^{(i)}, u_{k}^{(i)}, \sigma _{k}, \mu _{k}\right)&=\left( u_{k}^{(i)}-l_{k}^{(i)}\right) \cdot \left( \Phi \left( \frac{l_{k}^{(i)}-\mu _{k}}{\sigma _{k}}\right) \right) , \end{aligned}$$
(16)

where \(\Phi\) denotes the Cumulative Distribution Function (CDF) of the standard normal distribution. Equation 16 also occurs in [33], but we adjust it here for a minimization context.

For \(C_{k}^{(j)_{2}}=0\), we have:

$$\begin{aligned} \Psi \left( l_{k}^{(i)}, u_{k}^{(i)},\mu _{k}, \sigma _{k}\right)&= \Psi _{-\infty }\left( u_{k}^{(i)}, u_{k}^{(i)}, \mu _{k}, \sigma _{k}\right) -\nonumber \\ {}&\Psi _{-\infty }\left( u_{k}^{(i)}, l_{k}^{(i)}, \mu _{k}, \sigma _{k}\right) , \end{aligned}$$
(17)

with

$$\begin{aligned} \Psi _{-\infty }(a, b, \mu , \sigma ):&=\int _{-\infty }^{b}(a-z) \frac{1}{\sigma } \phi \left( \frac{z-\mu }{\sigma }\right) d z \end{aligned}$$
(18)
$$\begin{aligned}&=\sigma \phi \left( \frac{b-\mu }{\sigma }\right) +(a-\mu ) \, \Phi \left( \frac{b-\mu }{\sigma }\right) , \end{aligned}$$
(19)

where \(\phi\) and \(\Phi\) denote the Probability Density Function (PDF) and Cumulative Distribution Function (CDF) of the standard normal distribution, respectively.
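The one-dimensional building blocks in Eqs. 16–19 only require the standard normal pdf and cdf; a sketch (function names and argument order are ours):

```python
import numpy as np
from scipy.stats import norm

def psi_inf(a, b, mu, sigma):
    """Closed form of Eqs. 18-19: integral of (a - z) N(z; mu, sigma^2) over (-inf, b]."""
    t = (b - mu) / sigma
    return sigma * norm.pdf(t) + (a - mu) * norm.cdf(t)

def psi(l, u, mu, sigma):
    """Eq. 17: contribution of a slice with finite bounds [l, u]."""
    return psi_inf(u, u, mu, sigma) - psi_inf(u, l, mu, sigma)

def vartheta(l, u, mu, sigma):
    """Eq. 16: contribution of a slice with an infinite lower bound."""
    return (u - l) * norm.cdf((l - mu) / sigma)
```

Since the difference in Eq. 17 equals \(\int _l^u (u-z)\,\phi (z)\,dz\) (for \(\mu =0,\sigma =1\)), these terms can be evaluated exactly without numerical integration.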

We can further refine Eq. 14 to deal efficiently with cheap and expensive objectives. To that end, we introduce an M-dimensional binary vector \(\textbf{t}^f\): the kth element of this vector, \(t_k^f\), equals 0 if the kth objective is cheap, and 1 otherwise. We can then efficiently calculate the resulting Cheap-Expensive Hypervolume Improvement (CEHVI) as follows:

$$\begin{aligned} {\text {CEHVI}}(\varvec{\mu }, \varvec{\sigma }) =\sum _{i=1}^{N_{M}}\bigg (\sum _{j=0}^{2^{M}-1}\Big (\prod _{k=1}^{M}\omega (i, k, C_{k}^{(j)_{2}},t_k^f) \Big ) \bigg ), \end{aligned}$$
(20)

with

$$\begin{aligned} \omega (i, k, C_{k}^{(j)_{2}},t_k^f) := {\left\{ \begin{array}{ll} \omega _c(i, k, C_{k}^{(j)_{2}}) &{} \text { if } t^f_k = 0 \\ \omega _e(i, k, C_{k}^{(j)_{2}}) &{} \text { if } t^f_k = 1. \end{array}\right. } \end{aligned}$$
(21)

For the expensive objectives, this expression reduces to \(\omega _e\) as given by Eq. 15. For the cheap objectives, the calculation of \(\omega _c(i, k, C_{k}^{(j)_{2}})\) depends on the relative location of \(y_k\) w.r.t. \(l_k^{(i)}\) and \(u_k^{(i)}\), as shown in Table 4. As evident, the resulting values are deterministic.

Table 4 Calculation of \(\omega _c(i, k, C_{k}^{(j)_{2}})\) for the cheap objectives

4.2.2 Acquisition function maximization

In every iteration, the query points are obtained by maximizing the acquisition function (usually referred to as the inner optimization). We use a multi-start optimization method [55, 56] that incorporates the cheap constraints. First, 5000 points are sampled using Monte Carlo sampling, and the points that violate the cheap constraints are removed. We then select the 10 points with the highest CEHVI-C value and apply Sequential Least Squares Programming (SLSQP) with the cheap constraints [57] to each of these in parallel (using the SLSQP implementation of the Scipy [58] library).
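This multi-start scheme can be sketched as follows; the acquisition function and cheap-constraint callables are placeholders, and only the candidate and restart counts (5000 and 10) mirror the text:

```python
import numpy as np
from scipy.optimize import minimize

def maximize_acquisition(acq, cheap_constraints, bounds,
                         n_mc=5000, n_starts=10, seed=0):
    """Multi-start inner optimization: Monte Carlo screening of candidates
    that satisfy the cheap constraints (g(x) <= 0), then SLSQP refinement."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, float).T
    cand = rng.uniform(lo, hi, size=(n_mc, len(lo)))
    # discard candidates violating any cheap constraint
    feas = np.array([x for x in cand
                     if all(g(x) <= 0 for g in cheap_constraints)])
    # keep the n_starts feasible candidates with the highest acquisition value
    starts = feas[np.argsort([-acq(x) for x in feas])[:n_starts]]
    # SciPy's "ineq" convention requires fun(x) >= 0, i.e. -g(x) >= 0
    cons = [{"type": "ineq", "fun": lambda x, g=g: -g(x)}
            for g in cheap_constraints]
    results = [minimize(lambda x: -acq(x), x0, method="SLSQP",
                        bounds=bounds, constraints=cons) for x0 in starts]
    best = min(results, key=lambda r: r.fun)   # smallest -acq == largest acq
    return best.x, -best.fun
```

The restarts here run sequentially for clarity; in the paper's implementation they are executed in parallel, with CEHVI-C as the acquisition and the closed-form PMSM formulas as the cheap constraints.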

5 Results and discussion

5.1 Experiment settings

The proposed hypervolume-based MOBGO algorithm was implemented using Trieste [59] in Python. Before applying our method to the motor optimization case, we first test the performance of the proposed algorithm on five benchmark functions: three unconstrained optimization problems (DTLZ1, DTLZ2, and DTLZ3 [60]) and two constrained optimization problems (BNH [61] and SRN [62]). The characteristics of these benchmark functions are presented in Table 5. The reference point indicated in the table is used for the hypervolume computations.

Table 5 Full specification of the benchmark functions

For these benchmark functions, \((d \times 11 + 1)\) initial design points were generated using quasi-random Halton Sampling [63]. The proposed method is compared with EHVI(-Constrained), Random sampling, and NSGA-II (see [64]; we used the version present in PyMOO [65], which accounts for constraints). The total budget for the FEM simulator is set to 100 input evaluations, except for the NSGA-II algorithm: as this method is less data-efficient, we allow it to spend 250 input evaluations. The AL budget is set to zero, as the expensive constraints (if any) are not hard to model. Evidently, for the unconstrained problems, the CEHVI-C acquisition function reduces to the CEHVI acquisition function (see Eq. 7).

For the PMSM optimization problem, 35 initial points are generated using Latin Hypercube sampling [51]. The AL budget for the BO methods is set to 10 iterations; we include the AL phase in this problem as the initial design is small, so we expect it to be beneficial, especially for learning the hard-to-model average torque (Tavg) constraint. Both the initial design and the number of AL iterations are deliberately kept small as the FEM model is relatively slow to run. We compare the performance of our algorithm against the same competitors as in the benchmark functions. The total budget equals 100 input evaluations, except for the NSGA-II algorithm (250 evaluations).

5.2 Results for benchmark functions

We carried out 10 repetitions for each of the benchmark experiments, to check the robustness of the results against the randomness involved in the algorithms (which is evident in the NSGA-II and Random algorithms; in the BO algorithms, it impacts the multistart design of the inner optimization).

Fig. 3
figure 3

Evolution of the mean hypervolume on the benchmark problems, with a total budget of 100 evaluations. The shaded areas reflect the 95% confidence intervals based on 10 repetitions

Figure 3 shows the evolution of the mean hypervolume on the different benchmark functions, for the competing algorithms. As shown, the BO approaches (CEHVI and EHVI) clearly outperform the competing algorithms in the unconstrained problem settings (top row). Moreover, the CEHVI algorithm performs significantly better than the EHVI algorithm in the DTLZ1 and DTLZ3 experiments, which have a disjoint Pareto front [60] and are thus hard to optimize (DTLZ2, by contrast, has a smooth Pareto front).

In the constrained benchmark problems (bottom row), CEHVI-C again has a clearly higher hypervolume indicator value than the other methods. While both problems have a smooth Pareto front, CEHVI-C outperforms EHVI-C in the SRN problem (in the BNH problem, the performance of both algorithms is similar, as the cheap function is smoother and thus relatively easy to model with GPs).

Table 6 gives an overview of the final hypervolume obtained at the end of the different algorithms, along with the difference (in \(\%\)) from the true optimal hypervolume. As evident from this table, CEHVI-C is the winner in all test problems except BNH. Note, though, that NSGA-II only succeeds in outperforming both BO approaches here because we gave it a significantly higher total budget; for limited budgets (\(\le 100\)), NSGA-II is clearly inferior (as evident from Fig. 3). To assess the performance on varying input and output dimensions, extra experiments are conducted and presented in Appendix 2.

Table 6 Overview of the mean hypervolume and the \(\%\) difference from the true optimal hypervolume (with \(95\%\) confidence interval), obtained at the end of the algorithms, for the benchmark problems

5.3 Results for PMSM design problem

To check the robustness of the algorithms against randomness, we again ran 10 repetitions. The hypervolume values are calculated with reference point \([\text {Efficiency}, \text {Torque ripple}, \text {Total mass}]=[0.80, 8.1, 23.0]\).
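Since efficiency is to be maximized while torque ripple and total mass are minimized, the hypervolume computation has to bring all three objectives onto a common minimization scale; a common convention, which we assume here, is to negate the maximized objective. A small sketch (helper names are ours):

```python
REF_RAW = (0.80, 8.1, 23.0)  # [Efficiency, Torque ripple, Total mass]

def to_min_space(efficiency, torque_ripple, total_mass):
    """Map a design's metrics to pure-minimization form: efficiency is a
    maximization objective, so it is negated (an assumed convention)."""
    return (-efficiency, torque_ripple, total_mass)

REF = to_min_space(*REF_RAW)

def beats_reference(metrics):
    """A design can only contribute hypervolume if it strictly improves on the
    reference point in every objective (in minimization space)."""
    return all(m < r for m, r in zip(to_min_space(*metrics), REF))
```

For example, a design with efficiency 0.92, torque ripple 3.0, and mass 15 beats the reference point and thus contributes hypervolume, while one with efficiency 0.75 does not.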

Fig. 4 Evolution of the mean hypervolume for the PMSM design problem, for a total budget of 100 iterations. The shaded areas reflect the \(95\%\) confidence interval based on 10 repetitions

Figure 4 shows the evolution of the mean hypervolume for the different algorithms. For the BO algorithms, we used the first ten iterations to implement an AL stage: here, points were queried using the FPV acquisition function on the average torque constraint, to improve the corresponding GP model. As evident from the figure, the AL phase already succeeds in improving the hypervolume. In the optimization stage, CEHVI-C performs better than EHVI-C, and NSGA-II is clearly inferior to both BO approaches. In fact, NSGA-II even fails to find feasible points at some iterations (including later ones), due to the many constraints in the PMSM design problem; as a result, its hypervolume improves only marginally in the first 100 iterations. As evident from Table 7, which shows the mean hypervolume obtained at the end of the algorithms, these improvements remain marginal even at a higher budget.
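The AL stage queries points where the GP for the hard-to-model constraint is least certain about feasibility. The paper's FPV acquisition is defined elsewhere; as a generic stand-in, a probability-of-feasibility ambiguity score can be sketched from the GP posterior mean and standard deviation at a candidate point (function names are ours, not the paper's):

```python
import math

def prob_feasible(mu, sigma, threshold):
    """P[g(x) >= threshold] when the GP posterior at x is N(mu, sigma^2),
    e.g. for a constraint of the form 'average torque >= threshold'."""
    return 0.5 * (1.0 + math.erf((mu - threshold) / (sigma * math.sqrt(2.0))))

def feasibility_ambiguity(mu, sigma, threshold):
    """Highest where the model is most unsure whether x is feasible (P ~ 0.5);
    querying such points sharpens the constraint model before optimizing."""
    p = prob_feasible(mu, sigma, threshold)
    return p * (1.0 - p)
```

Maximizing such a score over candidate designs concentrates the early FEM budget near the predicted constraint boundary, which is exactly where the GP model for the constraint matters most.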

Table 7 Overview of the mean hypervolume (with \(95\%\) confidence interval halfwidth), obtained at the end of the algorithms, for the PMSM design problem
Fig. 5 Log of the Pareto front of a single experiment run

Figure 5 illustrates the quality of the final Pareto front obtained by the CEHVI-C and EHVI-C methods for a single, arbitrary run. Clearly, CEHVI-C succeeds in achieving solutions with a lower Torque ripple and a lower Total mass than EHVI-C, without compromising motor efficiency. The CEHVI-C runs also take less computation time, as they avoid estimating surrogate models for the cheap objective and constraints.

The three objectives of the optimization are the torque ripple, the motor efficiency, and the total mass. The magnet mass is also considered, as the magnets are the most expensive part of the machine: generally speaking, 1 kg of rare-earth magnet material costs more than ten times as much as 1 kg of copper or 1 kg of iron [66, 67].

Fig. 6 One of the Pareto-optimal designs found by CEHVI-C

Figure 6 shows the geometry of an (arbitrary) Pareto-optimal PMSM design obtained by CEHVI-C, along with the flux lines and flux densities. While some parts of the core are saturated, this does not negatively impact the performance metrics.

6 Conclusion

In this paper, a hypervolume-based MOBGO approach has been presented and applied to optimize a Permanent Magnet Synchronous Motor design. This design problem consists of a mix of expensive performance metrics (which require FEM evaluations) and cheap performance metrics (which can be evaluated using closed-form expressions). The key strength of the proposed approach is that it distinguishes between these cheap and expensive functions, estimating Gaussian Process models only for the expensive outcomes. It includes an active learning stage (which uses the FPV acquisition function to improve the accuracy for hard-to-model constraints) and an optimization phase (which uses the proposed CEHVI-C acquisition function, a constrained, cheap–expensive version of the well-known EHVI criterion). The performance of the CEHVI-C function was first evaluated on a number of benchmark functions; as shown, it leads to superior performance over the standard EHVI-based approaches, especially when the cheap objective(s) are hard to model with GPs. This superiority was further confirmed in the PMSM design results.
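To make the cheap–expensive distinction concrete: for a candidate design, a CEHVI-C-style score can be assembled from posterior draws of the expensive objectives, exact evaluations of the cheap objectives, and the probability that the expensive constraints hold. The sketch below is our own simplification (it uses a Monte Carlo hypervolume estimate rather than the paper's exact formulation, and all names are ours), intended only to show the structure:

```python
import numpy as np

def mc_hypervolume(front, ref, n=100_000, seed=0):
    """Monte Carlo estimate of the volume dominated by `front` (all minimized)."""
    front, ref = np.asarray(front, float), np.asarray(ref, float)
    lo = front.min(axis=0)
    pts = np.random.default_rng(seed).uniform(lo, ref, size=(n, len(ref)))
    dominated = np.any(np.all(pts[:, None, :] >= front[None, :, :], axis=2), axis=1)
    return float(np.prod(ref - lo)) * dominated.mean()

def cehvi_c_sketch(exp_draws, cheap_vals, front, ref, p_feasible):
    """Hypervolume-improvement score at a candidate x: GP posterior draws for
    the expensive objectives, exact values for the cheap ones, weighted by the
    probability that the expensive constraints are satisfied."""
    base = mc_hypervolume(front, ref)
    gains = [
        max(mc_hypervolume(
                np.vstack([front,
                           np.concatenate([np.atleast_1d(d),
                                           np.atleast_1d(cheap_vals)])]),
                ref) - base, 0.0)
        for d in np.atleast_2d(exp_draws)
    ]
    return p_feasible * float(np.mean(gains))
```

Because the cheap objectives enter as exact numbers rather than GP predictions, no surrogate error is introduced for them; this is where the data-efficiency gain over a standard all-GP EHVI-C setup comes from.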

The proposed approach is likely beneficial for other engineering design problems that include a mix of cheap and expensive outcomes. In future research, we plan to extend the approach to handle noisy function evaluations. Another interesting direction is to make the method cost-aware: the cost of an expensive function evaluation may not be the same over the entire search space, and cost-aware BO would then seek the optimal solutions while minimizing both the number of function evaluations and the resulting evaluation cost.