Abstract
The hypervolume indicator has been proved as an outstanding metric for the distribution of Pareto points, and the derived hypervolume based expected improvement (HVEI) has received a particular attention in the multi-objective efficient global optimization (EGO) method. However, the high computational cost has become the bottle neck which limits the application of HVEI on many objective optimization. Aiming at this problem, a modified version of HVEI (MHVEI) is proposed in this paper, which is easier to implement, maintains all the desired properties, and has a much lower computational cost. The theoretical study shows that the new criterion can be considered as a weighted integral form of HVEI, and it prefers the new point with a higher uncertainty compared with HVEI. The numerical tests show that the MHVEI performs similar as HVEI on the lower dimensional problem, and the advantage of MHVEI becomes more obvious as the dimension grows.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
1 Introduction
Modern engineering optimization problems usually adopt time-consuming numerical simulations to evaluate the objectives or constraints. For example, a post-buckling analysis of complex aerospace shells usually needs more than two hours (Hao et al. 2012; Hao et al. 2017). Therefore it is desired to limit the number of function evaluations. This requirement has popularized surrogate-based optimization method (Keshtegar et al. 2017; Hao et al. 2015), which uses the cheap mathematical approximation to replace the expensive true objective in the design process. In this area, the Gaussian process models, such as the Kriging surrogate (Krige 1951) and the statistical radial basis functions (Sóbester et al. 2005), have received a great attention, since they not only capture the unknown functions but also provide the probability estimation. With this merit, the efficient global optimization (EGO) methods adopt the statistical infill criteria to select the optima candidate, such as expected improvement (EI) (Jones et al. 1998; Jones 2001) and generalized EI (GEI) (Schonlau 1998; Sasena et al. 2002), which are able to avoid missing the global optimum to a large extent. Therefore, the EGO methods have been widely used in the modern engineering.
The real-world design problem usually involves more than one sub-objectives which conflict with on another, and this kind of problem is usually mentioned as the multi-objective optimization (MOO) problem. For the case that there is a lack of knowledge about design requirement, it is desired to present the designer multiple diversiform and competitive Pareto designs (also mentioned as Pareto solutions, Pareto frontier, and Pareto points) to decide the most appropriate one. The Pareto designs are defined as the points for which no other point is better in all sub-objectives. The optimization aiming at obtaining the Pareto designs is called Pareto optimization here. In earlier times, there are two main kinds of Pareto optimization approaches.
The first kind of Pareto optimization approach is mentioned as the single objective based approach in this paper, which transforms the MOO problem into different single objective ones and each of them leads to a Pareto point. For example, the weighted sum method (Messac et al. 2000) and the physical programing method (Messac et al. 2004) balance the multiple sub-objectives into one single objective function in a linear or non-linear way. The normal boundary intersection method (Das and Dennis 2006) and normal constraint method (Messac 2003) utilize a group of evenly distributed points on the utopia plane (formed by the optimal solutions of all sub-objectives) to produce different equality or inequality constraints in single objective problems.
The other kind of Pareto optimization approach is the heuristic intelligence algorithms, such as the genetic algorithm (GA) and particle swarming optimization (PSO), etc. The heuristic algorithms have received a great attention in this area, since they are naturally suitable for obtaining Pareto solutions. Some successful algorithms, such as the non-dominated sorting GA (Deb et al. 2002) and strength Pareto evolutionary algorithm (Kim et al. 2004), have been widely used. However, this kind of approaches typically require a massive amount of objective evaluations. Therefore, some researchers used the surrogate model to substitute a part of real simulations to save computational cost. For example, Goel et al. (2007) performed the multi-objective GA on the response surfaces of all the sub-objectives to obtain approximated Pareto frontier. Li et al. (2006) adopted Kriging model to replace a part of true objectives in the GA, and Kriging predicted error was employed to decide whether the prediction or the actual simulation should be performed to evaluate the design points. Jeong and Obayashi (2008) calculated the EI functions as the fitness values in the GA to find non-dominated solutions, and part of them were evaluated to update the Kriging model. Jie et al. (2016) developed the Kriging assisted PSO approach, in which the Kriging prediction took place of true objective and a normalized summation of EI values was used to determine the most promising designs to be evaluated by actual simulations.
In last two decades, the EGO method has also been extended for the Pareto optimization, and the study mainly focused on the statistical infill criteria for obtaining Pareto solutions. The ParEGO method was developed by Knowles (2006) whose basic idea is using the augmented Tchebycheff functions with different random vectors to balance the multiple sub-objectives into different single objectives. For each single objective, the Kriging surrogate is established, and the EI function is used to obtain new design point. Similar strategy has been adopted in a heuristic algorithm named MOEA/D by Liu et al. (2007) and Zhang et al. (2010).They established the Kriging surrogate for each sub-objective, and further derived the EI function of the balanced single objective based on the multivariate distribution. The balanced single objective could be a weighted sum function (Liu et al. 2007) or the augmented Tchebycheff function (Zhang et al. 2010). The Euclidean distance based EI (EI-ELU) was developed by Keane (2006), in which the expected central point of the probability distribution (provided by Kriging) in the non-dominated region is calculated. Then, the minimum distance between this central point and the obtain Pareto points is maximized. Zhan et al. (2017) defined the EI values rather than all the obtained Pareto points as the EI-Matrix. Then, based on this matrix, three infill criteria which are based on the summation distance, maximum distance and hypervolume were developed respectively.
One kind of statistical infill criterion is developed based on the hypervolume indicator which was firstly proposed by Zitzler and Thiele (1998) and has been proved as an outstanding metric for the quality of Pareto points distribution (Knowles et al. 2003; Emmerich et al. 2005). This indicator is defined as the volume of the dominated region of the Pareto points and bounded by a reference point. With this definition, the statistical infill criterion hypervolume based EI (HVEI) (Emmerich et al. 2011) selects the point with the maximum expected value of hypervolume growth as the new point. Wagner et al. (2010) compared several Pareto optimization infill criteria (before 2010) on a list of necessary and desired properties, and it is found that the HVEI has the best performance in terms of most properties but with a high computational cost. The computational complexity of HVEI mainly comes from the irregular geometry of the integration (non-dominated) region, and it is approximated with Monte Carlo method in earlier times. Emmerich et al. (2011) described a formula for calculating HVEI, in which the integration region is divided into a number of cells, and the integration in HVEI is performed in each one of them. However, the number of cells increases exponentially with the number of objectives. Couckuyt et al. (2014) developed a fast algorithm for HVEI, which allows a much lower number of integration cells, but no computational complexity was provided. As a rough approximation of HVEI, the lower confidence bound (LCB) (Emmerich et al. 2006) is defined as the hypervolume improvement of a lower confidence point. Since there is no need for the integral, the LCB has a much lower computational cost than HVEI. However, even the runtime of simply calculating the hypervolume still grows exponentially with the number of objectives (Beumea and Emmerich 2007). Zhan et al. (2017) estimated that the infill criterion function could be called for two millions times in the whole optimization process. If one time evaluation of the infill criterion costs one second, the total evaluating time will be twenty three days which might be even longer than the true simulations. Therefore, although the HVEI shows great potential abilities on generating well distributed Pareto points, the high computational complexity limits its application on many objective problems.
In this paper, a modified HVEI (MHVEI) function is developed, which can be considered as a weighted integral form of HVEI. Compared with the original HVEI, this new statistical criterion is easier to implement, maintains all the desired properties, and has a much lower computational cost. This paper is organized as follows. In Section 2, some basic knowledge about the EGO method and HVEI are introduced. In Section 3, the MHVEI is developed and studied theoretically by comparing it with HVEI on a list of desired properties. In Section 4, the performance of MHVEI is investigated with some mathematical test functions.
2 Related works
In this section, the classic EGO method for the single objective problem and the original HVEI function will be introduced.
2.1 Efficient global optimization
2.1.1 Ordinary kriging (OK) surrogate
The OK surrogate is a widely used Gaussian process model. Considering the objective function y(x) is meant to be minimized where \( \mathbf{x}\mathbf{\in}\mathbb{X} \) denotes the design variables and \( \mathbb{X}\subset {\mathbb{R}}^d \) denotes the design variable space. The OK surrogate assumes that the unknown y(x) meets the normal distribution for any \( \mathbf{x}\mathbf{\in}\mathbb{X} \)
where β and σ2 are the assumed values of mean and variance. For any two points \( {\mathbf{x}}_i\mathbf{\in}\mathbb{X} \) and \( {\mathbf{x}}_j\mathbf{\in}\mathbb{X} \),
where r denotes a given correlation model function, and θ = {θi| 1 ≤ i ≤ d} denotes the correlation parameters on different dimensions. In (2) the Gaussian correlation model is adopted.
The OK surrogate is established based on a group of evaluated design variables \( \mathbf{S}=\left\{{\mathbf{S}}_i|{\mathbf{S}}_i\mathbf{\in}\mathbb{X},1\le i\le N\right\} \) with their objective values of Y = y(S). In this paper, S and Y will be mentioned as the training data of OK surrogate, and N denotes the number of training data. If the (1) and (2) are considered as the priori hypotheses, and Y = y(S) is taken as the known event, the conditional probability distribution of y(x) for any \( \mathbf{x}\mathbf{\in}\mathbb{X} \) will be derived based on the Bayes theory that
where \( \widehat{y}\left(\mathbf{x}\right) \) and \( \widehat{\sigma}{\left(\mathbf{x}\right)}^2 \) are the predicted mean and variance functions of OK surrogate separately that
where r(x) = {r(x, S1, θ), r(x, S2, θ), ⋯r(x, Sn, θ)}T denotes the correlation vector between x and S, 1 denotes the vector with n components of 1, and R=[r(Si, Sj, θ)]i, j denotes correlation matrix of S. Based on (3), the probability distribution function (PDF) provided by OK surrogate can be written as
where ϕ denotes the probability density function of standard normal distribution.
The parameters of OK model including β, σ2 and θ are usually estimated by maximizing the likelihood function that
More details about the derivation of OK surrogate can be found in the literature (Ginsbourger et al. 2010) which will not be further introduced.
2.1.2 Generalized EI function
During the EGO process, the best obtained objective design is selected from the training data that Ymin = min{Y}. Then, for any unevaluated design x, the improvement rather than Ymin is defined as
The probability distribution of I(x) can be obtained from (6). Then, the EI method takes the design with the maximal expected value of I(x) as the infilled point
where \( u\left(\mathbf{x}\right)= Ymin-\widehat{y}\left(\mathbf{x}\right) \) and Φ denotes the cumulative distribution function of standard normal distribution.
Note that \( {\widehat{\sigma}}^2 \) is the expected difference between y and \( \widehat{y} \) that
therefore \( \widehat{\sigma}\left(\mathbf{x}\right) \) is also mentioned as uncertainty, and used to estimate the accuracy of \( \widehat{y}\left(\mathbf{x}\right) \). The EI function prefers to select the point with a higher \( \widehat{\sigma} \) or a lower \( \widehat{y} \). The former preference is usually mentioned as global exploration which is meant to improve the accuracy of the surrogate, and the latter one is named as local exploitation which is meant to search the optimum of the surrogate. These two searching preferences can avoid missing the global optimum to a large extent, and they can be further balanced in the GEI function (Schonlau 1998; Sasena et al. 2002) as
where the positive integer g is used to control the balance. If the I g − 1 is considered as the integral weight coefficient of the improvement, one can observe that there is a trade-off in choosing between small improvement with large probability (local exploitation) versus large improvement with small probability (global exploration) (Schonlau 1998). As g grows, the balance will incline to global exploration which is more suitable when the unknown objective is poorly approximated (Sasena et al. 2002), otherwise it will incline to local exploitation.
2.2 Hypervolume based expected improvement
Considering there are M objectives y(x) = {y1(x), y2(x), ⋯yM(x)} to be minimized, and all the sub-objectives can be evaluated by one time simulation. In most cases, the dependency between different sub-objective functions is not available, therefore a standard assumption is usually made that all the sub-objectives are independent to one another (Emmerich et al. 2006). One can use the Kriging surrogate to estimate the probability distribution of each sub-objective separately, and the PDF of y(x) is expressed as
In each iteration of the EGO process for MOO, the current Pareto points P ={Pi| 1 ≤ i ≤ n} are selected from the training data Y. The hypervolume indicator is an outstanding metric for the quality of Pareto points distribution, which is defined as the volume of the region dominated by P and bounded by a reference point r ∈ ℝM. It can be mathematically expressed as
The reference point r should be set with a relative large value to ensure that it is dominated by any true Pareto point. A higher value of hypervolume indicator means a better distribution of Pareto points, therefore the problem of Pareto optimization can be transformed to maximizing H(P). Then, for a given point y ∈ Ω, when it is added to the training data, the hypervolume improvement is defined as the growth of H(P) that
Note that Ihv(y)≥ 0, and Ihv(y)>0 when and only when y is not strongly dominated by any Pareto point in P. As shown in Fig. 1, Ihv(y) is the area of the shaded part. Similar to the classic EI, the HVEI method takes the point with the maximum expected value of Ihv(y) as the new point that
where Ω denotes the non-dominated region
The HVEI is monotonically increasing with increasing \( {\widehat{\sigma}}_i \) (i = 1, 2, ⋯M), therefore it preserves the global exploration of EI method to improve the accuracy of surrogate. And when the accuracy of the surrogate is high (which means \( {\widehat{\sigma}}_i \) is low), the EIhv(x) will regress to Ihv(y) which is directly defined to maximize the hypervolume. Therefore, HVEI is a very effective statistical infill criterion for producing well-distributed Pareto points.
Unfortunately, because of the irregular shape of Ω, there is a lack of direct calculating formula of both Ihv(y) and EIhv(x) when M > 2 (a formula of HVEI for bi-objective problem is developed in Section 3.2), and the high computational complexity of HVEI (which will be further introduced in Section 3.4) has limited its application on the many objective problems.
3 Modified hypervolume based expected improvement
In this section, a new statistical criterion for Pareto optimization will be developed. In order to make it clear, the related concepts are given first in Section 3.1. In Section 3.2, a modified version of hyper volume improvement is proposed and compared with the original one, and then the MHVEI is derived and the whole optimization process is introduced in Section 3.3. In Section 3.4, the properties of MHVEI will be discussed theoretically.
3.1 Related concepts
-
Local upper bounds (LUBs) U: U ⊂ ℝM denotes a set of points which cannot be dominated by one another that ∀Ui ∈ U, ∀ Uj ∈ U ⇒ Ui ⊀ Uj. And it can be used to define the non-dominated region that
-
Grid cells: Using the jth component of each Pi to divide the jth dimension of ℝM into n + 1 intervals, then all the intervals on different dimensions will form (n + 1)M hyper boxes. All the hyper boxes whose lower bounds are inside Ω are defined as the grid cells. In this paper, li = (li, 1, li, 2, ⋯li, M) and ui = (ui, 1, ui, 2, ⋯ui, M) are used to denote the lower and upper bounds of the ith grid cell, where li, j and ui, j denote the bounds corresponding to the jth sub-objective function.
More details of the LUBs can be found in the work of Klamroth et al. (2015) which will not be further introduced. The grid cells have been adopted by Emmerich et al. (2011) to calculate HVEI. Figure 2 shows the relationship between P and U, and the grid cells are denoted by the dotted lines. Note, every LUB is the upper limit of a grid cell.
3.2 Modified hypervolume improvement function
In EGO method, the improvement function should be defined based on the design purpose. For example, maximizing I(x) is the same to minimizing the single objective function, and maximizing Ihv is the same to maximizing the hypervolume. In other words, the improvement function for the MOO problem should be able to produce well distributed Pareto points, when it is used as the single objective function balancing the multiple sub-objectives (like the methods weighted sum and physical programming, see Section 1). Three desired properties of the improvement function for Pareto optimization are listed as follows:
-
Porperty 1.
The improvement function should be monotonically decreasing with increasing yj(j = 1, 2, ⋯M).
-
Porperty 2.
When P ⊂ Ptrue, the improvement function should prefer the point which mostly augments the current Pareto frontier.
-
Porperty 3.
A simple and integrable formula of the improvement function is desired.
Among these properties, Property 1 can ensure the maximum point of improvement function will always locate on Ptrue, Property 2 relates to the quality of the Pareto points distribution, and Property 3 is meant to ensure that the direct calculating formula of the statistical infill criterion exists.
Because maxIhv(y) ⟺ max H(P ∪ y), the Ihv satisfies the first two properties perfectly but fails on the last one. By adopting the concept of the LUBs, the Ihv(y) can be written in the form of
where the first term has considered the volume of the whole region dominated by y, and the Voverlap denotes the volume of the overlapped part in the first item. Note, for a given Ui ∈ U, if y ⊀ Ui, [y, Ui] = ∅ and V([y, Ui]) = 0. As shown in Fig. 2, the area with different shaded lines denotes the region of different [y, Ui] in the first term, and the overlapped shaded area relates to the value of Voverlap.
For the bi-objective problem, every Pareto point dominates two LUBs, then Voverlap(y) can be calculated as
Substituting (18) and (19) into (15), the formula of HVEI can be written as
Unfortunately, (19) dose not hold true when M > 2, therefore the HVEI still lacks a calculating formula. However, if we defined the first term of (18) as an improvement function, it will preserve all the three properties above. This new function is named as the modified version of hypervolume improvement which is given as
and the three properties of \( {I}_{hv}^{mod} \) will be discussed in the following.
Obviously, \( {I}_{hv}^{mod} \) satisfies Property 1 and Property 3 based on its definition, see (21). In order to discuss Property 2 of \( {I}_{hv}^{mod} \), the difference between Ihv and \( {I}_{hv}^{mod} \) will be first analyzed by adopting the concept of grid cells. In the ith grid cell, the part dominated by y can be expressed as
where max{y, li} = (max{y1, li, 1}, max{y2, li, 2}, ⋯ max {yM, li, M}). Then Ihv(y) can be expressed as the summation of dominated region in all the grid cells
The intersection of [y, Uj] and the ith grid cell can be written as
then similar to (23), \( {I}_{hv}^{mod}\left(\mathbf{y}\right) \) can be rewritten as
where gi denotes the number of LUBs strongly dominated by li. By comparing (23) and (25), we can get that when y dominates more LUBs, the difference between Ihv(y) and \( {I}_{hv}^{mod}\left(\mathbf{y}\right) \) will become more obvious. In the contrary, if y dominates only one LUB, \( {I}_{hv}\left(\mathbf{y}\right)={I}_{hv}^{mod}\left(\mathbf{y}\right) \). It implies that Ihv(y) and \( {I}_{hv}^{mod}\left(\mathbf{y}\right) \) will have a more similar landscape when y is closer to the LUBs.
When P ⊂ Ptrue, all the LUBs will be close to Ptrue, then \( {I}_{hv}^{mod}\left(\mathbf{y}\right) \) will have a similar ability of augmenting Pareto frontier as Ihv(y) because of the similar landscape. Note, when P ⊂ Ptrue and M = 2, any point y ∈ Ptrue can strongly dominate only one (at most) LUB, therefore \( {I}_{hv}\left(\mathbf{y}\right)={I}_{hv}^{mod}\left(\mathbf{y}\right) \).It means that \( {I}_{hv}^{mod} \) will work the same as Ihv on augmenting the Pareto frontier for the bi-objective case.
Generally speaking, \( {I}_{hv}^{mod} \) shows better than Ihv on Property 3. And based on the discussion above, \( {I}_{hv}^{mod} \) can also produce well distributed Pareto points (Property 1 and Property 2). The quality of Pareto points obtained by \( {I}_{hv}^{mod} \) and Ihv will be compared with numerical tests in Section 4.1.
3.3 Optimization with MHVEI
Based on (17) and (21), \( {I}_{hv}^{mod}\left(\mathbf{y}\right)\ge 0 \), and \( {I}_{hv}^{mod}\left(\mathbf{y}\right)>0 \) when and only when y ∈ Ω. Then similar to the HVEI, the expected value of \( {I}_{hv}^{mod}\left(\mathbf{y}\right) \) is defined as the MHVEI that
where EI(yj, Ui, j) is the EI function, see (9).
The optimization with the MHVEI follows a typical process of EGO method as shown in Fig. 3. At the very beginning, the initial training data is usually generated with Latin Hypercube method (Mckay et al. 1979). Before the optimization, the initial LUBs are obtained based on initial P and r, and they will be updated with new design y(x∗) in each iteration. An efficient algorithms for generating and updating the LUBs has been proposed by Przybylski et al. (2010), and further improved by Klamroth et al. (2015). In their tests, updating the LUBs with one new point takes no more than 10−5 second when m = 10000 and M = 6, therefore the computational cost can be ignored. The Matlab code of Przybylski’s approach can be found in the Appendix 1.
3.4 Properties of MHVEI
In the following, the performance of MHVEI will be discussed on several desired properties of Pareto infill criterion. All these properties are summarized from the work of Wagner et al. (2010)
-
Global exploration and local exploitation
The statistical infill criterion should be monotonically decreasing with increasing \( {\widehat{y}}_i \), and monotonically increasing with increasing \( {\widehat{\sigma}}_i \) (i = 1, 2, ⋯M). Obviously this property has been preserved in MHVEI, since it is monotonically increasing with increasing EI(yj, Ui, j).
In addition, the balance between global exploration and local exploration is the main difference between HVEI and MHVEI. If we substituting (23) and (25) into (15) and (26) separately, we will get
If gi is considered as the weight coefficient, \( {EI}_{hv}^{mod} \) will be the weighted integral form of EIhv, and \( {EI}_{hv}^{mod} \) will prefer the region which dominates more LUBs comparing with EIhv. Usually, if y dominates more LUBs, it will have a higher Ihv(y) but with a smaller probability of p(y). Then, based on the discussion above, we can conclude that comparing with HVEI, MHVEI prefers the design point with a higher improvement Ihv(y) but a smaller probability of p(y). This difference is similar to the one between GEI functions (Schonlau 1998) with different orders, see Section 2.1. Therefore, we can also conclude that the balance (between the global exploration and local exploitation) in MHVEI more inclines to the global exploration than HVEI.
-
Improving the Pareto set distribution when \( \widehat{\boldsymbol{\upsigma}} \) is small
The HVEI shows outstanding advantage on this property, because it is defined to directly maximize the hypervolume. When \( \widehat{\boldsymbol{\upsigma}} \) is small, the statistical infill criterion will regress to the improvement function. As discussed in Section 3.2, the \( {I}_{hv}^{mod} \) has a very similar ability of producing well distributed Pareto points as Ihv, therefore MHVEI can show a similar performance as HVEI on improving the Pareto set distribution. In order to demonstrate this, the approach of Wagner et al. (2010) is adopted. Note, both MHVEI and HVEI are functions with respect to \( \widehat{\mathbf{y}} \) and \( \widehat{\boldsymbol{\upsigma}} \). The ability of statistical infill criterion on improving the distribution depends on the fitness landscapes with respect to varied \( \widehat{\mathbf{y}} \). Therefore the functions of both criteria with respect to varied \( \widehat{\mathbf{y}} \) and constant \( \widehat{\boldsymbol{\upsigma}} \) (\( \widehat{\boldsymbol{\upsigma}}=\mathrm{0,0.1} \) and 0.2) on bi-objective case are compared in Fig. 4. For each \( \widehat{\boldsymbol{\upsigma}} \), a group of Pareto points with both convex and concave shapes are generated. It can be observed that, although the two criteria refer to very different scale, they have very similar fitness landscapes.
-
Easy to search
The infill criterion should be continuous, differentiable and the plateaus of its fitness landscape should be avoided, then the direct optimizer (especially the gradient based methods) can easily reach the local optima of the infill criterion. Clearly the MHVEI is continuous and differentiable since it has an analytical expression. According to Wagner et al. (2010), HVEI shows strong gradients to their local optima by checking its fitness landscapes, therefore the comparison in Fig. 4 also implies that the MHVEI preserves this property as well as HVEI.
-
Easy to implement and efficient to calculate
The MHVEI is easy to implement, since the direct calculating formula has been given in (26). By observing (26), the computing amount of MHVEI is proportional to M ∗ m. Clearly, m = n + 1 for the bi-objective case, and m is tightly upper bounded by O(n⌊M/2⌋) when M > 2 (Klamroth et al. 2015). Therefore it can be got that the MHVEI has a computational complexity of O(M ∗ n⌊M/2⌋) in the worst case.
In Kriging model, predicting \( {\widehat{\sigma}}_i \) involves a matrix-vector multiply operation, and the computational complexity is O(N2) where N denotes the number of training data. Since P ⊂ Y, n ≤ N. Therefore MHVEI has the same computational complexity as the Kriging model when M ≤ 5 (the computational complexity is O(n2) when M = 5).
The computational complexity of MHVEI are compared with the fastest algorithms (as far as we know) of HVEI in Table 1, where the algorithm for bi-objective case has been developed in Section 3.2, see (20). The algorithms of HVEI for the problem when M = 3 and M > 3 were developed by Yang et al. (2017) and Couckuyt et al. (2014) separately. Note, Couckuyt et al. (2014) have not provided the computational complexity in their work, but it can be estimated as O(M ∗ q2) based on their calculating formula, where q denotes the number of multiple disjoint cells covering the whole region of Ω. Based on (17), it can be deduced that m is the smallest number of cells which can cover the whole Ω, therefore q ≥ m, and the computational complexity will be O(M ∗ n2⌊M/2⌋) in the best case.
Generally speaking, the Table 1 shows that MHVEI has an obvious advantage on the computational complexity comparing with HVEI when M is large, but it is still more time-consuming than the other infill criteria which is not developed based on the hypervolume indicator.
4 Numerical tests
The performance of MHVEI will be investigated with mathematical test functions. The hypervolume indicator is used to measure the obtained Pareto solutions, see (13). All the programs used in this paper are coded with Matlab. The Kriging model is established with DACE tool box (Nielsen et al. 2002). The new design point is obtained by a hybrid strategy of PSO algorithm and gradient based method. To be specific, the PSO is used first to obtain a solution, and then this solution and all the current Pareto designs are used as the initial guesses in the “fmincon” function provided by Matlab. The HSO algorithm (While et al. 2005) is adopted to calculate the hypervolume indicator.
Six test functions from previous literatures are used in this paper. For the bi-objective case, the test functions of ZDT1, ZDT2 and ZDT3 are adopted, which have different shape of Pareto frontier, (convex, concave and discontinue). For the higher M, the test functions of DTLZ1, DTLZ2 and DTLZ7 (Deb et al. 2001) are adopted, whose Pareto frontier are linear, concave and discontinue. The mathematical expression of the test functions can be found in Appendix 2. Note, all the test functions with the same d have the same design variable space of \( \mathbb{X}=\left\{\mathbf{x}|0\le {x}_i\le 1,i=1,2,\cdots d\right\} \). In order to average out the randomness caused by the initial samples, fifty groups of Latin Hypercube Samples are generated for each \( \mathbb{X} \), and the number of initial samples always equals10*d. For all the test functions, it is set that d = M + 1.
4.1 Comparison between improvement functions
The definitions of improvement function are important for the Pareto statistical infill criteria, and three desire properties have been proposed in Section 3.2. In order to further investigate the first two properties (generating well distributed Pareto points) of \( {I}_{hv}^{mod}\left(\mathbf{y}\right) \), \( {I}_{hv}^{mod}\left(\mathbf{y}\right) \) and Ihv(y) will be compared on the DTLZ test functions (M = 3 and 4) in Section 4.1. To be specific, both the improvement functions are directly used to obtain the infill design point that
and
where y(x) denotes the true objective functions. Note, the (29) and (30) can be understood as the MHVEI function and HVEI function with setting \( \widehat{\mathbf{y}}=\mathbf{y} \) and \( \widehat{\boldsymbol{\upsigma}}=0 \). The obtained values of hypervolume are compared in Fig. 5, and the Fig. 6 shows the difference (the hypervolume obtained by Ihv minus the one obtained by \( {I}_{hv}^{mod} \)).
The results show that the Ihv is able to achieve a higher hypervolume than \( {I}_{hv}^{mod} \). This is understandable, since maximizing Ihv is exactly the same to maximizing the hypervolume. However, the two curves shown in Fig. 5 basically overlap with each other, and the advantage of Ihv shown in Fig. 6 can be almost ignored. This comparison proves that \( {I}_{hv}^{mod} \) can produce Pareto points distributed as well as Ihv, and it also implies that the MHVEI works similar as HVEI in the absence of uncertainty.
In addition, Ihv lacks a calculating formula when M > 3, and the computational complexity also grows exponentially with M. Therefore the \( {I}_{hv}^{mod} \) can be used to replace the Ihv in the heuristic intelligence algorithms such as the SMS-EMOA (Beumea and Emmerich 2007), when the computational cost becomes a matter of concern.
4.2 Comparison with other infill criteria
In this section, the MHVEI is compared with three existing infill criteria including HVEI, EI-ELU and LCB. The EI-ELU is developed by Keane (2006) and defined as
where P[Ihv(y) > 0] denotes multiple objective probability improvement, and y∗ is the expected centroid point of y in Ω. The LCB can be understood as the hypervolume improvement of a lower confidence point that \( {I}_{hv}\left(\widehat{\mathbf{y}}-w\widehat{\boldsymbol{\upsigma}}\right) \) where w is the gain factor. In this work, w is set with the value suggested by Emmerich et al. (2006). All these criteria can be easy to implement with a low computational cost on the bi-objective problem, but the computational complexity of both EI-ELU and LCB also grows exponentially with M. With the consideration of computational cost, the MHVEI will be compared with all the three criteria on the ZDT functions (M = 2), and it will be compared with only HVEI on the DTLZ functions (M = 3, 4).
4.2.1 Comparison on ZDT functions
The averaged hypervolume obtained by MHVEI, HVEI, EI-ELU and LCB are compared in Fig. 7, and the results can be discussed with the desired properties mentioned in Section 3.3.
The EI-ELU performs worst among all the four criteria in most cases, which can be attributed to that the EI-ELU is not monotonically decreasing with increasing \( {\widehat{y}}_i \). Comparing with HVEI and MHVEI, the LCB can achieve a similar hypervolume at the end of the optimization, but the hypervolume grows slower during the process. Note, the LCB balances the global exploration and local exploitation roughly in a linear way in the purpose of reducing the computational cost (the LCB can be faster calculate than the HVEI). However the two searching preferences have a strong effect on both the convergence rate and global searching ability of the statistical infill criterion.
The HVEI and MHVEI perform similar on the ZDT functions, and they will be further compared below.
4.2.2 Comparison on DTLZ functions
Four groups of DTLZ functions with different number of sub-objectives and design variables are adopted. The averaged hypervolume and the relative variance are shown and compared in Figs. 8, 9, 10 and 11. Generally speaking, MHVEI performs no worse than HVEI in all the cases, and the advantage of MHVEI is quite obvious on the DTLZ1 and DTLZ2 with d > 4. The reason can be explained by the difference between the two infill criteria. When d is larger, the unknown sub-objectives will be more difficult to approximate by the Kriging model. As demonstrated in Section 3.4, comparing with HVEI, MHVEI prefers the point with higher \( \widehat{\boldsymbol{\upsigma}} \), and it is more helpful to improve the accuracy of the surrogate.
4.3 Investigation on Higher Number of Objectives
Figure 12 shows the Pareto points obtained by MHVEI during the optimization process with the first (of fifty) group of initial samples. It can be observed that the Pareto points keep exploring Ptrue as the optimization continues. However, one can image that when M is larger, much more Pareto points will be needed to cover the higher dimensional space (the dimension of Ptrue is M − 1 in our tests).
In this section, the performance of MHVEI on many objective problem (M ≥ 3) will be investigated. In order to eliminate the effects of dimension on the hypervolume indicator, the normalized hypervolume defined as H(P)/H(Ptrue) is used as the metric to measure Pareto points distribution. Only the test functions of DTLZ1 and DTLZ2 are adopted here, because the H(Ptrue) can be calculated analytically with arbitrary value of M. The optimization process in the first 200 iterations with M = 3, 4,⋯ 7 and d = M + 1 are compared.
Note that H(P)/H(Ptrue) ≤ 1, and H(P)/H(Ptrue) = 1 when and only when P = Ptrue. Figure 13 shows that the algorithm works well at the end of optimization when M is small (M ≤ 4 for DTLZ1, and M ≤ 6 for DTLZ2), but the hypervolume grows slower as M increases. It means that for the higher dimensional (both the design variable space and objective space) problem, more iterations will be needed to improve the distribution of the Pareto points. However, the computational cost might be unacceptable in the real application because of the time consuming evaluation. We believe that the parallel EGO methods (Haftka et al. 2016), which generate multiple new points in one iteration, are more desired for the Pareto optimization rather than the single objective problem.
5 Conclusion
The HVEI function has been proved as an outstanding statistical infill criterion for the multi-objective EGO method. However, the high computational cost has become the bottle neck to limit its application on the many objective optimization. Aiming at this problem, a new statistical infill criterion named MHVEI has been developed. The theoretical study shows that the MHVEI maintains all the desired properties of HVEI, but has a much lower computational complexity. The difference between HVEI and MHVEI is also studied theoretically, and it shows that compared with HVEI, the MHVEI more inclines to the global exploration. The new criterion is compared with several existing ones on Mathematical functions. The results show that the MHVEI and HVEI perform better than the other criteria on bi-objective problem, and MHVEI performs no worse than HVEI on lower dimensional (design variable space) problems but better on the higher dimensional ones.
The results in Section 4.1 show that maximizing the \( {I}_{hv}^{mod} \) can produce as well distributed Pareto points as directly maximizing the hypervolume. Therefore we believe the \( {I}_{hv}^{mod} \) can be used in the heuristic intelligence algorithms for MOO, such as the SMS-EMOA, when the computational cost becomes a matter of concern. In addition, the MHVEI is a statistical infill criterion which is based on the probability estimated by the Kriging surrogate in this work. In theory, this approach can be extended with other Gaussian process models, such as the stochastic radial basis functions.
Some problems have also been observed in our method. Because of the curse of dimension, when the number of design variables is higher, the surrogate will need more training data to properly approximate the objectives. And even if the accuracy of the surrogate is high enough, it still will need more points to explore the higher dimensional true Pareto frontier. Therefore, we believe that the multiple points infill criteria for the parallel computation is desired in the future. However, one question needs to be considered first that should the multiple points explore the multi modal of the infill criterion (like the approach for the single objective problem), or should they try to augment the obtained Pareto solutions on different region? In addition, although the MHVEI has a much lower computational complexity than the existing algorithms of HVEI, it is still more time-consuming than the other criteria which are not developed directly to improve the hypervolume. Therefore more efficient algorithms are still expected.
Abbreviations
- ≺:
-
Pareto dominance. a ≺ b ⟺ ∀ 1 ≤ i ≤ M : ai ≤ bi ∧ ∃ 1 ≤ i ≤ M : ai < bi, where a and b denotes two design points.
- ≺≺:
-
Strong Pareto dominance. a ≺ ≺ b ⟺ ∀ 1 ≤ i ≤ M : ai < bi.
- [l,u]:
-
The region of a hyper box which takes l and u as the lower and upper limits. If l ⊀ u, [l,u]=∅.
- V(A):
-
The hyper volume of the region A. Note V(A) = 0 when A = ∅
- P :
-
Pareto points
- P true :
-
True pareto frontier
- r :
-
Reference point
- U :
-
Local upper bounds
- M :
-
Number of sub-objectives
- d :
-
Number of design variables
- n :
-
Number of points in P
- m :
-
Number of points in U
References
Beumea N, Emmerich M (2007) SMS-EMOA: Multiobjective selection based on dominated hypervolume. Eur J Oper Res 181(3):1653–1669
Couckuyt I, Deschrijver D, Dhaene T (2014) Fast calculation of multiobjective probability of improvement and expected improvement criteria for Pareto optimization. J Glob Optim 60(3):575–594
Das I, Dennis JE (2006) Normal-Boundary Intersection: A New Method for Generating the Pareto Surface in Nonlinear Multicriteria Optimization Problems. SIAM J Optim 8(3):631–657
Deb K, Thiele L, Laumanns M, Zitzler E (2001) Scalable test problems for evolutionary multi-objective optimization. Evolutionary Multiobjective Optimization: Theoretical Advances and Applications Chapter 6: 105-145.
Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197
Emmerich MTM, Beume N, Naujoks B (2005) An EMO Algorithm Using the Hypervolume Measure as Selection Criterion. In: International Conference on Evolutionary Multi-Criterion Optimization, Guanajuato, Mexico.
Emmerich MTM, Giannakoglou KC, Naujoks B (2006) Single- and multiobjective evolutionary optimization assisted by Gaussian random field metamodels. IEEE Trans Evol Comput 10(4):421–439
Emmerich MTM, Deutz AH, Klinkenberg JW (2011) Hypervolume-based expected improvement: Monotonicity properties and exact computation. In: In 2011 IEEE Congress of Evolutionary Computation, New Orleans, LA, USA.
Ginsbourger D, Le Riche R, Carraro L (2010) Kriging is well-suited to parallelize optimization. Springer Berlin Heidelberg
Goel T, Vaidyanathan R, Haftka RT, Wei S, Queipo NV, Tucker K (2007) Response surface approximation of Pareto optimal front in multi-objective optimization. Comput Methods Appl Mech Eng 196(4):879–893
Haftka RT, Villanueva D, Chaudhuri A (2016) Parallel surrogate-assisted global optimization with expensive functions – a survey. Struct Multidisc Optim 54: 3-13.
Hao P, Wang B, Li G (2012) Surrogate-Based Optimum Design for 871 Stiffened Shells with Adaptive Sampling. AIAA J 50(11):2389–2407
Hao P, Wang B, Li G, Meng Z, Wang L (2015) Hybrid Framework for Reliability-Based Design Optimization of Imperfect Stiffened Shells. AIAA J 53(10):2878–2889
Hao, P., Wang, B., Tian, K., Li, G., Sun, Y., & Zhou, C. (2017) Fast procedure for Non-uniform optimum design of stiffened shells under buckling constraint. Structural and Multidisciplinary Optimization 55(4):1503-1516
Jeong S, Obayashi S(2005) Efficient global optimization (EGO) for multi-objective problem and data mining, In: IEEE Congress on Evolutionary Computation, Edinburgh, Scotland.
Jie H, Wu Y, Zhao J, Ding J and Liangliang (2016) An efficient multi-objective PSO algorithm assisted by Kriging metamodel for expensive black-box problems. J Glob Optim 67: 1–25
Jones DR (2001) A Taxonomy of Global Optimization Methods Based on Response Surfaces. J Glob Optim 21(4):345–383
Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J Glob Optim 13(4):455–492
Keane AJ (2006) Statistical Improvement Criteria for Use in Multiobjective Design Optimization. AIAA J 44(4):879–891
Keshtegar B, Hao P, Wang Y, Li Y (2017) Optimum design of aircraft panels based on adaptive dynamic harmony search. Thin-Walled Structures 118: 37–45.
Kim M, Hiroyasu T, Miki M, Watanabe S (2004) SPEA2+: Improving the Performance of the Strength Pareto Evolutionary Algorithm 2. Lect Notes Comput Sci 3242(4):742–751
Klamroth K, Lacour R, Vanderpooten D (2015) On the representation of the search region in multi-objective optimization. Eur J Oper Res 245(3):767–778
Knowles J (2006) ParEGO: a hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Trans Evol Comput 10(1):50–66
Knowles JD, Corne DW, Fleischer M(2003) Bounded archiving using the lebesgue measure, In: IEEE Congress on Evolutionary Computation, Canberra, Australia.
Krige DG (1951) A statistical approach to some mine valuations and allied problems at the Witwatersrand. Master’s thesis University of the Witwatersrand
Li M, Li G, Azarm S (2006) A Kriging Metamodel Assisted Multi-Objective Genetic Algorithm for Design Optimization. J Mech Des 130(3):405–414
Liu W, Zhang Q, Tsang E, Liu C and Virginas B (2007) On the Performance of Metamodel Assisted MOEA/D, In: Advances in Computation and Intelligence, Second International Symposium, ISICA 2007, Wuhan, Proceedings, pp 547–557
Mckay MD, Beckman RJ, Conover WJ (1979) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21(2):239–245
Messac A (2003) The normalized normal constraint method for generating the Pareto frontier. Struct Multidiscip Optim 25(2):86–98
Messac A, Puemi-Sukam C, Melachrinoudis E (2000) Aggregate Objective Functions and Pareto Frontiers: Required Relationships and Practical Implications. Optim Eng 1(2):171–188
Messac A, Dessel SV, Mullur AA, Maria A (2004) Optimization of large-scale rigidified inflatable structures for housing using physical programming. Struct Multidiscip Optim 26(1–2):139–151
Lophaven, SN, Søndergaard J, Nielsen HB (2002). DACE A Matlab Kriging Toolbox. IMM Informatiocs and Mathematical Modelling. Retrieved from http://imedea.uib-csic.es/master/cambioglobal/Modulo_V_cod101615/Lab/lab_maps/krigging/DACE-krigingsoft/dace/dace.pd
Przybylski A, Gandibleux X, Ehrgott M (2010) A two phase method for multi-objective integer programming and its application to the assignment problem with three objectives. Discret Optim 7(3):149–165
Sasena MJ, Papalambros P, Goovaerts P (2002) Exploration of Metamodeling Sampling Criteria for Constrained Global Optimization. Eng Optim 34(3):263–278
Schonlau M (1998) Computer experiments and global optimization. University of Waterloo, Waterloo, Canada
Sóbester A, Leary SJ, Keane AJ (2005) On the Design of Optimization Strategies Based on Global Response Surface Approximation Models. J Glob Optim 33(1):31–59
Wagner T, EmmerichM, PonweiserW(2010) On expected-improvement criteria for model-based multi-objective optimization. In: International Conference on Parallel Problem Solving From Nature, Krakow, Poland.
While R, Bradstreet L, Barone L and Hingston P (2005) Heuristics for Optimising the Calculation of Hypervolume for Multi-objective Optimisation Problems 3(2): 2225–2232
Yang K, EmmerichM, Deutz A and Fonseca CM(2017) Computing 3-D expected hypervolume improvement and related integrals in asymptotically optimal time. In: International Conference on Evolutionary Multi-Criterion Optimization, Munster, Germany.
Zhan D, Cheng Y, Liu J (2017) Expected Improvement Matrix-based Infill Criteria for Expensive Multiobjective Optimization. IEEE Trans Evol Comput PP(99):1
Zhang Q, Liu W, Tsang E, Virginas B (2010) Expensive Multiobjective Optimization by MOEA/D With Gaussian Process Model. IEEE Trans Evol Comput 14(3):456–474
Zitzler E, Thiele L (1998) Multiobjective Optimization Using Evolutionary Algorithms - A Comparative Case Study. In: 5th International Conference on Parallel Problem Solving from Nature, Amsterdam, Netherlands.
Acknowledgements
This work has been supported by National Natural Science Foundations of China (11702053,11432003, 11602050), National High Technology Research and Development Program of China (2015AA033803), China Postdoctoral Science Foundation (2015 M571298), Fundamental Research Funds for the Central Universities of China (DUT16TD05), Innovation and Talent Recruitment Base on Numerical Simulation and Optimization of Rubber and Plastic Products Forming (B14013), Joint Special Project of Shandong Natural Science Foundation (ZR2017LA007).
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible Editor: Shapour Azarm
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1: Code for obtaining and updating LUBs
1.1 Code for updating LUBs
The algorithm first delete the invalid old LUBs on line 12, and then append the new LUBs which are selected from a group of candidates on line 22. The invalid LUBs are the points strongly dominated by p and denoted by SDU in the code, see line 9. Part candidates of the new LUBs are the projection points of p on the boundary of the region bounded by each SDU point, these candidates are denoted by C in the code, see line 15–19. The other part candidates are the old LUBs which are dominated by p but not strongly, and these LUBs are denoted by WDU in the code, see line 10. The new LUBs are the upper bounds of these candidates, which can be understood as the Pareto points selected from the candidates when all the sub-objective are meant to be maximized. The “ParetoSet” on line 21 denotes the function of selecting the Pareto points. For example, the in the EGO process, the Pareto points P are selected from the training data Y, then P=ParetoSet(Y). The code of “ParetoSet” will not be further introduced.
1.2 The code for generating LUBs
When there is no Pareto point, the reference point will be the only LUB, see line7. Then the whole LUBs can be obtained by updating with the Pareto point one by one, see line 9–12.
Appendix 2: Test functions
1.1 ZDT functions
where \( g=1+9{\sum}_{i=1}^d{x}_i/\left(d-1\right) \) in all the three ZDT functions.
1.2 DTLZ functions
where \( {g}_M=1+{\sum}_{j=M}^d{\left({x}_j-0.5\right)}^2 \) in DTLZ1 and DTLZ2 functions. Note that the original DTLZ1 has an extremely large number of local Pareto frontier, which cannot be approximated by the surrogate, therefore, the DTLZ1 function has been modified in (35)
where \( {g}_M=2+9\left({\sum}_{j=M}^d{x}_j\right)/\left(d-M+1\right) \) in DTLZ7 function.
Rights and permissions
About this article
Cite this article
Li, Z., Wang, X., Ruan, S. et al. A modified hypervolume based expected improvement for multi-objective efficient global optimization method. Struct Multidisc Optim 58, 1961–1979 (2018). https://doi.org/10.1007/s00158-018-2006-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00158-018-2006-3