1 Introduction

Multi-objective evolutionary algorithms (MOEAs) have been widely used in real-world applications [1], e.g., software engineering [2, 3], water distribution systems [4], and industrial scheduling [5, 6]. Hence, many MOEAs have been developed to solve multi-objective optimization problems. They have shown excellent success on problems with two or three objectives, such as the non-dominated sorting genetic algorithm II (NSGA-II) [7], the strength Pareto evolutionary algorithm 2 (SPEA2) [8], the multi-objective evolutionary algorithm based on decomposition (MOEA/D) [9], and the Pareto envelope-based selection algorithm II (PESA-II) [10]. However, these algorithms face serious challenges on problems with four or more objectives [1]. Moreover, optimization problems with a large number of objectives are common in real-world applications [11, 12]. It is therefore not surprising that handling a large number of objectives has become one of the hot research topics in the evolutionary multi-objective optimization (EMO) community in recent years [13].

Many-objective optimization problems (MaOPs) are defined as problems with four or more objectives [1]. MaOPs pose several difficulties for any MOEA. First and foremost, as the number of objectives increases, almost all solutions in the population become nondominated, which leads to a severe loss of the selection pressure that drives solutions toward the Pareto front (PF). This considerably impedes the evolutionary process [14]. Secondly, in a high-dimensional objective space the conflict between convergence and diversity worsens, since most diversity-preservation operators (such as the crowding-distance operator [7] or the k-th nearest-neighbor distance [8]) tend to favor poorly converged solutions [15]. They cannot enhance the selection pressure toward the PF, and may even slow the evolutionary process to a certain extent. As the objective space grows in dimension, these operators also become computationally expensive. Thirdly, visualization of a high-dimensional front is a challenging task. These difficulties have been highlighted both analytically and experimentally in early studies [16, 17] on evolutionary many-objective optimization.

To overcome the difficulties mentioned above, a variety of algorithms have been proposed, which can be mainly classified into five categories. Firstly, new dominance-relation-based approaches modify Pareto dominance to strengthen the selection pressure toward the PF, such as 𝜃 dominance [18], 𝜖-dominance [19, 20], grid dominance [21], fuzzy Pareto dominance [22, 23], preference order ranking [24], and others [25,26,27]. These dominance relations extend the dominated area to a certain extent. Secondly, decomposition-based approaches decompose an m-objective problem into a series of single-objective subproblems based on aggregation functions and then solve these subproblems simultaneously. The best-known algorithm is the multi-objective evolutionary algorithm based on decomposition (MOEA/D) [9]; there are many improved versions of MOEA/D, such as the MOEA based on both dominance and decomposition (MOEA/DD) [13] and the improved decomposition-based evolutionary algorithm (I-DBEA) [28]. Thirdly, indicator-based approaches introduce a performance indicator as the fitness value to guide the evolution process. Hypervolume (also called the S metric) [29], R2 [30], and other indicators [31] have been used widely in MOEAs for many-objective problems. Representative algorithms include the hypervolume estimation algorithm (HypE) [32], the S metric selection EMOA (SMS-EMOA) [33], the many-objective metaheuristic based on the R2 indicator (MOMBI) [34], and the indicator-based evolutionary algorithm (IBEA) [31]. Among these indicators, hypervolume is probably the most popular due to its good theoretical properties; however, its computational cost grows rapidly with the number of objectives, which limits its use in practice. Fourthly, reference-point-based MOEAs rely on multiple predefined reference points to guide a multi-directional search.
Representative algorithms of this class include NSGA-III [35], the 𝜃-dominance-based evolutionary algorithm (𝜃-DEA) [18], and the clustering-ranking evolutionary algorithm (crEA) [36]. Reference points play a crucial role in maintaining both convergence and diversity. Finally, hybrid methods not only enhance the pressure of the population toward the true PF but also maintain the diversity of the population during the evolutionary process. Typical algorithms include Two_Arch2 [37], MOEA/D-U [38], and EFR-RR [38]. Two_Arch2 divides nondominated solutions into two archives, i.e., the convergence archive (CA) and the diversity archive (DA); the CA, which adopts the indicator of IBEA [31], promotes convergence, while the DA maintains diversity based on another approach. Other methods introduce different operations into existing algorithms to enhance their convergence or diversity. As shown in [39], the performance of an algorithm can be significantly enhanced by adopting a unified approach and borrowing operations from another algorithm. Generally speaking, the main idea of these algorithms is to balance convergence and diversity.

Although many algorithms have been proposed to solve many-objective optimization problems, existing state-of-the-art many-objective evolutionary algorithms are still not powerful enough [18], and several drawbacks remain. A new dominance-relation-based many-objective evolutionary algorithm, called 𝜃-DEA, was proposed in [18] to solve many-objective optimization problems. Although 𝜃-DEA considers both convergence and diversity in the environmental selection process, the results in [18] show that it is not strongly competitive in convergence and diversity on some problems with irregularly shaped fronts, such as WFG1 and WFG2. In addition, the convergence information in 𝜃 dominance may be weak, which leads to poor convergence performance on problems with regularly shaped fronts.

Based on the above motivation, we improve 𝜃-DEA to obtain a solution set with better convergence and a more even distribution. Our contributions are as follows:

  1. An enhanced 𝜃-DEA, called 𝜃-EDEA, is proposed, which improves 𝜃 dominance to enhance convergence and introduces a density-based selection mechanism to maintain diversity. Firstly, we construct an m-dimensional hyper-plane using the extreme point on each objective. Then, the distance from the origin to the projection of a solution on the reference line in 𝜃 dominance is replaced with the perpendicular distance between the solution and the m-dimensional hyper-plane to form the enhanced 𝜃 dominance.

  2. Computational experiments show that using the perpendicular distance is more effective than the distance from the ideal point to the projection of a solution on the reference line, which indicates that the enhanced 𝜃 dominance has stronger convergence performance.

  3. The density-based selection mechanism is incorporated into the proposed 𝜃-EDEA to maintain diversity in the environmental selection phase.

  4. An investigation of the parameter 𝜃 indicates that 𝜃-EDEA achieves its best performance when 𝜃 = 5.0.

  5. An extensive comparison between the proposed 𝜃-EDEA and five state-of-the-art algorithms on two well-known test suites is provided. The results show that 𝜃-EDEA is a very promising algorithm for many-objective optimization.

The rest of the paper is organized as follows. Section 2 briefly presents the concepts of multi-objective optimization and the definition of 𝜃 dominance. In Section 3, the proposed 𝜃-EDEA is described in detail. The experimental results and discussions are presented in Section 4. Finally, conclusions are drawn in Section 5.

2 Preliminaries

2.1 Multiobjective optimization problem

A multi-objective optimization problem is also called a multi-criteria optimization problem. In general, a minimization problem with n decision variables and m objectives can be mathematically described as:

$$ \left. {\begin{array}{l} \text{Minimize}\;\; f(\mathbf{x})=\left( {f_{1} (\mathbf{x}),f_{2} (\mathbf{x}),\ldots ,f_{m} (\mathbf{x})} \right)^{T} \\ \text{Subject to}\;\; \mathbf{x}\in X \end{array}} \right\} $$
(1)

where \(\mathbf{x}=(x_{1},\ldots,x_{n})^{T}\in X\subseteq R^{n}\) is the n-dimensional decision vector; X represents the n-dimensional decision space; \(\mathbf{y}=(y_{1},\ldots,y_{m})^{T}\in Y\subseteq R^{m}\) is the m-dimensional objective vector; Y denotes the m-dimensional objective space. \(f:X\to Y\) comprises the m conflicting objective functions and is a mapping from the n-dimensional decision space X to the m-dimensional objective space Y.

Based on the above, a few relevant definitions are defined as follows:

Definition 1

(Pareto Dominance) [7, 40]: Let \(\mathbf{x}_{1},\mathbf{x}_{2}\in X\); \(\mathbf{x}_{1}\preceq \mathbf{x}_{2}\) if and only if

$$\begin{array}{@{}rcl@{}} \forall i\in \left\{ {1,2,\mathellipsis ,m} \right\}&:&f_{i} \left( {\mathbf{x}_{1} } \right)\le f_{i} \left( {\mathbf{x}_{2} } \right)\wedge \exists j\in \left\{ {1,2,\mathellipsis m} \right\}\\ &:&f_{j} \left( {\mathbf{x}_{1}} \right)<f_{j} (\mathbf{x}_{2}) \end{array} $$

Definition 2

(Pareto Optimality) [7, 40]: A vector \(\mathbf{x}^{\ast}\in X\) is Pareto optimal if there is no other \(\mathbf{x}\in X\) such that \(\mathbf{x}\preceq \mathbf{x}^{\ast}\).

The set of all Pareto-optimal points is called the Pareto set (PS). The image of the Pareto set in the objective space forms the Pareto front (PF).
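Definitions 1 and 2 can be checked mechanically. The following minimal Python sketch (the function name is ours, not from the paper) tests Pareto dominance between two objective vectors under minimization:

```python
import numpy as np

def pareto_dominates(f1, f2):
    """True iff f1 Pareto-dominates f2 (minimization, Definition 1):
    f1 is no worse in every objective and strictly better in at least one."""
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    return bool(np.all(f1 <= f2) and np.any(f1 < f2))
```

A Pareto-optimal point (Definition 2) is then one that no other feasible point dominates.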

Definition 3

(Ideal Point): The ideal point \(\mathbf{z}^{\ast}\) is a vector \(\mathbf{z}^{\ast}=(z_{1}^{\ast},z_{2}^{\ast},\ldots,z_{m}^{\ast})^{T}\), where \(z_{i}^{\ast}\) is the infimum of each objective \(i\in\{1,2,\ldots,m\}\).

Definition 4

(Nadir Point): The nadir point \(\mathbf{z}^{nad}\) is a vector \(\mathbf{z}^{nad}=\left({z_{1}^{nad},z_{2}^{nad},\ldots,z_{m}^{nad}}\right)^{T}\), where \(z_{i}^{nad}\) is the supremum of each objective \(i\in\{1,2,\ldots,m\}\).

2.2 𝜃 Dominance relation

𝜃 dominance, which uses the penalty-based boundary intersection (PBI) approach of MOEA/D, was proposed in [18]. 𝜃 dominance works in the normalized objective space, where the origin is the ideal point. Let \(\lambda_{i}\) be a reference line or weight vector, and let f(x) be a solution in the objective space. PBI is described as:

$$\mathbf{Min}\;F(\mathbf{x})=d_{i,1} (\mathbf{x})+\theta \cdot d_{i,2} (\mathbf{x}) $$
(2)

where 𝜃 is a predefined penalty parameter and i is the index of a reference line. As shown in Fig. 1, \(d_{i,1}(\mathbf{x})\) denotes the distance from the origin to the projection of f(x) on the reference line \(\lambda_{i}\), and \(d_{i,2}(\mathbf{x})\) is the perpendicular distance between f(x) and \(\lambda_{i}\). 𝜃 dominance is defined on a set of solutions \(S_{t}\) and a predefined set of reference lines Λ. The 𝜃 dominance relation is determined in two steps. Firstly, each solution in \(S_{t}\) is assigned to a reference line according to \(d_{i,2}(\mathbf{x})\), forming N clusters C, where N is the number of reference lines. Secondly, F(x) is computed for each solution, and solutions belonging to the same cluster \(C_{i}\) are compared by F(x) to determine the 𝜃 dominance relation. Based on the above, several concepts related to 𝜃 dominance can be defined as follows.

Fig. 1

Illustration of distances d i,1(x),d i,2(x)
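As a concrete illustration of Eq. (2) and Fig. 1, the two PBI distances can be computed as follows. This is a sketch; the helper names are ours, and the objective vector is assumed to be already translated so that the ideal point sits at the origin:

```python
import numpy as np

def pbi_distances(f, lam):
    """d1: length of the projection of f onto reference line lam;
    d2: perpendicular distance from f to that line."""
    f = np.asarray(f, float)
    unit = np.asarray(lam, float) / np.linalg.norm(lam)
    d1 = float(f @ unit)
    d2 = float(np.linalg.norm(f - d1 * unit))
    return d1, d2

def pbi_fitness(f, lam, theta=5.0):
    """Penalty-based boundary intersection value F(x) = d1 + theta * d2 (Eq. 2)."""
    d1, d2 = pbi_distances(f, lam)
    return d1 + theta * d2
```

For a solution lying exactly on a reference line, d2 vanishes and the PBI value reduces to the projection length d1.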

Definition 5

(𝜃 dominance) [18]: Let \(\mathbf{x},\mathbf{y}\in S_{t}\); x is said to 𝜃-dominate y, denoted by \(\mathbf{x}\prec_{\theta}\mathbf{y}\), if and only if \(\mathbf{x}\in C_{i}\), \(\mathbf{y}\in C_{i}\), and \(F_{i}(\mathbf{x})<F_{i}(\mathbf{y})\), where \(i\in\{1,2,\ldots,N\}\).

Definition 6

(𝜃 optimality) [18]: A solution \(\mathbf{x}^{\ast}\in S_{t}\) is 𝜃-optimal iff there is no other solution \(\mathbf{x}\in S_{t}\) such that \(\mathbf{x}\prec_{\theta}\mathbf{x}^{\ast}\).

The set of all 𝜃-optimal solutions is referred to as the 𝜃-optimal set (𝜃-OS), and each solution of the 𝜃-OS is mapped into the objective space to form the 𝜃-optimal front [18].

𝜃 dominance aims to obtain a nondominated solution set that minimizes the distance to the PF and spreads well over the PF.

Algorithm 1: Framework of the proposed 𝜃-EDEA (pseudo-code figure)

3 The proposed algorithm

3.1 Framework of the proposed 𝜃 -EDEA

The framework of the proposed 𝜃-EDEA is presented in Algorithm 1. Firstly, reference points Λ = {\(\lambda_{1},\lambda_{2},\ldots,\lambda_{N}\)}, which serve as cluster centers for the clustering operator and help maintain diversity, are generated using Das and Dennis’s method [41]. Then, the initial population \(P_{0}\) with N individuals is randomly produced, and the ideal point \(\mathbf{z}^{\ast}\) is initialized in step 3. Steps 5–15 are iterated until the stopping criterion is met. Step 6 adopts the SBX operator with a large distribution index and polynomial mutation to produce the offspring population \(Q_{t}\). After forming the union \(R_{t}\) of \(Q_{t}\) and the current population \(P_{t}\), nondominated sorting classifies \(R_{t}\) into different nondomination levels (\(F_{1}\), \(F_{2}\), and so on). The population \(S_{t}=\cup_{i=1}^{l}F_{i}\) is then formed, where \(F_{l}\) is the first level at which the size of \(S_{t}\) reaches or exceeds N, and \(S_{t}\) is used to update the ideal point. In step 10, an m-dimensional hyper-plane is constructed from the extreme points. The clustering operator splits the members of \(S_{t}\) into N clusters, C = {\(C_{1},C_{2},\ldots,C_{N}\)}. The ranking operator sorts the members of each \(C_{i}\) according to the enhanced 𝜃 dominance, producing the layers \(F^{\prime} =\left\{ {F_{1}^{\prime} ,F_{2}^{\prime} ,\ldots ,F_{N}^{\prime} } \right\}\). Finally, the density-based selection mechanism selects the remaining individuals from \(F_{l}^{\prime}\) to fill the slots in \(P_{t+1}\).

3.2 Reference points generation

A set of reference points is generated using Das and Dennis’s method [41], which creates a set of uniformly distributed points in the objective space given a predefined integer p controlling the number of divisions along each axis. In this approach, reference points are sampled from a unit simplex. The reference lines, which pass through the origin and the reference points, not only maintain the diversity of the obtained solutions but also serve as the centers of the clusters produced by the clustering operator. The number of reference points H can be calculated as

$$ H=\binom{p+m-1}{p} $$
(3)

From the above formulation, the number of reference points depends on the dimension m of the objective space and the user-defined integer p. For example, for a three-objective problem with p = 4, the total number of reference points is H = 15. For clarity, the distribution of reference points (reference lines) for a three-objective problem is illustrated in Fig. 2a.

Fig. 2

a Structured reference points in the three-objective problem with p = 4. b Two-layer structured reference points with p 1 = 2 for the boundary layer and p 2 = 1 for the inside layer

When pm, number of reference points will be intermediate. However, for high-dimensional objective problem, even if we use p = m, it would lead to a huge number of reference points, which apparently increases the computational burden. For example, when m = 7, p = 7 will require 1716 reference points. On the other hand, if we simply address this issue by lowering p, it will make all reference points sparsely lie along the boundary of the simplex, which is obviously harmful to the solution diversity. To solve the issue, we utilize two-layered reference points with relatively small values of p as presented in [35]. Let p 1and p 2 denote divisions of boundary layer and inside layer respectively, then number of reference points is computed as:

$$ H=\binom{p_{1}+m-1}{p_{1}}+\binom{p_{2}+m-1}{p_{2}} $$
(4)

Figure 2b illustrates the distribution of two-layered reference points for a three-objective problem, with \(p_{1}\) = 2 for the boundary layer and \(p_{2}\) = 1 for the inside layer.
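The counts in Eqs. (3) and (4) can be verified by actually enumerating the simplex-lattice points. Below is a minimal sketch of Das and Dennis's construction; the recursive enumeration is our implementation choice, and the inside-layer shrinking toward the simplex centre follows the two-layer scheme of [35]:

```python
from math import comb

def das_dennis(m, p):
    """All points with coordinates k/p (k >= 0 integer) summing to 1,
    i.e. the structured reference points on the unit simplex."""
    points = []
    def rec(prefix, left):
        if len(prefix) == m - 1:
            points.append(tuple(v / p for v in prefix + [left]))
            return
        for v in range(left + 1):
            rec(prefix + [v], left - v)
    rec([], p)
    return points

def two_layer(m, p1, p2):
    """Boundary layer plus an inside layer shrunk toward the centre 1/m."""
    inner = [tuple(x / 2 + 1 / (2 * m) for x in pt) for pt in das_dennis(m, p2)]
    return das_dennis(m, p1) + inner
```

For m = 3 and p = 4 this yields the H = 15 points of Fig. 2a, while p1 = 2, p2 = 1 yields the 6 + 3 = 9 points of Fig. 2b.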

3.3 Constructing m-dimensional hyper-plane

In the proposed algorithm, we need m extreme points to construct the m-dimensional hyper-plane. The extreme point \({z_{i}^{e}}\) on the i-th objective axis is identified by finding the solution \(\mathbf{x}\in S_{t}\) that minimizes the following achievement scalarizing function:

Algorithm 2: Constructing the hyper-plane (pseudo-code figure)
$$ASF(\mathbf{x},\mathbf{w}_{\mathbf{j}} )=\overset{m}{\underset{i=1}{\max}} \left\{ {\frac{f_{i} (\mathbf{x})-z_{i}^{\ast } }{w_{j,i} }} \right\} $$
(5)

where \(\mathbf{w}_{j}=(w_{j,1},w_{j,2},\ldots,w_{j,m})\) is the direction of the j-th objective axis, and \(w_{j,i}\) satisfies the following condition:

$$ w_{j,i} =\left\{ {\begin{array}{l} 0,j\ne i, \\ 1,j=i. \end{array}} \right. $$
(6)

Note that 0 is replaced with a small number (\(10^{-6}\)) in practice. These m extreme points are then used to construct the m-dimensional hyper-plane. Let \(a=(a_{1},a_{2},\ldots,a_{m})\) denote the intercepts, where \(a_{i}\) is the intercept on the i-th objective axis. The equation of the m-dimensional hyper-plane is expressed as follows:

$$ \frac{x_{1} }{a_{1} }+\frac{x_{2} }{a_{2} }+{\cdots} +\frac{x_{m} }{a_{m} }=1. $$
(7)

Given the matrix \( A=\left ({{z_{1}^{e}} -z^{\ast },{z_{2}^{e}} -z^{\ast },\ldots ,{z_{m}^{e}} -z^{\ast }} \right )\) and the vector \(\mathbf{b}=(1,1,\ldots,1)^{T}\), a vector x is obtained by solving Ax = b. Each intercept \(a_{i}\) can then be computed by the following equation.

$$ a_{i} =\frac{1}{x_{i}}+z_{i}^{\ast},\quad i\in \left\{ {1,2,\ldots ,m} \right\}. $$
(8)

Note that if the extreme points contain duplicates, the intercepts are replaced with the maximum value \(z_{i}^{\max}\) of each objective axis, i = 1,2,...,m. The procedure of calculating the intercepts and constituting the hyper-plane from the extreme points is shown in Fig. 3 for a three-objective problem. The main process of constructing the hyper-plane is shown in Algorithm 2.

Fig. 3

Constructing the hyper-plane from extreme points and computing intercepts are shown for a three-objective problem
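Algorithm 2 can be sketched as follows, combining the ASF of Eqs. (5)-(6) with the intercept computation of Eq. (8). This is our illustrative reconstruction (function name and the row-wise layout of A are our choices), including the fallback to per-objective maxima for degenerate extreme points mentioned above:

```python
import numpy as np

def construct_hyperplane(F, z_star, eps=1e-6):
    """F: (n, m) matrix of objective vectors in S_t; z_star: ideal point.
    Returns the m intercepts a_i of the hyper-plane through the extreme points."""
    m = F.shape[1]
    extremes = np.empty((m, m))
    for j in range(m):
        w = np.full(m, eps)          # the "0" weights replaced by 1e-6
        w[j] = 1.0
        asf = np.max((F - z_star) / w, axis=1)      # Eq. (5)
        extremes[j] = F[np.argmin(asf)]
    try:
        # Rows of A are z_j^e - z*; solving A x = 1 gives the plane normal
        x = np.linalg.solve(extremes - z_star, np.ones(m))
        a = 1.0 / x + z_star                        # Eq. (8)
    except np.linalg.LinAlgError:
        a = F.max(axis=0)            # duplicate extremes: fall back to z_i^max
    return a
```

For two objectives with extreme points (2, 0) and (0, 3) and ideal point at the origin, the intercepts come out as (2, 3), matching the plane f1/2 + f2/3 = 1.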

3.4 Clustering operator

The clustering operator is an important step in the proposed algorithm. It works in the objective space, not in the decision space, and the ideal point is treated as the origin. Unlike conventional clustering, it uses the reference lines as cluster centers, so that each solution in \(S_{t}\) is associated with a certain reference line.

Consider a reference line \(\lambda_{i}\) passing through the origin and a reference point, and let f(x) denote a solution in the objective space. \(d_{i,1}(\mathbf{x})\) is the distance between the ideal point and the projection of f(x) on \(\lambda_{i}\), and \(d_{i,2}(\mathbf{x})\) is the perpendicular distance between f(x) and \(\lambda_{i}\), as shown in Fig. 4. They can be computed respectively as

$$ d_{i,1} (\mathbf{x})=\frac{f(\mathbf{x})^{T}\lambda_{i}}{\left\| {\lambda_{i}} \right\|} $$
(9)
$$ d_{i,2} (\mathbf{x})=\left\| {f(\mathbf{x})-d_{i,1} (\mathbf{x})\frac{\lambda_{i}}{\left\| {\lambda_{i}} \right\|}} \right\|. $$
(10)
Fig. 4

Procedure of clustering operator

The clustering operator assigns each solution x to the cluster \(C_{i}\) whose reference line has the smallest perpendicular distance \(d_{i,2}(\mathbf{x})\), as described in Algorithm 3.
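Algorithm 3 can be sketched as below: every solution is associated with the reference line minimizing \(d_{i,2}\) of Eq. (10). The vectorized formulation is our implementation choice:

```python
import numpy as np

def clustering(F, Lambda):
    """F: (n, m) normalized objective vectors; Lambda: (N, m) reference points.
    Returns the clusters as a list of N index lists, one per reference line."""
    units = Lambda / np.linalg.norm(Lambda, axis=1, keepdims=True)
    d1 = F @ units.T                                     # (n, N) projections, Eq. (9)
    perp = F[:, None, :] - d1[:, :, None] * units[None, :, :]
    d2 = np.linalg.norm(perp, axis=2)                    # (n, N) distances, Eq. (10)
    nearest = d2.argmin(axis=1)
    clusters = [[] for _ in range(len(Lambda))]
    for s, c in enumerate(nearest):
        clusters[c].append(s)
    return clusters
```

Note that every solution is assigned to exactly one cluster, so some clusters may be empty while others hold several solutions; the cluster sizes later double as the density values of Section 3.6.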

3.5 Enhanced 𝜃 dominance and ranking operator

The enhanced 𝜃 dominance is similar to the 𝜃 dominance in [18]; the difference is that \(d_{i,1}(\mathbf{x})\) in 𝜃 dominance is replaced. The enhanced 𝜃 dominance is likewise defined on the population and a predefined set of reference lines passing through the origin and the reference points. Each solution is assigned to a cluster by the clustering operator, and solutions within the same cluster are compared using the enhanced 𝜃 dominance. To illustrate it, consider a two-objective space. Suppose \(f(\mathbf{x})=(f_{1}(\mathbf{x}),f_{2}(\mathbf{x}))^{T}\) is a solution of a two-objective problem and line L is constructed by Algorithm 2. Let \(d_{L}(\mathbf{x})\) be the perpendicular distance between f(x) and line L, and \(d_{i,2}(\mathbf{x})\) the perpendicular distance between f(x) and reference line \(\lambda_{i}\). These distances are shown for a convex problem in Fig. 5. They can be calculated by (11) and (10), respectively.

$$ d_{L} (\mathbf{x})=\frac{\left| {f_{1} (\mathbf{x})/a_{1} +f_{2} (\mathbf{x})/a_{2} -1} \right|}{\sqrt {(1/a_{1} )^{2}+(1/a_{2} )^{2}} } $$
(11)

A larger \(d_{L}(\mathbf{x})\) value means better convergence, while a smaller \(d_{i,2}(\mathbf{x})\) value maintains better diversity. The F(x) of the enhanced 𝜃 dominance is defined as follows:

$$ \mathbf{Min}\;\;\;F(\mathbf{x})=\theta \cdot d_{i,2} (\mathbf{x})+(-d_{L} (\mathbf{x})) $$
(12)

where 𝜃 is a supplied penalty parameter. Since \(d_{i,2}(\mathbf{x})\) and \(d_{L}(\mathbf{x})\) are in mutual conflict, \(d_{L}(\mathbf{x})\) is negated so that both terms can be minimized simultaneously.

Fig. 5

Illustration of d i,2(x) and d L (x)

Definition 7

(enhanced 𝜃 dominance): Let x, y be two solutions. \(\mathbf{x}\prec_{E}\mathbf{y}\) if and only if \(\mathbf{x}\in C_{i}\), \(\mathbf{y}\in C_{i}\), and F(x) < F(y), where \(i\in\{1,2,\ldots,N\}\).

Because the enhanced 𝜃 dominance balances convergence and diversity, each solution in \(S_{t}\) is pushed quickly and evenly toward the Pareto front. Meanwhile, as the extreme points on each objective are updated, line L is reconstructed, which moves line L closer to the Pareto front.

However, for a concave problem, some solutions may come to lie above line L as the evolution proceeds. Figure 6 illustrates this process on a two-objective problem. In the early stage of the evolutionary process, all solutions in the population lie below line L, as shown in Fig. 6a. Then some solutions may lie above line L, as shown in Fig. 6b. In the final stage, all solutions in \(S_{t}\) move above line L, as shown in Fig. 6c. In the last two cases, a smaller \(d_{L}(\mathbf{x})\) value is better. To handle these cases, the distance is amended as follows.

$$ d_{L} (\mathbf{x})=\left\{ {\begin{array}{ll} d_{L} (\mathbf{x}), & \mathbf{if}\;\; {\sum}_{i=1}^{m} f_{i}(\mathbf{x})/a_{i}-1<0 \\ -d_{L} (\mathbf{x}), & \mathbf{otherwise}. \end{array}} \right. $$
(13)
Fig. 6

Changes of location between line L and solutions with the evolution process, on the two-objective problem
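Putting Eqs. (10)-(13) together, a minimal sketch of the enhanced fitness follows. The function name is ours; the intercepts a come from Algorithm 2, and \(d_{L}\) is sign-amended as in Eq. (13):

```python
import numpy as np

def enhanced_fitness(f, lam, a, theta=5.0):
    """F(x) = theta * d_{i,2}(x) - d_L(x) of Eq. (12), with d_L signed per Eq. (13)."""
    f, lam, a = (np.asarray(v, float) for v in (f, lam, a))
    unit = lam / np.linalg.norm(lam)
    d2 = np.linalg.norm(f - (f @ unit) * unit)     # perpendicular distance to line
    s = np.sum(f / a) - 1.0                        # which side of the hyper-plane
    d_L = abs(s) / np.linalg.norm(1.0 / a)         # Eq. (11), generalized to m
    if s >= 0:                                     # above the plane: flip the sign
        d_L = -d_L
    return theta * d2 - d_L
```

Within a cluster, the solution with the smaller F(x) enhanced-𝜃-dominates the other (Definition 7).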

F(x) of each solution is calculated by (12), and the solutions in each cluster are sorted in ascending order of F(x) by the ranking operator; the solution with the smallest F(x) ranks first in its cluster. The first solution of each cluster forms layer \(F_{1}^{\prime}\). The same steps then produce \(F_{2}^{\prime},F_{3}^{\prime},\ldots,F_{l}^{\prime}\), so that every solution is assigned to a layer.

3.6 Density based selection mechanism

To maintain diversity, we compute the density of each solution based on the clustering operator. After clustering, each solution belongs to one cluster. The density value σ of a solution is defined as the number of solutions in the cluster to which it belongs; a solution with a smaller density value is better. The density-based selection mechanism first copies whole layers, starting from the lowest, into the new population \(P_{t+1}\) until the population size would be exceeded; the solutions of the last layer are then selected according to their density values. The pseudo-code of the selection procedure is given in Algorithm 4.
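The selection of Algorithm 4 can be sketched as below. The names are ours: `layers` are the F′ layers produced by the ranking operator and `density` maps each solution to σ, the size of its cluster:

```python
def density_based_select(layers, density, N):
    """Copy whole layers into P_{t+1} from the first layer on; in the layer
    that would overflow N, prefer solutions with smaller density sigma."""
    P = []
    for layer in layers:
        if len(P) + len(layer) <= N:
            P.extend(layer)
        else:
            last = sorted(layer, key=lambda s: density[s])
            P.extend(last[:N - len(P)])
            break
    return P
```

Preferring small-σ solutions in the last layer fills under-populated clusters first, which is how the mechanism preserves diversity.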

Algorithm 3: Clustering operator (pseudo-code figure)
Algorithm 4: Density-based selection (pseudo-code figure)

3.7 Computational complexity analysis

In the proposed 𝜃-EDEA, the main computational cost lies in four components of Algorithm 1: GetParetoNondominationLevels, ConstructHyperplane, the Clustering Operator, and the Ranking Operator. Given a MaOP with m objectives and population size N, GetParetoNondominationLevels requires O(mN²) computations. ConstructHyperplane in Algorithm 2 requires O(mN) computations. The Clustering Operator in Algorithm 3 has the same complexity as in 𝜃-DEA, requiring O(mN²). In the enhanced 𝜃 dominance, the distance of each x in \(S_{t}\) to the hyper-plane is computed in O(m), so the total cost is O(m|\(S_{t}\)|). The worst case of the Ranking Operator requires O(|\(S_{t}\)| log |\(S_{t}\)|) computations. Since |\(S_{t}\)| < 2N, the total computational complexity of the proposed algorithm is O(mN²), the same as that of 𝜃-DEA [18].

4 Experiment and results

This section is devoted to the experimental design and result analysis demonstrating the performance of the proposed 𝜃-EDEA. We compare the proposed algorithm with five other state-of-the-art algorithms: NSGA-III [35], GrEA [21], MOEA/D [9], HypE [32], and 𝜃-DEA [18]. The test problems and performance metrics used in our experiments are briefly introduced in Sections 4.1 and 4.2, respectively. The parameter settings are given in Section 4.3, and Section 4.4 describes the significance test adopted in the paper. Finally, results and discussion are provided.

4.1 Test problems

Deb–Thiele–Laumanns–Zitzler (DTLZ) [38, 42] and Walking Fish Group (WFG) [43, 44] are two popular scalable test suites for many-objective optimization. To test the performance of the proposed algorithm, we use DTLZ1-4 and all 9 WFG test functions with three to fifteen objectives. The DTLZ1 PF satisfies \(\sum\nolimits_{i=1}^{m} {f_{i} (x)=0.5} \), while the other DTLZ problems satisfy \(\sum\nolimits_{i=1}^{m} {f_{i} (x)^{2}=1.} \) The number of decision variables is set as n = m + r − 1 for the DTLZ test cases; in our experiments, r is set to 5 for DTLZ1 and to 10 for DTLZ2-4. For the WFG test suite, following [18], the number of decision variables is n = 24, with position-related parameter k = m − 1 and distance-related parameter l = n − k. The features of these problems are described in Table 1.

Table 1 Features of problems

4.2 Performance metrics

To evaluate the solution set obtained by the proposed algorithm, the inverted generational distance (IGD) [41], one of the most widely used metrics, is adopted in our experiments. IGD provides combined information about the convergence and diversity of a solution set. Computing the IGD value requires a set of uniformly distributed points over the true PF; however, for many-objective optimization, many studies fail to report how they sample those points along the PF. Since reference points or reference directions are predefined in the 𝜃-DEA, NSGA-III, and MOEA/D algorithms, and the true Pareto-optimal surfaces of most test problems are known, we can locate the intersection point of the Pareto-optimal surface with each reference line. Based on these intersection points, a new way to compute IGD was proposed by Deb and Jain [35], defined as follows:

$$ IGD(\mathbf{A},\mathbf{Z})=\frac{1}{\left| {\mathbf{Z}} \right|}\sum\limits_{i=1}^{\left| {\mathbf{Z}} \right|} {\overset{\left| {\mathbf{A}} \right|}{\underset{j=1}{\min}} d(a_{j} ,z_{i} )} $$
(14)

where \(d(a_{j},z_{i})=\|a_{j}-z_{i}\|_{2}\). A denotes the set of final nondominated solutions obtained in the objective space, and Z consists of the intersection points of the true PF with each reference line. A set A with a smaller IGD value is better.
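Eq. (14) amounts to averaging, over the intersection points Z, the distance to the nearest obtained solution. A small sketch:

```python
import numpy as np

def igd(A, Z):
    """Inverted generational distance of solution set A w.r.t. reference set Z
    (Eq. 14): mean over z in Z of the distance to the nearest a in A."""
    A, Z = np.asarray(A, float), np.asarray(Z, float)
    dists = np.linalg.norm(Z[:, None, :] - A[None, :, :], axis=2)  # |Z| x |A|
    return float(dists.min(axis=1).mean())
```

Because the average runs over Z rather than A, a set A must both reach the front and cover all reference points to score well, which is why IGD captures convergence and diversity at once.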

However, for MOEAs that do not use supplied reference points/directions, the IGD value cannot be computed in this way. To compare with these MOEAs, we adopt another popular metric, the hypervolume (HV) [45, 46]. HV denotes the volume of the objective space between the obtained solution set and a reference point, and also assesses a solution set comprehensively with respect to convergence and diversity. It can be computed as:

$$ HV(A,\mathbf{r})=volume\left( {\underset{f\in A}{\cup} [f_{1} (\mathbf{x}),r_{1} ]\times \cdots \times [f_{m} (\mathbf{x}),r_{m} ]} \right) $$
(15)

where A is the set of nondominated solutions in the objective space, and \(\mathbf{r}=(r_{1},r_{2},\ldots,r_{m})\) denotes a reference point in the objective space that is dominated by all points in A.

To compute the HV value, we need to normalize the objective values of the solutions in A according to the range of the problem’s Pareto front, because the test problems have differently scaled search spaces. Following the recommendation in [47], the reference point is set to 1.1 times the upper bound of the Pareto front (i.e., r = (1.1, 1.1, …, 1.1)).

In addition, for problems with no more than ten objectives, we calculate HV exactly using the recently proposed WFG algorithm [29]. For problems with 15 objectives, we approximate HV by the Monte Carlo simulation proposed in [32], using 10,000,000 sampling points to ensure accuracy.
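For intuition, Monte Carlo HV estimation can be sketched as follows. This is a simplified stand-in for the estimator of [32], not its actual implementation; the sampling box and sample count are our choices:

```python
import numpy as np

def hv_monte_carlo(A, ref, n_samples=200_000, seed=0):
    """Estimate HV(A, ref) of Eq. (15) for minimization: the fraction of uniform
    samples in the box [min(A), ref] dominated by some point of A, times the
    box volume."""
    rng = np.random.default_rng(seed)
    A, ref = np.asarray(A, float), np.asarray(ref, float)
    lo = A.min(axis=0)
    S = rng.uniform(lo, ref, size=(n_samples, len(ref)))
    dominated = (A[None, :, :] <= S[:, None, :]).all(axis=2).any(axis=1)
    return float(dominated.mean() * np.prod(ref - lo))
```

The estimator's error shrinks as the square root of the sample count, which is why a large number of sampling points is needed in high dimensions.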

4.3 Parameter setting

In this section, we describe the parameter settings of all the compared algorithms. Common parameters are set to the same values for a fair comparison; other parameters are set to the values that yielded the best nondominated solution sets.

  1. Population size: for the NSGA-III, 𝜃-DEA, 𝜃-DEA*, and MOEA/D algorithms, the population size N is controlled by the division parameter p, so we set p to determine N. The other algorithms use the same population size for a fair comparison. Table 2 lists the population sizes used in this paper for problems with different numbers of objectives.

  2. Crossover and mutation parameters: for GrEA, MOEA/D, and HypE, the SBX probability is \(p_{c}\) = 1.0 with distribution index \(\eta_{c}\) = 20, and the polynomial mutation probability is \(p_{m}\) = 1/n with distribution index \(\eta_{m}\) = 20. For NSGA-III, 𝜃-DEA, and 𝜃-EDEA, the settings differ only slightly according to [29], where \(\eta_{c}\) is set to 30.

  3. Stopping condition and number of runs: each run terminates after a specified number of generations. Each algorithm in this paper is independently run 20 times on each test instance.

  4. Penalty parameter of MOEA/D-PBI and 𝜃-DEA: 𝜃 = 5.0.

  5. Neighborhood size in MOEA/D: T = 20.

  6. Grid division (div) in GrEA: the settings of div are summarized in Table 3.

  7. Number of points in Monte Carlo sampling: set to 10,000 according to [28].

Table 2 Setting of Population Size
Table 3 Setting of grid division GrEA

4.4 Significance test

To test the statistical significance of differences between algorithms, we use the Wilcoxon signed-rank test [48,49,50] at the 5% significance level.

4.5 Results and discussion

In this paper, we compare the proposed algorithm with six MOEAs on the DTLZ and WFG test suites. The comparison results in terms of the two performance metrics, IGD and HV, are shown in Tables 4–6. We then discuss and analyze the results.

Table 4 Best, median, and worst IGD values obtained by NSGA-III, MOEA/D-PBI, two versions of 𝜃-DEA, and 𝜃-EDEA on M-objective DTLZ1, DTLZ2, DTLZ3, and DTLZ4 problems

4.5.1 Comparison with NSGA-III, MOEA/D and 𝜃-DEA

In this section, we discuss the IGD results on the DTLZ test problems. The way IGD is computed and the experimental settings are the same as in the original NSGA-III [35] and 𝜃-DEA [18] studies. Since our proposed algorithm, NSGA-III, 𝜃-DEA, and MOEA/D can all use the same set of intersection points of the true PF with the reference lines or weight vectors for the IGD computation, we compare these algorithms only on the three- to fifteen-objective DTLZ1-4 problems. 𝜃-DEA*, another version of 𝜃-DEA with normalization, is also involved in the comparison. Note that the IGD results of NSGA-III, MOEA/D-PBI, and the two versions of 𝜃-DEA used for comparison with 𝜃-EDEA are taken from [18, 35].

The results on the four DTLZ test problems are given in Table 4, with the best, median, and worst IGD values over 20 independent runs reported for the five compared MOEAs; the best performance among the compared algorithms is highlighted in bold. From Table 4, we can find that MOEA/D-PBI performs well on DTLZ2 with all objective numbers and on DTLZ3 with 8, 10, and 15 objectives, where it obtains the smallest IGD values. DTLZ2 is a relatively easy problem with a spherical PF. DTLZ3 adds a huge number of local PFs parallel to the global PF on top of DTLZ2, which poses a stiff challenge for algorithms to converge to the global PF. However, MOEA/D-PBI does not work well on DTLZ4 for any number of objectives. The main reason is that DTLZ4 introduces a biased density of points along the PF, increasing the challenge of maintaining diversity in the objective space; MOEA/D-PBI is unable to find a widely distributed set of solutions.

NSGA-III does not achieve the best IGD value on any of the test problems. However, on all DTLZ1 and DTLZ4 problems with more than three objectives, NSGA-III obtains smaller IGD values than MOEA/D, except for the five-objective DTLZ1 problem. For all problems, 𝜃-DEA performs consistently better than NSGA-III. We believe the reason is that the 𝜃 dominance used in 𝜃-DEA maintains convergence and diversity simultaneously. Although 𝜃-DEA* also performs consistently better than NSGA-III on all problems except DTLZ1 with three and five objectives and DTLZ2 with three objectives, 𝜃-DEA achieves a smaller IGD value than 𝜃-DEA* on almost all problems except the fifteen-objective DTLZ4. Across the DTLZ3 and DTLZ4 problems, 𝜃-DEA obtains a much better IGD value than 𝜃-DEA* on DTLZ3, but a similar IGD value to 𝜃-DEA* on DTLZ4.

From Table 4, the performance of 𝜃-EDEA is very promising on the four DTLZ test problems with more than three objectives. For all DTLZ1 instances, for DTLZ3 with three and five objectives, and for DTLZ4 with three, five, and eight objectives, the proposed 𝜃-EDEA obtains the smallest IGD value among the compared algorithms. For all DTLZ2 instances and the high-objective DTLZ3 and DTLZ4 instances, the performance of 𝜃-EDEA is slightly worse than that of 𝜃-DEA, while it performs better than 𝜃-DEA* on DTLZ2 and DTLZ3 with ten and fifteen objectives. For DTLZ3 and DTLZ4 with more than eight objectives, the IGD values obtained by 𝜃-EDEA are marginally worse than those obtained by 𝜃-DEA. We suspect the reason is that 𝜃-EDEA sometimes fails to find the extreme points in high-dimensional objective space and therefore constructs a wrong hyper-plane.

Across the 20 test instances of the DTLZ test suite presented in Table 4, 𝜃-EDEA achieves the smallest IGD value on 9 instances, while NSGA-III achieves it on 0, MOEA/D-PBI on 6, 𝜃-DEA on 1, and 𝜃-DEA* on 1. From these results, we conclude that 𝜃-EDEA outperforms NSGA-III, MOEA/D-PBI, 𝜃-DEA, and 𝜃-DEA* on the DTLZ test problems in terms of IGD. Based on the above comparisons, the proposed 𝜃-EDEA can generally maintain a good balance between convergence and diversity, assisted by the structured reference points and the hyper-plane.

4.5.2 Comparison with state-of-the-art algorithms

In this section, we compare the proposed 𝜃-EDEA with the other algorithms on the DTLZ and WFG test suites in terms of the HV value. Tables 5 and 6 show the average HV for the DTLZ1-4 and WFG1-9 problems over 20 independent runs, where the best average HV values are shown in boldface. Moreover, we also perform a pairwise comparison of the reported average HV values between 𝜃-EDEA and each other algorithm via the Wilcoxon signed-rank test, as shown in Tables 5 and 6, where the symbol “*” denotes that the outcome is significantly outperformed by 𝜃-EDEA.
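The pairwise significance test described above can be sketched with SciPy's `wilcoxon`, which performs exactly this paired, non-parametric comparison of per-run indicator values. The HV values below are fabricated placeholders, not results from the paper:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# Hypothetical per-run HV values over 20 independent runs on one instance.
hv_theta_edea = rng.normal(0.92, 0.01, size=20)
hv_other = rng.normal(0.88, 0.01, size=20)

# Paired test on the per-run differences; a p-value below the 0.05 level
# would be flagged with "*" in a table such as Tables 5 and 6.
stat, p = wilcoxon(hv_theta_edea, hv_other)
print(f"p = {p:.4g}, significant at 0.05: {p < 0.05}")
```

The signed-rank test is preferred over a t-test here because indicator values over independent runs need not be normally distributed.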

Table 5 Average HV for DTLZ1-4 problems
Table 6 Average HV on the WFG1-9 problems

To describe the distribution of the obtained solutions on the test instances, we provide illustrations of three-objective DTLZ test instances in a low-dimensional objective space and of a fifteen-objective WFG test instance in a high-dimensional objective space. The distributions of the final solutions obtained by the competing algorithms in a single run on the three-objective DTLZ1 and DTLZ3 problems are shown in Fig. 7 using three-dimensional coordinates, while the final solutions from a single run on the fifteen-objective WFG6 problem are shown in Fig. 8 using parallel coordinates. It is clearly seen from Figs. 7 and 8 that the proposed 𝜃-EDEA is able to find a solution set with good convergence and diversity.

Fig. 7
figure 7

Pareto fronts produced by (a) 𝜃-EDEA, (b) 𝜃-DEA, (c) GrEA, and (d) HypE on the DTLZ1 and DTLZ3 problems with three objectives

Fig. 8
figure 8

Final solution sets produced by a NSGA-III, b GrEA, c 𝜃-DEA, d MOEA/D, e HypE and f 𝜃-EDEA on the fifteen-objective WFG6 problem, shown by parallel coordinates

To quantify how well each algorithm performs overall, the performance score [32] is introduced to rank the algorithms. Suppose there are l algorithms Alg_1, Alg_2, ..., Alg_l involved in the comparison; let δ_i,j be 1 if Alg_j significantly outperforms Alg_i according to HV, and 0 otherwise. Then, for each algorithm Alg_i, the performance score P(Alg_i) is computed as

$$ P(Alg_{i} )=\sum\limits_{\begin{array}{l} j=1 \\ j\ne i \end{array}}^{l} {\delta_{i,j} } $$
(16)

The performance score indicates how many other algorithms significantly outperform the corresponding algorithm on the test case considered, so an algorithm with a lower value is better. Figure 9a shows the average performance score for the different test problems, and Fig. 9b shows the average performance score for different numbers of objectives.
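The computation in (16) can be sketched directly from a matrix of pairwise significance outcomes. A minimal illustration with fabricated outcomes for three hypothetical algorithms:

```python
import numpy as np

def performance_scores(delta):
    """delta[i, j] = 1 if Alg_j significantly outperforms Alg_i in HV,
    else 0 (diagonal zero).  P(Alg_i) sums row i, i.e. the number of
    rival algorithms that beat Alg_i, so a lower score is better."""
    d = np.asarray(delta, dtype=int)
    return d.sum(axis=1)

# Hypothetical outcomes: Alg_2 beats both rivals, Alg_0 beats Alg_1 only.
delta = np.array([[0, 0, 1],   # Alg_0 is beaten by Alg_2
                  [1, 0, 1],   # Alg_1 is beaten by Alg_0 and Alg_2
                  [0, 0, 0]])  # Alg_2 is beaten by nobody
print(performance_scores(delta))  # -> [1 2 0]
```

Averaging these per-instance scores over problems (or over objective numbers) yields the curves plotted in Fig. 9.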

Fig. 9
figure 9

a Average performance score over all objective dimensions for the different test problems, namely DTLZ (Dx) and WFG (Wx). b Average performance score over all test problems for different numbers of objectives. The smaller the score, the better the PF approximation in terms of HV. The values of 𝜃-EDEA are connected by a solid line to make the score easier to assess

Based on the above results, we can make some observations about each algorithm. GrEA obtains the best HV value on DTLZ2, DTLZ3, WFG7, and WFG9 with eight objectives, while it is slightly worse on the other problems. From Fig. 8b, GrEA cannot converge to the best value on each dimension, and the distribution of the final solution set it obtains on the fifteen-objective WFG6 problem is dense. However, GrEA works better on the three-objective DTLZ instances. This is mainly because GrEA divides each dimension of the objective space into the same number of divisions, so its performance is influenced by the parameter div.

NSGA-III shows the closest performance to the proposed 𝜃-EDEA over a wide range of WFG test problems, especially WFG4 and WFG7, where it can also perform better than 𝜃-DEA. However, NSGA-III is slightly worse than 𝜃-DEA on the DTLZ problems, as seen in Fig. 9a. It is interesting to observe from Fig. 9b that NSGA-III takes second place on the five-objective and fifteen-objective instances.

For the normalized test problems, MOEA/D achieves satisfactory performance except on DTLZ4. However, it does not perform well on the WFG test problems. The main reason may be that MOEA/D does not employ normalization on the WFG problems, which is why it ranks poorly in Fig. 9.

From Fig. 9b, HypE achieves good performance on all test problems with three objectives according to HV; indeed, it wins 7 out of the 13 three-objective instances. However, it does not show advantages over the other algorithms on instances with more than three objectives, except for WFG3 with eight, ten, and fifteen objectives, where HypE obtains the best HV values. HypE computes the hypervolume exactly as its fitness value when m ≤ 3; otherwise, it estimates the hypervolume by Monte Carlo simulation to accelerate the computation. The resulting inaccurate fitness values may lead to its poor performance on problems with many objectives. Moreover, HypE does not maintain an evenly distributed solution set on all test instances, as shown in Figs. 7d and 8e.
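The Monte Carlo estimation mentioned above can be sketched as follows: sample uniformly in the box spanned by the reference point and count the fraction of samples dominated by the solution set. This is a generic sketch of the idea, not HypE's actual implementation:

```python
import numpy as np

def hv_monte_carlo(points, ref, n_samples=100_000, seed=1):
    """Estimate the hypervolume dominated by `points` with respect to the
    reference point `ref` (minimization): the fraction of uniform samples
    in the box [0, ref] that are dominated, times the box volume."""
    pts = np.asarray(points, dtype=float)
    ref = np.asarray(ref, dtype=float)
    rng = np.random.default_rng(seed)
    samples = rng.uniform(0.0, ref, size=(n_samples, len(ref)))
    # A sample is dominated if some solution is <= it in every objective.
    dominated = (pts[None, :, :] <= samples[:, None, :]).all(axis=2).any(axis=1)
    return dominated.mean() * np.prod(ref)

# Single point (0.5, 0.5) with ref (1, 1): exact HV is 0.25.
est = hv_monte_carlo([[0.5, 0.5]], [1.0, 1.0])
print(est)  # close to 0.25
```

The estimation error shrinks only with the square root of the sample count, which is one concrete source of the fitness inaccuracy discussed above.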

Similar to MOEA/D, 𝜃-DEA without normalization achieves better performance on the DTLZ test problems than NSGA-III, but it does not behave well on the WFG test problems, especially WFG2. From Fig. 8c, it does not maintain good convergence on the fifteen-objective WFG6 problem. However, 𝜃-DEA obtains good convergence and diversity on the three-objective DTLZ test problems, as shown in Fig. 7b. Owing to its worse performance on the WFG test problems, its overall performance across all test problems is poorer than that of NSGA-III.

From Fig. 7, the proposed 𝜃-EDEA performs the best among the compared algorithms on the two problems. However, it does not show consistently excellent performance on all WFG test problems, especially WFG3. We suspect that, because the PF of WFG3 is degenerate and linear, uniform reference points have the opposite effect. As shown in [43], none of the algorithms with uniform reference points perform well on the WFG3 problem. In general, across all comparisons in terms of HV, the proposed 𝜃-EDEA achieves better performance on test problems with more than three objectives. We believe the reason may be that the constructed hyper-plane moves closer and closer to the true PF as the evolution proceeds, so the solution closest to the true PF can be found through the perpendicular distance between the solution and the hyper-plane. From Fig. 8f, the solution set obtained by 𝜃-EDEA has better convergence and diversity than those of the other algorithms. In terms of test problems, the proposed 𝜃-EDEA achieves the best score on all test problems except WFG2-3, WFG4, and WFG7 in Fig. 9a. In terms of the number of objectives, 𝜃-EDEA shows obvious advantages on the MaOPs, as shown in Fig. 9b, except for three objectives.

In summary, 𝜃-EDEA obtains better or similar performance compared with the other state-of-the-art algorithms on MaOPs. However, 𝜃-EDEA still does not achieve better performance on WFG3, owing to the distinctive features of that problem. HypE performs better than 𝜃-EDEA on the three-objective problems, although it does not maintain better diversity. NSGA-III performs the best on the WFG2-4 and WFG7 problems. Compared with 𝜃-DEA, the algorithm proposed in this paper obtains better or similar performance on all test instances. Therefore, our proposed method can effectively handle many-objective problems.

4.5.3 Distance influence analysis

In this section, we inspect the effectiveness of the perpendicular distance between a solution and the m-dimensional hyper-plane in the enhanced 𝜃 dominance. In both 𝜃-DEA and 𝜃-EDEA, 𝜃 takes a value of 0.0, so that each depends on only a single distance during the evolutionary process. For a fair comparison, 𝜃-EDEA* performs random selection rather than density-based selection in the last layer. From Table 7, we can see that the proposed algorithm achieves better performance than 𝜃-DEA on all test problems in terms of both indicators. Therefore, the results indicate that the perpendicular distance between a solution and the m-dimensional hyper-plane is more effective in guiding the search.
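The distance studied here can be illustrated concretely: fit the hyper-plane w · f = 1 through the m extreme points (one per objective) and measure each solution's perpendicular distance to it. This is a minimal sketch under the assumption that the extreme points form a non-singular matrix; it is not the paper's implementation:

```python
import numpy as np

def hyperplane_distance(extreme_points, solutions):
    """Fit the hyper-plane w . f = 1 through the m extreme points
    (one per objective), then return each solution's signed perpendicular
    distance (w . f - 1) / ||w|| to that plane.  Smaller distances mean
    solutions closer to the estimated PF."""
    E = np.asarray(extreme_points, dtype=float)   # shape (m, m)
    w = np.linalg.solve(E, np.ones(len(E)))       # plane coefficients
    F = np.asarray(solutions, dtype=float)
    return (F @ w - 1.0) / np.linalg.norm(w)

# Extreme points on the axes at intercept 1 give the plane f1 + f2 + f3 = 1.
E = np.eye(3)
d = hyperplane_distance(E, [[1/3, 1/3, 1/3], [0.5, 0.5, 0.5]])
print(d)  # first solution lies on the plane, second lies above it
```

When the extreme points are badly estimated in high-dimensional objective space, `w` describes a wrong plane, which matches the failure mode suspected in Section 4.5.1.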

Table 7 Average IGD and HV values obtained by 𝜃-EDEA (𝜃 = 0.0) and 𝜃-DEA (𝜃 = 0.0) on the DTLZ1-4 problems

4.5.4 Parameter 𝜃 sensitivity analysis

In this section, we investigate the influence of the parameter 𝜃 on the performance of the proposed algorithm. Variation of 𝜃 affects the selection of 𝜃-EDEA in the environmental selection phase. In our experiments, we therefore show the influence of the parameter 𝜃 by running 𝜃-EDEA on the DTLZ test problems in terms of the IGD and HV indicators. According to the observation in [18], 𝜃 = 5.0 achieves better performance on almost all test problems, so we consider only three settings: (1) 𝜃 = 0, (2) 𝜃 = 5.0, and (3) 𝜃 = 10^6.

Figures 10 and 11 present how the performance of 𝜃-EDEA varies with different values of the parameter 𝜃 on the DTLZ1-4 problems according to IGD and HV over 20 independent runs. From these figures, we find that when 𝜃 is set to 5.0, 𝜃-EDEA has a clear advantage on DTLZ1-3 with all numbers of objectives in terms of both indicators, IGD and HV. However, on the DTLZ4 problem alone, with all numbers of objectives, 𝜃 = 5.0 and 𝜃 = 10^6 achieve slightly better or similar performance in terms of the two indicators. The same observation can also be found in the study [18]. In addition, 𝜃 = 0 and 𝜃 = 10^6 are two extreme cases. In the environmental selection phase of 𝜃-EDEA, if 𝜃 = 10^6, the search behavior of 𝜃-EDEA is more like that of NSGA-III; if 𝜃 = 0, the search behavior of 𝜃-EDEA depends only on the perpendicular distance between a solution and the hyper-plane and on density-based selection. Therefore, the results show that the proposed method plays a vital role in the search process, and it can be concluded that 𝜃 = 5.0 achieves better performance on all test problems.
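The role of 𝜃 in these two extreme cases can be illustrated with the PBI-style aggregation underlying 𝜃 dominance in [18], where d1 measures convergence along a reference direction and d2 the perpendicular distance to that direction (in 𝜃-EDEA, the convergence term is replaced by the distance to the constructed hyper-plane). A minimal sketch, with all names illustrative:

```python
import numpy as np

def theta_fitness(f, direction, theta):
    """PBI-style aggregation used by theta dominance: d1 is the length of
    the projection of objective vector f onto the reference direction
    (convergence), d2 is the perpendicular distance to that direction
    (diversity), and the fitness is d1 + theta * d2."""
    f = np.asarray(f, dtype=float)
    lam = np.asarray(direction, dtype=float)
    lam = lam / np.linalg.norm(lam)
    d1 = float(f @ lam)                          # convergence component
    d2 = float(np.linalg.norm(f - d1 * lam))     # diversity component
    return d1 + theta * d2

f = [0.6, 0.8]
lam = [1.0, 1.0]
# theta = 0 ranks by convergence (d1) alone; a huge theta ranks almost
# entirely by closeness to the reference direction (d2).
print(theta_fitness(f, lam, 0.0), theta_fitness(f, lam, 5.0))
```

This makes the two extremes concrete: with 𝜃 = 10^6 the d2 term dominates and selection behaves like a reference-direction niching scheme, as in NSGA-III; with 𝜃 = 0 only the convergence distance matters.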

Fig. 10
figure 10

Examination of the influence of 𝜃 on the IGD of 𝜃-EDEA for the DTLZ1-4 problems with varying numbers of objectives m. The box plots show the IGD of 20 independent runs each for (1) 𝜃 = 0, (2) 𝜃 = 5.0, and (3) 𝜃 = 10^6. The number of objectives is given in brackets

Fig. 11
figure 11

Examination of the influence of 𝜃 on the HV of 𝜃-EDEA for the DTLZ1-4 problems with varying numbers of objectives m. The box plots show the HV of 20 independent runs each for (1) 𝜃 = 0, (2) 𝜃 = 5.0, and (3) 𝜃 = 10^6. The number of objectives is given in brackets

5 Conclusion and future work

For MaOPs, one of the key challenges is how to balance convergence and diversity. In this paper, we improve 𝜃 dominance, which emphasizes both convergence and diversity, and propose a new many-objective evolutionary algorithm called 𝜃-EDEA. The proposed 𝜃-EDEA is expected to enhance the convergence and diversity of 𝜃-DEA in high-dimensional objective space by introducing the perpendicular distance between a solution and the hyper-plane constructed using the extreme point on each dimension, while inheriting the clustering and ranking operations.

To investigate the competitiveness of 𝜃-EDEA, we have performed an extensive experimental comparison of 𝜃-EDEA with five state-of-the-art algorithms, which cover grid-division-, reference-point-, decomposition-, indicator-, and new-dominance-relation-based approaches, on the two widely used DTLZ and WFG test suites. In terms of the two performance indicators IGD and HV, our proposed 𝜃-EDEA performs significantly better than the peer algorithms on most test problems and compares favorably with the five state-of-the-art many-objective optimizers. To demonstrate this clearly, we also present visual evidence of its superior performance in terms of the performance score and diversity. To verify the effectiveness of the enhanced 𝜃 dominance, we compare it with 𝜃 dominance under the same conditions through computational experiments. Moreover, we show the influence of different values of the parameter 𝜃.

Although many MOEAs have been developed to solve both MOPs and MaOPs, more studies remain to be carried out in the future. First, more attention should be paid to some fundamental studies. For instance, in the single-objective optimization field, the study in [51] emphasizes that one optimization algorithm can be enhanced by another by understanding an algorithmic link between them and then borrowing important operators from one to the other; the corresponding fundamental studies [39, 51] in the multi-objective optimization field should also be investigated. Second, 𝜃-EDEA should be applied to constrained optimization problems. Some constraint-handling strategies, such as the inverse parabolic confined method and the inverse parabolic spread method, both of which are based on the parent-centric and inverse parabolic probability (IP) distribution in [40], can be incorporated into 𝜃-EDEA for solving constrained many-objective optimization problems. Finally, new strategies, such as model-based estimation of distribution methods [52, 53] or heuristic search methods [54, 55], can be introduced into 𝜃-EDEA to further improve its performance.