1 Introduction

Evolutionary algorithms are particularly suited to solving multi-objective optimization problems (MOPs) because they can find a set of Pareto optimal solutions (POS) approximating the optimal trade-off among conflicting objective functions in a single run of the algorithm (Deb 2001; Coello et al. 2007). MOEA/D, one of the representative multi-objective evolutionary algorithms (MOEAs), decomposes a MOP into a number of sub-problems. In the original MOEA/D (Zhang et al. 2007), each sub-problem is a single-objective optimization problem defined by a scalarizing function using a weight vector. Each scalarizing function drives the search along one direction in the multi-dimensional objective space. MOEA/D assigns one solution to each scalar optimization problem and optimizes them simultaneously to approximate the Pareto front by applying genetic operations to neighboring solutions. The decomposition of the objective space in MOEA/D is one of the promising approaches for many-objective optimization problems (MaOPs), which optimize four or more objective functions simultaneously. In MaOPs, it is known that Pareto dominance-based MOEAs such as the representative NSGA-II (Deb et al. 2002) and SPEA2 (Zitzler et al. 1999a) lose the selection pressure required in the evolution process, since most of the solutions become non-dominated and are assigned the same rank (Ishibuchi et al. 2008). Consequently, their search performance deteriorates in MaOPs. On the other hand, MOEA/D uses a scalar value aggregated from multiple objective function values, instead of Pareto dominance, as the criterion to compare solutions. Therefore, MOEA/D can easily determine the superiority or inferiority of solutions even in MaOPs. Recently, NSGA-III, an improved version of NSGA-II for solving MaOPs, has also introduced the concept of decomposing the objective space (Deb et al. 2013).

After the proposal of the original MOEA/D (Zhang et al. 2007), a number of studies have been conducted to further improve its search performance. Here, we classify these studies into four groups. The first group focuses on methods to generate a new solution in the MOEA/D framework. The original MOEA/D uses a GA employing crossover and mutation to generate a new solution. To enhance the solution search in continuous problems, DE (differential evolution) and PSO (particle swarm optimization) have been incorporated into the MOEA/D framework (Li et al. 2009; Liu et al. 2010; Martinez et al. 2011; Moubayed et al. 2010). Also, for discrete problems, ACO (ant colony optimization) and SA (simulated annealing) have been introduced into the MOEA/D framework (Ke et al. 2013; Li et al. 2011). The second group focuses on ways of decomposition and on weight vectors. The original MOEA/D optimizes a number of single-objective problems as sub-problems of a MOP. In contrast, MOEA/D-M2M, an extended MOEA/D, optimizes a number of simple multi-objective problems as sub-problems of a MOP (Liu et al. 2014). MOEA/D-M2M divides the objective space into several sub-regions, assigns a sub-population to each sub-region, and optimizes them simultaneously. Since each sub-problem is a multi-objective problem, MOEA/D-M2M uses non-dominated sorting (Deb et al. 2002) instead of the scalarizing functions used in the original MOEA/D to select good solutions in each sub-population. Although there are several options to decompose the objective space, this work focuses on scalarizing function based MOEA/D. To scalarize objective values, the original MOEA/D uses uniformly distributed weight vectors without considering the characteristics of the MOP. Actually, the appropriate distribution of weight vectors depends on the shape of the Pareto front. To address this issue, an adaptive control of the weight vector distribution was proposed (Jiang et al. 2011). Also, the original MOEA/D selects parents from the T neighbors of a weight vector.
Since T is a user-defined parameter, it needs to be tuned. To deal with this issue, an online self-adaptation of T was proposed (Zhao et al. 2012). The third group focuses on the allocation of computational resources in MOEA/D. In the original MOEA/D, the computational resource to generate new solutions is divided equally among all scalarizing functions. However, in general, each scalar optimization problem has a different difficulty. To overcome this shortcoming, methods to control the number of solution generations for each scalarizing function were proposed (Zhang et al. 2009; Chiang et al. 2011). The last group focuses on scalarizing functions. In the original MOEA/D paper, the weighted Tchebycheff (TCH), weighted sum (WS) and penalty-based boundary intersection (PBI) scalarizing functions were introduced (Zhang et al. 2007). To account for differently scaled objective values, an NBI-style Tchebycheff approach was also introduced as an extension of TCH (Zhang et al. 2010b). Each scalarizing function has a characteristic effect on the search performance of MOEA/D. As a scalarizing function with new characteristics, a combination of TCH and WS was proposed (Ishibuchi et al. 2010). Also, the idea of adaptively selecting between TCH and WS for each solution was proposed (Ishibuchi et al. 2009). In this work, we also focus on scalarizing functions.

In MOEA/D, each scalarizing function provides a criterion for determining the superiority or inferiority of solutions. MOEA/D decides whether or not to replace a current solution in the population with a newly generated solution based on their scalarizing function values. Each scalarizing function provides a scenario for the multi-objective solution search, and the scalarizing function design has a large impact on the search performance of MOEA/D. To improve the applicability of the MOEA/D framework to various kinds of MOPs, it is important to provide a new scalarizing function with characteristics different from those of the conventional scalarizing functions. In some MOPs, the conventional scalarizing functions have difficulty approximating a widely spread Pareto front. In particular, the TCH, reciprocal TCH (rTCH) (Li et al. 2013) and PBI scalarizing functions use the obtained ideal point \(\varvec{z}\) as the criterion to decompose the objective space and evolve solutions toward \(\varvec{z}\) by minimizing their scalarizing function values. Generally, each element of \(\varvec{z}\) is the best objective function value found for the corresponding objective during the solution search. If the obtained \(\varvec{z}\) is close to the true ideal point \(\varvec{z}^*\) of the Pareto front, we can expect to cover the whole range of the Pareto front. However, if \(\varvec{z}\) is far from \(\varvec{z}^*\), we cannot expect to cover the whole range of the Pareto front because the criterion \(\varvec{z}\) used to decompose the objective space is unreliable. Therefore, in this work we aim to design a scalarizing function which can approximate a widely spread Pareto front in the objective space. We also aim to design a scalarizing function effective for many-objective optimization. A recent study showed that MOEA/D using WS achieved higher search performance than MOEA/Ds using other scalarizing functions on many-objective knapsack problems (Ishibuchi et al. 2013).
Based on this report, in this work we design a scalarizing function by considering a characteristic of WS for solving MaOPs.

To improve the spread of solutions in the objective space and the search performance of MOEA/D on multi- and many-objective problems, in this work we propose the inverted PBI (IPBI) scalarizing function as an extension of the conventional PBI and WS. The conventional PBI evolves solutions toward the obtained ideal point \(\varvec{z}\) by minimizing the scalarizing function value. In contrast, IPBI evolves solutions away from the worst objective vector \(\varvec{w}\) in the current population by maximizing the scalarizing function value. Each element of \(\varvec{w}\) is set to the worst objective function value of the corresponding objective in the population. IPBI uses the parameter \(\theta \) to strengthen the directivity of the solution search along each search direction determined by a weight vector. IPBI with \(\theta =0\) has an effect similar to the conventional WS. That is, IPBI includes WS, which achieves high search performance in MaOPs (Ishibuchi et al. 2013), as the special case \(\theta =0\). Furthermore, IPBI can strengthen the directivity of the solution search along each search direction by increasing \(\theta \) from zero.

In this work, we analyze differences between IPBI and the conventional scalarizing functions, and compare the search performances of NSGA-III (Deb et al. 2013) and five MOEA/Ds using TCH, rTCH, WS, PBI and IPBI. For the search performance comparison, we use discrete many-objective knapsack problems (Zitzler et al. 1999a) with 2–8 objectives and different sizes of the search space, and continuous WFG4 problems (Huband et al. 2006) with 2–8 objectives and different difficulties in obtaining a widely spread POS. Also, we observe the robustness of each scalarizing function to the Pareto front geometry by using the WFG2 problem (Huband et al. 2006) with a discontinuous Pareto front and WFG4-based problems with concave, linear and convex Pareto fronts.

2 Evolutionary multi and many-objective optimization

2.1 Multi and many-objective optimization

A MOP including m kinds of objective functions is defined as follows:

$$\begin{aligned} \left\{ \begin{array}{ll} \text {Minimize/Maximize} &{} {\varvec{f}}({\varvec{x}})=(f_{1}({\varvec{x}}),f_{2}({\varvec{x}}), \ldots ,f_{m}({\varvec{x}}))\\ \text {subject to} &{} {\varvec{x}} \in \mathcal {F} \end{array}\right. \end{aligned}$$
(1)

where \(\mathcal {F}\) is the feasible region, \({\varvec{x}}\in \mathcal {F}\) is a feasible solution vector in the solution space \(\mathcal {S}\,(\mathcal {F} \subseteq \mathcal {S})\), and \(f_i\,(i=1,2,\ldots ,m)\) are the m objectives to be minimized (or maximized). That is, we try to find a feasible solution vector \({\varvec{x}} \in \mathcal {F}\) in the solution space minimizing (or maximizing) each objective function \(f_{i}\) \((i=1,2,\ldots ,m)\) in the vector fitness function \({\varvec{f}}\). Generally, problems with two or three objective functions are called MOPs, and problems with four or more objective functions are called MaOPs (Ishibuchi et al. 2008). Important concepts used in determining a set of solutions are dominance, Pareto optimality, the Pareto set and the Pareto front. Next, we define dominance between solutions \({\varvec{x}},{\varvec{y}}\in \mathcal {F}\) in minimization problems as follows: If

$$\begin{aligned} {\forall } i \in \{1,2,\ldots ,m\}:f_{i}({\varvec{x}}) \le f_{i}({\varvec{y}}) \wedge {\exists } i \in \{1,2,\ldots ,m\}:f_{i}({\varvec{x}}) < f_{i}({\varvec{y}}) \end{aligned}$$
(2)

is satisfied, \({\varvec{x}}\) dominates \({\varvec{y}}\). In the following, the fact that \({\varvec{x}}\) dominates \({\varvec{y}}\) is denoted by \({\varvec{f}}({\varvec{x}}) \succ {\varvec{f}}({\varvec{y}})\). In the case of maximization problems, the inequalities of Eq. (2) are reversed. A solution vector \({\varvec{x}}\) is said to be Pareto optimal with respect to \(\mathcal {F}\) if it is not dominated by any other solution vector in \(\mathcal {F}\). The presence of multiple objective functions, usually conflicting with each other, gives rise to a set of optimal solutions. The set of POS is defined as
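The dominance relation of Eq. (2) can be sketched directly in code. This is an illustrative helper for the minimization case; the function name is ours, not from the paper:

```python
def dominates(fx, fy):
    """Return True if objective vector fx dominates fy (Eq. 2, minimization):
    fx is no worse in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))
```

For example, `(1, 2)` dominates `(2, 2)`, while `(1, 2)` and `(2, 1)` are mutually non-dominated.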

$$\begin{aligned} \mathcal {POS}=\left\{ {\varvec{x}}\in \mathcal {F}\,|\,{\lnot \exists }{\varvec{y}}\in \mathcal {F}: {\varvec{f}}({\varvec{y}}) \succ {\varvec{f}}({\varvec{x}}) \right\} , \end{aligned}$$
(3)

and the Pareto front is defined as

$$\begin{aligned} \mathcal {F}ront = \left\{ {\varvec{f}}({\varvec{x}})\,|\,{\varvec{x}} \in \mathcal {POS} \right\} . \end{aligned}$$
(4)

The goal of multi- and many-objective optimization is to obtain a POS which approximates the entire true Pareto front. Thus, it is important for the obtained POS to simultaneously achieve convergence toward the true Pareto front and diversity over it.

2.2 Evolutionary multi and many-objective optimization

Evolutionary algorithms are particularly suited for solving MOPs because they simultaneously evolve a population of potential solutions to the problem at hand, which allows them to find a set of non-dominated solutions in a single run of the algorithm (Deb 2001; Coello et al. 2007). NSGA-II (Deb et al. 2002) and SPEA2 (Zitzler et al. 1999a) are known as representative MOEAs. In these MOEAs, the selection incorporates elitism and is biased by Pareto dominance ranking and a diversity preserving strategy in the objective space. Pareto dominance based selection has been successfully applied, especially to two- and three-objective problems. However, current research reveals that ranking by Pareto dominance loses its effectiveness as the number of objectives increases, severely deteriorating the search performance of MOEAs on many-objective optimization (Hughes 2005; Aguirre et al. 2006). Therefore, there is a growing interest in applying MOEAs to solve MaOPs, which optimize four or more objective functions simultaneously.

To overcome this problem of MOEAs in MaOPs, several approaches have been studied (Ishibuchi et al. 2008). Building on conventional MOEAs effective for problems with a small number of objectives, one approach aims to reduce the dimensionality of the objective space, so that conventional MOEAs known to perform well in low-dimensional objective spaces can be utilized (Deb et al. 2006a; Brockhoff et al. 2006; Jaimes et al. 2008). As another approach, user preferences are also being investigated, aiming to focus the solution search on a specific region of the Pareto front (Deb et al. 2006b; Auger et al. 2009). Contrary to these approaches, to approximate the entire Pareto front considering all objective functions in MaOPs, algorithms that improve the selection pressure and assign more fine-grained ranks to solutions have been designed. They can be categorized into MOEAs based on an extension of the Pareto dominance relation (Ishibuchi et al. 2007b; Sato et al. 2010; Fabre et al. 2010; Aguirre et al. 2013), MOEAs based on an indicator (Zitzler et al. 2004; Beume et al. 2007), and MOEAs based on the decomposition of the objective space (Hughes 2005; Ishibuchi et al. 2007a; Zhang et al. 2007; Deb et al. 2013). As one of the promising evolutionary approaches for multi- and many-objective optimization, in this work we focus on the last decomposition approach and on MOEA/D (Zhang et al. 2007) as a representative MOEA employing it.

3 MOEA/D

3.1 Algorithm

MOEA/D decomposes a MOP into a number of single-objective optimization problems. The single-objective optimization problems are formulated by scalarizing functions g using uniformly distributed weight vectors \(\varvec{\lambda }^i\,(i=1,2,\ldots ,N)\). The conventional MOEA/D employs the simplex-lattice design (Scheffé 1958) to generate weight vectors. Each element \(\lambda ^i_j\,(j=1,2,\ldots ,m)\) is one of \(\{0/H, 1/H, \ldots ,\) \(H/H\}\) based on the decomposition parameter H, and \(N=C^{m-1}_{H+m-1}\) weight vectors satisfying \(\sum _{j=1}^{m}{\lambda ^i_j}=1.0\) are used for the solution search. Each weight vector \(\varvec{\lambda }^{i}\) determines a unique search direction in the m-dimensional objective space, and MOEA/D uses a set of uniformly distributed weight vectors to approximate the Pareto front. In the following, the algorithm of MOEA/D (Zhang et al. 2007) is briefly described.
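The simplex-lattice design described above can be sketched as follows; `simplex_lattice` is a hypothetical helper name, and the enumeration order is an implementation choice:

```python
from math import comb

def simplex_lattice(m, H):
    """Enumerate all m-dimensional weight vectors whose elements are in
    {0/H, 1/H, ..., H/H} and sum to 1 (simplex-lattice design)."""
    def rec(remaining, dims):
        if dims == 1:
            yield (remaining,)
            return
        for h in range(remaining + 1):
            for rest in rec(remaining - h, dims - 1):
                yield (h,) + rest
    return [tuple(h / H for h in v) for v in rec(H, m)]

# The number of generated vectors matches N = C(H+m-1, m-1).
assert len(simplex_lattice(3, 4)) == comb(4 + 3 - 1, 3 - 1)
```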

Step (1) initialization:

  • Step (1-1) Compute the Euclidean distances between any two weight vectors and find the T nearest weight vectors to each weight vector. For each \(i\in \{1,2,\ldots ,N\}\), set \(B(i)=\{i_1,i_2,\ldots ,i_T\}\), where \(\varvec{\lambda }^{i_1},\varvec{\lambda }^{i_2},\ldots ,\varvec{\lambda }^{i_T}\) are the T nearest weight vectors to \(\varvec{\lambda }^i\).

  • Step (1-2) Randomly generate the population (\(\varvec{x}^1,\varvec{x}^2,\ldots ,\varvec{x}^N\)).

Step (2) solution search:

For each \(i\in \{1,2,\ldots ,N\}\), perform the following procedure.

  • Step (2-1) Randomly choose two indexes k and l from B(i), and then generate a new offspring \(\varvec{y}\) from parents \(\varvec{x}^k\) and \(\varvec{x}^l\) by applying genetic operators.

  • Step (2-2) For each index \(j \in B(i)\), if \(g(\varvec{y}|\varvec{\lambda }^{j})\) is better than \(g(\varvec{x}^j |\varvec{\lambda }^{j})\), then the current solution \(\varvec{x}^j\) is replaced by the generated offspring \(\varvec{y}\), i.e. \(\varvec{x}^j=\varvec{y}\).

Step (3) stopping criterion:

If the termination criterion is satisfied, then stop and output the POS selected from the population (\(\varvec{x}^1,\varvec{x}^2,\ldots ,\varvec{x}^N\)) as the result of the optimization. Otherwise, go to Step 2.
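The three steps above can be sketched as a compact loop. All names below (`init`, `crossover`, the scalarizing function `g`, here minimized) are hypothetical placeholders supplied by the user; details such as mutation are omitted:

```python
import random

def moead(weights, neighbors, g, init, crossover, max_gen):
    """Minimal MOEA/D sketch following Steps 1-3: one solution per weight
    vector, mating restricted to the neighborhood B(i), and replacement
    decided by the scalarizing value g(x, lam), minimized here."""
    pop = [init() for _ in weights]                  # Step 1-2
    for _ in range(max_gen):
        for i in range(len(weights)):                # Step 2
            k, l = random.sample(neighbors[i], 2)    # Step 2-1
            y = crossover(pop[k], pop[l])
            for j in neighbors[i]:                   # Step 2-2
                if g(y, weights[j]) < g(pop[j], weights[j]):
                    pop[j] = y
    return pop                                       # Step 3: final population
```

A toy usage: solutions as scalars in [0, 1] with `g` a weighted sum of `(x, 1 - x)` and `crossover` returning the parents' midpoint.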

3.2 Scalarizing functions

In Step 2-2 of MOEA/D, two solutions are compared based on their scalar values g aggregated from the m objective function values. The MOEA/D framework has several options for scalarizing functions (Zhang et al. 2007; Li et al. 2013). In the following, we introduce the weighted Tchebycheff, reciprocal weighted Tchebycheff, WS and PBI functions.

3.2.1 Weighted Tchebycheff (TCH)

The scalar optimization of the weighted Tchebycheff function \(g^{tch}\) (Bowman 1976; Miettinen 1999) is formulated as

$$\begin{aligned} \text{ Minimize } g^{tch}(\varvec{x}|\varvec{\lambda }) = \underset{1 \le j \le m}{\max } \{|f_{j}(\varvec{x})-z_{j}|\cdot \lambda _{j}\}. \end{aligned}$$
(5)

This work uses the obtained ideal point \(\varvec{z}\) as the origin to decompose the objective space. Each element \(z_j\,(j=1,2,\ldots ,m)\) is set to the minimum (best) objective function value \(f_j\) obtained during the solution search. In the weighted Tchebycheff approach, a utopian point \(\varvec{z}^u\) consisting of elements smaller than those of the obtained ideal point (\(z^u_j < z_j\)) can also be employed as the origin to decompose the objective space. However, since an appropriate difference between the utopian point and the ideal point depends on the problem, this work uses the obtained ideal point \(\varvec{z}\) as the origin. The weighted Tchebycheff approach evolves a solution toward \(\varvec{z}\) by minimizing \(g^{tch}\). The weighted Tchebycheff function has the advantage that both convex and concave Pareto fronts can be approximated.
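Equation (5) is straightforward to implement; a sketch with an illustrative helper name:

```python
def g_tch(f, lam, z):
    """Weighted Tchebycheff value (Eq. 5) of objective vector f for weight
    vector lam, measured from the obtained ideal point z; minimized in MOEA/D."""
    return max(abs(fj - zj) * lj for fj, zj, lj in zip(f, z, lam))
```

For example, with `f = (2, 1)`, `lam = (0.5, 0.5)` and `z = (0, 0)` the value is `max(1.0, 0.5) = 1.0`.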

3.2.2 Reciprocal weighted Tchebycheff (rTCH)

To obtain more uniformly distributed solutions in the objective space, a recent study uses a different form of the Tchebycheff approach (Li et al. 2013). The scalar optimization of the reciprocal weighted Tchebycheff function \(g^{rtch}\) is formulated as

$$\begin{aligned} \text{ Minimize } g^{rtch}(\varvec{x}|\varvec{\lambda }) = \underset{1 \le j \le m}{\max } \{|f_{j}(\varvec{x})-z_{j}| / \lambda _{j}\}. \end{aligned}$$
(6)

To avoid division by zero, \(\lambda _j=0\) is exceptionally replaced by \(\lambda _j=10^{-6}\) before the calculation of \(g^{rtch}\).
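A sketch of Eq. (6), including the division-by-zero guard described above (helper name is ours):

```python
def g_rtch(f, lam, z, eps=1e-6):
    """Reciprocal weighted Tchebycheff value (Eq. 6); a zero weight element
    is replaced by 1e-6 before the division, as described above. Minimized."""
    return max(abs(fj - zj) / (lj if lj > 0 else eps)
               for fj, zj, lj in zip(f, z, lam))
```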

3.2.3 Weighted sum (WS)

The scalar optimization of the WS function \(g^{ws}\) (Gass and Saaty 1955) is formulated as

$$\begin{aligned} \text{ Maximize } g^{ws}(\varvec{x}|\varvec{\lambda }) = \sum _{j=1}^{m}{\lambda _{j}\cdot (w_j-f_{j}(\varvec{x}))}. \end{aligned}$$
(7)

In this work, each element \(w_j\,(j=1,2,\ldots ,m)\) is set to the worst (maximum) objective function value \(f_j\) in the current population. The WS approach evolves a solution away from the worst objective vector \(\varvec{w}\) by maximizing \(g^{ws}\). A recent study (Ishibuchi et al. 2013) reported that the WS approach achieves higher search performance than TCH in MaOPs. However, WS based MOEA/D has the problem that the entire concave Pareto front cannot be approximated (Zhang et al. 2007).
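Equation (7) as code, with `w` the worst objective vector defined above (helper name is ours):

```python
def g_ws(f, lam, w):
    """Weighted sum value (Eq. 7) of objective vector f, measured from the
    worst objective vector w; maximized in MOEA/D."""
    return sum(lj * (wj - fj) for fj, lj, wj in zip(f, lam, w))
```

For example, with `f = (0.2, 0.4)`, `lam = (0.5, 0.5)` and `w = (1, 1)` the value is `0.7`.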

3.2.4 Penalty-based boundary intersection (PBI)

The scalar optimization of PBI function \(g^{pbi}\) (Zhang et al. 2007) is formulated as

$$\begin{aligned} \text{ Minimize } g^{pbi} (\varvec{x}|\varvec{\lambda })= d_1 + \theta d_2, \end{aligned}$$
(8)

where,

$$\begin{aligned} d_1= & {} \frac{\Vert (\varvec{f}(\varvec{x})-\varvec{z})^T\varvec{\lambda }\Vert }{\Vert \varvec{\lambda }\Vert }, \end{aligned}$$
(9)
$$\begin{aligned} d_2= & {} \left\| \varvec{f}(\varvec{x})-\left( \varvec{z}+d_1\frac{\varvec{\lambda }}{\Vert \varvec{\lambda }\Vert }\right) \right\| . \end{aligned}$$
(10)

PBI also uses the obtained ideal point \(\varvec{z}\) as the criterion to decompose the objective space. \(\theta \) is the parameter of PBI, and its range is \(\theta \ge 0\). Figure 1a shows \(d_1\) and \(d_2\) of a solution \(\varvec{x}\) for a weight vector \(\varvec{\lambda }=(0.5, 0.5)^{\mathrm {T}}\) in the \(m=2\) dimensional objective space of a minimization problem. In the PBI approach, a solution having a small \(d_1\) is considered a better solution close to the Pareto front. The distance \(d_2\) from the weight vector \(\varvec{\lambda }\) is also considered. Finally, \(g^{pbi}\) is calculated by adding \(d_2\) multiplied by \(\theta \) to \(d_1\). That is, a solution having a small \(d_1\) and a small \(d_2\) is considered a better solution. The balance between \(d_1\) and \(d_2\) in \(g^{pbi}\) is controlled by the parameter \(\theta \). Thus, the PBI approach evolves a solution toward \(\varvec{z}\) by minimizing \(g^{pbi}\).
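Equations (8)–(10) can be sketched as follows (illustrative helper name; a solution lying exactly on the weight direction has \(d_2 = 0\)):

```python
import math

def g_pbi(f, lam, z, theta):
    """PBI value (Eqs. 8-10): d1 is the length of the projection of
    f(x) - z onto the weight direction, d2 the perpendicular distance
    from f(x) to that direction; minimized in MOEA/D."""
    norm = math.sqrt(sum(l * l for l in lam))
    d1 = abs(sum((fj - zj) * lj for fj, zj, lj in zip(f, z, lam))) / norm
    d2 = math.sqrt(sum((fj - (zj + d1 * lj / norm)) ** 2
                       for fj, zj, lj in zip(f, z, lam)))
    return d1 + theta * d2
```

For `f = (1, 1)`, `z = (0, 0)` and `lam = (0.5, 0.5)` the point lies on the search direction, so `d2 = 0` and the value equals `d1 = sqrt(2)` for any `theta`.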

Fig. 1 Difference between PBI and IPBI. a PBI (Zhang et al. 2007). b Inverted PBI (IPBI)

3.3 Design principles based on characteristics of conventional scalarizing approaches

Each scalarizing approach introduced in the previous section has its own characteristics, which have a large impact on the search performance of MOEA/D.

In some MOPs, the conventional scalarizing approaches have difficulty approximating a wide range of the Pareto front. In particular, the TCH, rTCH and PBI approaches evolve solutions toward \(\varvec{z}\) in the objective space. Generally, \(\varvec{z}\) is the ideal point obtained during the solution search. If the obtained \(\varvec{z}\) is close to the true ideal point \(\varvec{z}^*\) of the true Pareto front, we can expect to approximate the entire range of the true Pareto front. However, the distance between the obtained \(\varvec{z}\) and the true \(\varvec{z}^*\) becomes large in some MOPs in which the solutions in the population tend to be distributed in a specific area of the Pareto front. In this case, since solutions are evolved toward the obtained \(\varvec{z}\), which is far from the true \(\varvec{z}^*\), the entire range of the Pareto front cannot be approximated. As a way to mitigate this problem, a utopian point \(\varvec{z}^u\) can be employed as the origin to decompose the objective space. For the knapsack problem, Ishibuchi et al. (2015) employed \(z_j^u=1.1\cdot z_j\,(j=1,2,\ldots ,m)\). Since the knapsack problem is a maximization problem, the obtained ideal point \(\varvec{z}\) multiplied by 1.1 is used to calculate the scalarizing function values. Indeed, the spread of solutions in the objective space is improved by employing the utopian point \(\varvec{z}^u\). However, if we use an overly overestimated \(\varvec{z}^u\) when the obtained \(\varvec{z}\) is close to the true \(\varvec{z}^*\), many solutions are distributed in the edge areas of the Pareto front and the approximation granularity of the central area of the Pareto front deteriorates. The coefficient 1.1 (Ishibuchi et al. 2015) determines the degree of overestimation. However, the appropriate coefficient depends on the problem and is difficult to determine.
To overcome this problem of the conventional scalarizing approaches, the first aim of this work is to design a scalarizing function able to approximate a widely spread Pareto front.

Additionally, the second aim of this work is to design an effective scalarizing function for many-objective optimization. A recent study reported that MOEA/D using WS achieved higher search performance than MOEA/Ds using other scalarizing functions, especially in MaOPs (Ishibuchi et al. 2013). Based on this report, we design a novel scalarizing function by considering a characteristic of the WS approach. We also aim to approximate the entire concave Pareto front, since the conventional WS cannot do so (Zhang et al. 2007).

4 Inverted PBI

4.1 Concept

To approximate a wide range of the Pareto front and improve the search performance of MOEA/D on multi- and many-objective problems, in this work we propose the inverted PBI (IPBI) function, which is an extension of the conventional PBI and WS functions. The conventional PBI evolves solutions toward the obtained ideal point \(\varvec{z}\) by minimizing the scalarizing function value. In contrast, IPBI evolves solutions away from the worst objective vector \(\varvec{w}\) in the current population by maximizing the scalarizing function value. Similar to the conventional PBI, IPBI also uses the parameter \(\theta \). IPBI with \(\theta =0\) has an effect similar to WS. Also, the directivity along the search direction determined by each weight vector is strengthened by increasing \(\theta \) from zero.

4.2 IPBI scalarizing function

The scalar optimization of IPBI function \(g^{ipbi}\) is formulated as

$$\begin{aligned} \text{ Maximize } g^{ipbi} (\varvec{x}|\varvec{\lambda })= d_1 - \theta d_2, \end{aligned}$$
(11)

where,

$$\begin{aligned} d_1= & {} \frac{\Vert (\varvec{w}-\varvec{f}(\varvec{x}))^T\varvec{\lambda }\Vert }{\Vert \varvec{\lambda }\Vert }, \end{aligned}$$
(12)
$$\begin{aligned} d_2= & {} \left\| (\varvec{w}-\varvec{f}(\varvec{x})) - d_1\frac{\varvec{\lambda }}{\Vert \varvec{\lambda }\Vert }\right\| . \end{aligned}$$
(13)

\(\theta \) is the parameter of IPBI, and its range is \(\theta \ge 0\). \(\varvec{w}\) is the worst objective vector in the current population. Each element of \(\varvec{w}\) is \(w_j=\max _{i=1}^{N}{f_j(\varvec{x^i})}\) (\(j=1,2,\ldots ,m\)) in the case of minimization problems. The conventional PBI evolves a solution toward the obtained ideal point \(\varvec{z}\) by minimizing \(g^{pbi}\). On the other hand, IPBI evolves a solution away from the current worst objective vector \(\varvec{w}\) by maximizing \(g^{ipbi}\). Similar to the example of PBI shown in Fig. 1a, Fig. 1b shows \(d_1\) and \(d_2\) of a solution \(\varvec{x}\) for a weight vector \(\varvec{\lambda }=(0.5,0.5)^{\mathrm {T}}\) in the case of IPBI. In the IPBI approach, a solution having a large \(d_1\) is considered a better solution close to the Pareto front. The distance \(d_2\) from the weight vector \(\varvec{\lambda }\) is also considered. Finally, \(g^{ipbi}\) is calculated by subtracting \(d_2\) multiplied by \(\theta \) from \(d_1\). That is, a solution having a large \(d_1\) and a small \(d_2\) is considered a better solution. Similar to the conventional PBI, the balance between \(d_1\) and \(d_2\) in \(g^{ipbi}\) can be controlled by \(\theta \).
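Equations (11)–(13) mirror the PBI computation with \(\varvec{w}-\varvec{f}(\varvec{x})\) in place of \(\varvec{f}(\varvec{x})-\varvec{z}\) and the sign of the \(\theta \) term flipped; a sketch (illustrative helper name):

```python
import math

def g_ipbi(f, lam, w, theta):
    """IPBI value (Eqs. 11-13): d1 projects w - f(x) onto the weight
    direction, d2 measures the deviation from that direction; maximized
    in MOEA/D."""
    norm = math.sqrt(sum(l * l for l in lam))
    diff = [wj - fj for fj, wj in zip(f, w)]
    d1 = abs(sum(dj * lj for dj, lj in zip(diff, lam))) / norm
    d2 = math.sqrt(sum((dj - d1 * lj / norm) ** 2
                       for dj, lj in zip(diff, lam)))
    return d1 - theta * d2
```

For a minimization problem with `w = (1, 1)` and `lam = (0.5, 0.5)`, the point `f = (0, 0)` lies on the search direction, so `d2 = 0` and the value is `sqrt(2)` regardless of `theta`.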

Although IPBI is an extension of PBI and WS, their effects on the solution search are quite different. IPBI and the two Tchebycheff approaches also have different effects on the search performance. In the following, we discuss these differences by comparing IPBI with PBI, WS and rTCH.

4.3 Comparison with PBI

First, we compare PBI and IPBI. Figure 2 shows contour lines of the PBI function \(g^{pbi}\) in an \(m=2\) dimensional minimization problem. Similarly, Fig. 3 shows contour lines of the IPBI function \(g^{ipbi}\). For both scalarizing functions, three patterns of contour lines with different \(\varvec{\lambda }\) are shown. In these figures, the ideal point is set to \(\varvec{z}=(0,0)\), and the worst objective vector is set to \(\varvec{w}=(1,1)\). Also, the best point \(\varvec{x}^*\) achieving the best value of \(g^{pbi}\) or \(g^{ipbi}\) is plotted in each figure.

The conventional PBI approach tries to minimize \(g^{pbi}\). As shown in Fig. 2, the shape of the contour lines of \(g^{pbi}\) changes as the elements of \(\varvec{\lambda }\), which determine the search direction in the objective space, are varied. However, \(\varvec{x}^*\) achieving the best (minimum) \(g^{pbi}\) does not change with \(\varvec{\lambda }\), and \(\varvec{x}^*=\varvec{z}\) for any \(\varvec{\lambda }\). That is, the goal points \(\varvec{x}^*\) of \(g^{pbi}\) for any \(\varvec{\lambda }\) are the same point \(\varvec{z}\). Generally, \(\varvec{z}\) is the ideal point obtained during the solution search. If the obtained \(\varvec{z}\) is close to the true \(\varvec{z}^*\) of the Pareto front, the conventional PBI can approximate a wide range of the Pareto front. Otherwise, the conventional PBI faces difficulty in approximating the entire range of the Pareto front because all solutions are evolved toward the single point \(\varvec{z}\) in the objective space even if the obtained \(\varvec{z}\) is far from the true \(\varvec{z}^*\).

The IPBI approach tries to maximize \(g^{ipbi}\). As shown in Fig. 3, the shape of the contour lines of \(g^{ipbi}\) also changes as the elements of \(\varvec{\lambda }\) are varied. In the IPBI approach, however, \(\varvec{x}^*\) achieving the best (maximum) \(g^{ipbi}\) also changes with \(\varvec{\lambda }\). That is, the goal points \(\varvec{x}^*\) of \(g^{ipbi}\) with different \(\varvec{\lambda }\) are not the same point but are widely distributed in the objective space. Since the goal points \(\varvec{x}^*\) differ depending on \(\varvec{\lambda }\), we can expect IPBI to search a wide range of the objective space.

Fig. 2 Contour lines of PBI using \(\theta =1.0\). a \(\lambda ={(0.2, 0.8)}^\mathrm{T}\), b \(\lambda ={(0.5, 0.5)}^\mathrm{T}\), c \(\lambda ={(0.8, 0.2)}^\mathrm{T}\)

Fig. 3 Contour lines of IPBI using \(\theta =1.0\). a \(\lambda ={(0.2, 0.8)}^\mathrm{T}\), b \(\lambda ={(0.5, 0.5)}^\mathrm{T}\), c \(\lambda ={(0.8, 0.2)}^\mathrm{T}\)

Fig. 4 Contour lines of WS and IPBI. a \(\lambda ={(0.2, 0.8)}^\mathrm{T}\), b \(\lambda ={(0.5, 0.5)}^\mathrm{T}\), c \(\lambda ={(0.8, 0.2)}^\mathrm{T}\)

4.4 Comparison with WS

Next, we compare WS and IPBI. Figure 4 shows contour lines of the WS function \(g^{ws}\) and four IPBI functions \(g^{ipbi}\) with \(\theta =\{0.0, 0.5, 1.0, 2.0\}\) in an \(m=2\) dimensional minimization problem. For all functions, three patterns of contour lines with different \(\varvec{\lambda }\) are shown. Also, \(\varvec{z}\) and \(\varvec{w}\) in Fig. 4 are identical to those in Figs. 2 and 3.

Both WS and IPBI are functions maximized from the worst objective vector \(\varvec{w}\). First, we compare the contour lines of WS and IPBI using \(\theta =0.0\). Although the magnitudes of their slopes differ, the directions and shapes of the contour lines are the same. Since IPBI with \(\theta =0.0\) does not consider the distance \(d_2\), its effect becomes similar to that of WS. Next, we can see that the contour lines of the IPBI functions change as \(\theta \) is varied. The directivity along the search direction determined by the weight vector is strengthened by increasing \(\theta \).

Thus, IPBI with \(\theta =0\) has an effect similar to WS. That is, IPBI includes WS, which achieves high search performance in MaOPs (Ishibuchi et al. 2013), as the special case \(\theta =0\). Furthermore, IPBI extends WS by strengthening the directivity along the search direction as \(\theta \) is increased from zero.
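This \(\theta =0\) equivalence can be checked numerically: with \(\theta =0\) only \(d_1\) remains, and \(d_1\) equals the WS value of Eq. (7) divided by the constant \(\Vert \varvec{\lambda }\Vert \) (when the WS value is non-negative), so both functions rank solutions identically for a fixed weight vector. A small sketch, with helper names that are illustrative only:

```python
import math

def g_ws(f, lam, w):
    """Weighted sum value (Eq. 7); maximized."""
    return sum(lj * (wj - fj) for fj, lj, wj in zip(f, lam, w))

def g_ipbi_theta0(f, lam, w):
    """IPBI with theta = 0: only d1 remains, i.e. the WS value scaled by
    the constant 1/||lam|| (objectives assumed no worse than w)."""
    norm = math.sqrt(sum(l * l for l in lam))
    return abs(g_ws(f, lam, w)) / norm

# Both functions induce the same ranking of candidate solutions.
sols = [(0.1, 0.9), (0.4, 0.4), (0.8, 0.3)]
lam, w = (0.3, 0.7), (1.0, 1.0)
assert (sorted(sols, key=lambda f: g_ws(f, lam, w))
        == sorted(sols, key=lambda f: g_ipbi_theta0(f, lam, w)))
```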

Fig. 5 Preferred regions of rTCH and IPBI using \(\theta =1.0\). a \(f(x)={(0.8, 0.2)}\), b \(f(x)={(0.5, 0.5)}\), c \(f(x)={(0.2, 0.8)}\)

4.5 Comparison with Tchebycheff

Finally, we compare IPBI with the Tchebycheff approach. For a solution \(\varvec{x}\), we compare the preferred region, i.e., the region of the objective space achieving a better scalarizing function value than \(\varvec{x}\). Here, we focus on rTCH and IPBI using \(\theta =1.0\). Figure 5 shows three cases with three different points \(\varvec{f}(\varvec{x})\). Figure 5a shows the case of rTCH using \(\varvec{\lambda }=(0.8,0.2)^{\mathrm {T}}\) and IPBI using \(\varvec{\lambda }=(0.2,0.8)^{\mathrm {T}}\), since the solution \(\varvec{x}\) is on their search directions. Similarly, Fig. 5b shows the case of both rTCH and IPBI using \(\varvec{\lambda }=(0.5, 0.5)^{\mathrm {T}}\), and Fig. 5c shows the case of rTCH using \(\varvec{\lambda }=(0.2,0.8)^{\mathrm {T}}\) and IPBI using \(\varvec{\lambda }=(0.8, 0.2)^{\mathrm {T}}\). The black dashed line is the border where the value of \(g^{rtch}\) equals that of \(\varvec{x}\), and the gray region is the preferred region for rTCH. Similarly, the red line is the border where the value of \(g^{ipbi}\) equals that of \(\varvec{x}\), and the red region is the preferred region for IPBI. MOEA/D replaces the current solution \(\varvec{x}\) by an offspring \(\varvec{y}\) when \(\varvec{y}\) is generated in the preferred region. In each figure, A is the preferred region only for rTCH, B is the one only for IPBI, and C is the one for both rTCH and IPBI.

First, from Fig. 5b, when \(\varvec{x}\) is located in the central region of the objective space, we can see that the preferred regions of rTCH and IPBI are equivalent. Next, Fig. 5a shows an example in which \(\varvec{x}\) has a large \(f_1\) and a small \(f_2\). In this case, the preferred regions of rTCH and IPBI differ. rTCH replaces the current \(\varvec{x}\) with an offspring \(\varvec{y}\) generated in the gray preferred region (A \(\cup \) C), where both \(f_1\) and \(f_2\) are better than those of \(\varvec{x}\). On the other hand, IPBI does not replace the current \(\varvec{x}\) with an offspring \(\varvec{y}\) generated in region A. Thus, IPBI avoids accumulating solutions in the central region of the objective space. Also, IPBI replaces the current \(\varvec{x}\) with an offspring \(\varvec{y}\) generated in region B, where \(f_1\) is worse but \(f_2\) is better than for \(\varvec{x}\). That is, for the given search direction emphasizing the minimization of \(f_2\) in Fig. 5a, IPBI prefers an offspring \(\varvec{y}\) over the current \(\varvec{x}\) if \(\varvec{y}\) improves on \(\varvec{x}\) in \(f_2\), even if \(\varvec{y}\) is worse than \(\varvec{x}\) in \(f_1\). Thus, IPBI encourages further minimization of \(f_2\) and tries to improve the spread of solutions compared with rTCH. Similarly, in Fig. 5c, IPBI encourages further minimization of \(f_1\). In this way, IPBI tries to maintain the diversity of solutions and improve their spread in the objective space.
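The difference between the two replacement decisions can be illustrated numerically. The following is a hypothetical sketch, assuming rTCH takes the form \(\max _j (f_j-z_j)/\lambda _j\) (minimized) and IPBI the form \(d_1-\theta d_2\) (maximized); the concrete points and weight vectors mimic the situation of Fig. 5a but are chosen by us for illustration.

```python
import numpy as np

def rtch(f, lam, z):
    """Assumed rTCH form (to be minimized): max_j (f_j - z_j) / lam_j."""
    return np.max((f - z) / lam)

def ipbi(f, lam, w, theta):
    """IPBI value (to be maximized): d1 - theta * d2, measured from w."""
    diff = w - f
    u = lam / np.linalg.norm(lam)
    d1 = np.dot(diff, u)                      # progress along the search direction
    d2 = np.linalg.norm(diff - d1 * u)        # deviation from the search direction
    return d1 - theta * d2

z, w = np.array([0.0, 0.0]), np.array([1.0, 1.0])
x = np.array([0.8, 0.2])                      # current solution, as in Fig. 5a
y = np.array([0.85, 0.05])                    # offspring in region B: worse f1, better f2
lam_r, lam_i = np.array([0.8, 0.2]), np.array([0.2, 0.8])

print(rtch(y, lam_r, z) < rtch(x, lam_r, z))             # rTCH rejects y -> False
print(ipbi(y, lam_i, w, 1.0) > ipbi(x, lam_i, w, 1.0))   # IPBI accepts y -> True
```

The offspring worsens \(f_1\) slightly but improves \(f_2\), so rTCH rejects it while IPBI accepts it, which is exactly the region-B behavior described above.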

5 Experimental setup

5.1 Six algorithms

We compare the search performances of six algorithms on benchmark problems. The first five algorithms are MOEA/Ds using TCH, rTCH, WS, PBI and IPBI, respectively. Unlike the original MOEA/D, for problems with differently scaled objective functions, each objective function value is normalized by \(f^n_j(\varvec{x})=(f_j(\varvec{x}) - z_j)/(w_j - z_j)\) before its scalarizing function value is calculated. The last algorithm is NSGA-III (Deb et al. 2013). NSGA-III is an improved version of the representative NSGA-II (Deb et al. 2002), designed especially for solving MaOPs, and it also employs the decomposition approach of the objective space. The main difference between NSGA-II and NSGA-III is the method of maintaining the distribution of solutions in the objective space. Instead of the crowding distance used in NSGA-II, NSGA-III uses the perpendicular distances between solutions and reference lines based on a predefined set of reference points (weight vectors). The algorithm of NSGA-III in generation t is briefly described in the following.
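The normalization above is straightforward; the following minimal sketch makes it explicit (the numbers are hypothetical and only illustrate differently scaled objectives):

```python
import numpy as np

def normalize(f, z, w):
    """Normalize each objective into [0, 1] using the ideal point z and worst point w."""
    return (f - z) / (w - z)

f = np.array([30.0, 400.0])      # differently scaled objective values (hypothetical)
z = np.array([10.0, 100.0])      # ideal (best-so-far) point
w = np.array([50.0, 900.0])      # worst objective vector
print(normalize(f, z, w))        # -> [0.5   0.375]
```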

5.1.1 NSGA-III

The entire population \(\mathcal {R}_t\) is composed of the parent population \(\mathcal {P}_t\) and the offspring population \(\mathcal {Q}_t\). The entire population \(\mathcal {R}_t\) (\(=\mathcal {P}_t \cup \mathcal {Q}_t\)) is classified into several fronts \(\mathcal {F}_1, \mathcal {F}_2, \ldots \) based on non-dominance levels in the same manner as NSGA-II. Then, the candidate solutions \(\mathcal {S}_t\) are selected by repeating \(\mathcal {S}_t=\mathcal {S}_t \cup \mathcal {F}_i\) and \(i=i+1\) until \(|\mathcal {S}_t| \ge N\). Consequently, the candidate solutions \(\mathcal {S}_t\) include the non-dominated front(s) \(\mathcal {F}_1,\ldots ,\mathcal {F}_l\). If \(|\mathcal {S}_t| = N\), \(\mathcal {S}_t\) is simply copied to the parent population \(\mathcal {P}_{t+1}\) for the next generation and no further selection is needed, that is, \(\mathcal {P}_{t+1}=\mathcal {S}_t\). For \(|\mathcal {S}_t| > N\), solutions from \(\mathcal {F}_1\) to \(\mathcal {F}_{l-1}\) are selected, that is, \(\mathcal {P}_{t+1}=\cup _{i=1}^{l-1}{\mathcal {F}_i}\) (\(=\mathcal {S}_t \backslash \mathcal {F}_l\)), and the remaining (\(N-|\mathcal {P}_{t+1}|\)) solutions are selected from the last front \(\mathcal {F}_l\).

To select the remaining solutions from \(\mathcal {F}_l\), NSGA-II uses the crowding distance, whereas NSGA-III uses a predefined set of reference points, similar to weight vectors in MOEA/D, and reference lines passing through the reference points and the origin. For each solution in \(\mathcal {S}_t\), the perpendicular distance from each reference line is calculated. Then, each solution in \(\mathcal {S}_t\) is associated with the reference line having the shortest perpendicular distance among all reference lines. For each reference line j, the number of solutions in \(\mathcal {P}_{t+1}\) (\(=\mathcal {S}_t \backslash \mathcal {F}_l\)) associated with it is counted as \(\rho _j\). Next, we randomly select a reference line \(\overline{j}\) from the reference lines having the minimum value of \(\rho _j\). If \(\rho _{\overline{j}}=0\) and there exist one or more solutions in \(\mathcal {F}_l\) associated with the reference line \(\overline{j}\), the solution having the shortest perpendicular distance from the reference line \(\overline{j}\) is added to \(\mathcal {P}_{t+1}\), and \(\rho _{\overline{j}}\) is incremented by one. For \(\rho _{\overline{j}} \ge 1\), a randomly chosen solution from \(\mathcal {F}_l\) associated with the reference line \(\overline{j}\), if one exists, is added to \(\mathcal {P}_{t+1}\), and \(\rho _{\overline{j}}\) is incremented by one. The above procedure is repeated until \(\mathcal {P}_{t+1}\) is filled. To create the offspring population \(\mathcal {Q}_{t+1}\) for the next generation, NSGA-III randomly picks parents from \(\mathcal {P}_{t+1}\) and applies crossover and mutation operators.
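The association-and-niching step above can be sketched as follows. This is a simplified illustration assuming objective vectors are already normalized; it covers only the niche-filling loop, not the full NSGA-III generation, and the function names are ours.

```python
import numpy as np
import random

def perp_dist(f, ref):
    """Perpendicular distance from objective vector f to the reference line through ref."""
    u = ref / np.linalg.norm(ref)
    return np.linalg.norm(f - np.dot(f, u) * u)

def niching(P_next, F_l, refs, N):
    """Fill P_next up to size N from the last front F_l (lists of objective vectors)."""
    assoc = lambda f: min(range(len(refs)), key=lambda j: perp_dist(f, refs[j]))
    rho = [0] * len(refs)                          # niche counts over P_next
    for f in P_next:
        rho[assoc(f)] += 1
    pool = {i: assoc(f) for i, f in enumerate(F_l)}  # association of the last front
    while len(P_next) < N:
        j = random.choice([k for k in range(len(refs)) if rho[k] == min(rho)])
        members = [i for i, a in pool.items() if a == j]
        if not members:                            # no candidate for this line: exclude it
            rho[j] = float('inf')
            continue
        if rho[j] == 0:                            # empty niche: take the closest solution
            pick = min(members, key=lambda i: perp_dist(F_l[i], refs[j]))
        else:                                      # occupied niche: take a random member
            pick = random.choice(members)
        P_next.append(F_l[pick])
        del pool[pick]
        rho[j] += 1
    return P_next

refs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
P_next = [np.array([0.9, 0.1])]                    # already-selected solution near ref 0
F_l = [np.array([0.1, 0.9]), np.array([0.8, 0.2])]
result = niching(P_next, F_l, refs, N=2)
print(result[1])                                   # the empty niche near ref 1 is filled first
```

In the toy run, the reference line \((0,1)\) has \(\rho =0\), so the last-front solution closest to it is added first, which is how NSGA-III steers the population toward under-represented niches.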

For the normalization of each objective value, the original NSGA-III uses an m-dimensional hyperplane constructed from m extreme vectors in \(\mathcal {S}_t\). Since this hyperplane cannot always be uniquely determined, in this paper we simply normalize each objective value by \(f^n_j(\varvec{x})=(f_j(\varvec{x})-z_j)/(\max _{\varvec{y} \in \mathcal {S}_t}{f _j(\varvec{y})} -z_j)\). Therefore, note that the search performance of NSGA-III shown in this paper is not exactly the same as that of the original.

5.2 Benchmark problems

As benchmark problems to compare the six algorithms, we use the many-objective knapsack problem (MOKP) (Zitzler et al. 1999a) and the WFG4 and WFG2 problems (Huband et al. 2006).

MOKP is a discrete combinatorial maximization problem, and it has a convex Pareto front. MOKP is formulated by

$$\begin{aligned} \left\{ \begin{array}{ll} \text {Maximize} &{} f_j (\varvec{x}) = \sum \nolimits _{i=1}^{n} x_i \cdot p_{i,j}\\ \text {subject to} &{} g_j (\varvec{x}) = \sum \nolimits _{i=1}^{n} x_i \cdot q_{i,j} \le c_j \end{array} \right. \quad \left( j=1,\ldots ,m \right) , \end{aligned}$$
(14)

where \(p_{i,j}\) and \(q_{i,j}\,\left( j=1,2,\ldots ,m \right) \) denote the profit and weight of item i for knapsack (objective) j, and \(c_j\) is the capacity of knapsack j. The capacity of knapsack j is given by \(c_j=\phi \cdot \sum ^{n}_{i=1}{q_{i,j}}\) based on the feasibility ratio \(\phi \). To use the same implementation of the algorithms for both MOKP and WFG4, we solve MOKP as a minimization problem formulated by

$$\begin{aligned} \left\{ \begin{array}{ll} \text {Minimize} &{} f'_j (\varvec{x}) = \sum \nolimits _{i=1}^{n} p_{i,j} - f_j(\varvec{x})\\ \text {subject to} &{} g_j (\varvec{x}) = \sum \nolimits _{i=1}^{n} x_i \cdot q_{i,j} \le c_j \end{array} \right. \quad \left( j=1,\ldots ,m \right) . \end{aligned}$$
(15)

We use MOKPs with \(m=\{2,3,4,5,6,7,8\}\) objectives, \(n=\{250,500,750\}\) items (bits), and a feasibility ratio \(\phi =0.5\). To deal with infeasible solutions, we use the greedy repair mechanism (Zitzler et al. 1999a). As in Zitzler et al. (1999a), \(p_{i,j}\) and \(q_{i,j}\) are generated as random integers in the interval [10, 100].
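Equations (14) and (15) and the instance generation above can be sketched as follows, on a hypothetical tiny instance (the problem sizes in the experiments are much larger):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, phi = 8, 2, 0.5                      # tiny instance for illustration only
p = rng.integers(10, 101, size=(n, m))     # profits p[i, j] in [10, 100]
q = rng.integers(10, 101, size=(n, m))     # weights q[i, j] in [10, 100]
c = phi * q.sum(axis=0)                    # knapsack capacities c_j, Eq. above

def objectives_min(x):
    """Eq. (15): MOKP recast as a minimization problem."""
    f = x @ p                              # original profits f_j(x), Eq. (14)
    return p.sum(axis=0) - f

def feasible(x):
    """Capacity constraints g_j(x) <= c_j."""
    return np.all(x @ q <= c)

x = np.zeros(n, dtype=int)                 # empty knapsack: trivially feasible
print(feasible(x), objectives_min(x))      # f'_j equals sum_i p[i, j] here
```

Infeasible bit strings would then be handed to the greedy repair mechanism, which is omitted here.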

WFG4 is a continuous minimization problem with a concave Pareto front. In WFG4, two kinds of difficulty can be controlled by the distance-related parameter L and the position-related parameter K. The difficulty of converging toward the Pareto front increases with L. In this work, a fixed \(L=10\) is used. Also, the difficulty of covering the entire Pareto front increases with K. In this work, we verify the search performance on WFG4 problems with different K. Since the assignable values of K in WFG4 are restricted, we set \(K=k+(m-1)-k\mod (m-1)\) by varying \(k=\{10,50,100,200,400\}\). The number of variables is therefore \(n=L+K\). The number of objectives is set to \(m=\{2,4,6,8\}\). Also, as a problem with a discontinuous Pareto front, this work uses WFG2, for which the problem difficulty is set to \(k=2\).
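The rounding rule for K can be written directly from the formula in the text (it rounds k up to a multiple of \(m-1\), as WFG requires):

```python
def wfg4_K(k, m):
    """K = k + (m - 1) - k mod (m - 1): round k up to a valid WFG position parameter."""
    return k + (m - 1) - k % (m - 1)

# Settings used in the text: L = 10, so the number of variables is n = L + K.
L = 10
for m in (2, 4, 6, 8):
    print(m, [L + wfg4_K(k, m) for k in (10, 50, 100, 200, 400)])
```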

5.3 Metrics

The search performance is evaluated by hypervolume (HV) (Zitzler 1999). HV is the m-dimensional volume of the region enclosed by the obtained POS and a dominated reference point \(\varvec{r}\) in the objective space. A high HV indicates a good POS in terms of both convergence toward the Pareto front and diversity approximating a wide range of the Pareto front.

To analyze the factors contributing to the value of HV, in this work we independently evaluate the diversity and the convergence of the obtained POS in MOKPs. To measure the diversity of the obtained POS, we use the maximum spread (MS) (Zitzler 1999). MS is calculated from the distances between the minimum and maximum objective function values of the obtained POS in each objective. A high MS indicates a high diversity of the obtained POS in the objective space, i.e., a widely spread POS. Also, to measure the convergence of the obtained POS toward the Pareto front, we use Norm (Sato et al. 2006). Norm measures the average Euclidean distance between the origin and each member of the obtained POS in the objective space. A high Norm generally means high convergence toward the true Pareto front. Although Norm cannot precisely reflect local features of the distribution of the obtained POS, we can observe from it the general tendency of the convergence of the POS.
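The two supplemental metrics can be sketched as follows. This assumes one common formulation of MS (the Euclidean norm of the per-objective ranges of the obtained set); the exact formulation in Zitzler (1999) may differ in normalization.

```python
import numpy as np

def maximum_spread(pos):
    """MS: diversity metric built from the per-objective ranges of the obtained POS."""
    pos = np.asarray(pos)
    return np.sqrt(np.sum((pos.max(axis=0) - pos.min(axis=0)) ** 2))

def norm_metric(pos):
    """Norm: mean Euclidean distance of the obtained POS from the origin."""
    pos = np.asarray(pos)
    return np.mean(np.linalg.norm(pos, axis=1))

pos = [[3.0, 0.0], [0.0, 4.0]]
print(maximum_spread(pos))   # ranges (3, 4) -> 5.0
print(norm_metric(pos))      # mean of 3.0 and 4.0 -> 3.5
```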

5.4 Parameters

For discrete MOKPs, we adopt uniform crossover with a crossover rate of 1.0 and bit-flipping mutation with a mutation rate of 4/n. The termination criterion is set to \(10^4\) generations. HV is calculated with \(\varvec{r}=(0,0,\ldots ,0)\).

For the continuous WFG4 and WFG2 problems, we adopt SBX with a crossover rate of 0.8 and distribution parameter \(\eta _c=15\), and polynomial mutation with a mutation rate of 4/n and distribution parameter \(\eta _m=20\). The termination criterion is set to 3,000 generations. To calculate HV in WFG4 problems, we first normalize the objective function values of solutions by \(f_j'(\varvec{x})=f_j(\varvec{x})/2j\,(j=1,2,\ldots ,m)\), and then calculate HV with \(\varvec{r}=(1.2,1.2, \ldots ,1.2)\).

The simplex-lattice design (Scheffé 1958) used in the original MOEA/D paper (Zhang et al. 2007) is employed to generate the weight vectors. The decomposition parameters are set to \(H=\{200, 19, 9, 6, 5, 4, 4\}\) for \(m=\{2,3,4,5,6,7,8\}\) objectives, respectively. Therefore, the population sizes are \(N=\{201,210,220,210,252,210,330\}\) for \(m=\{2,3,4,5,6,7,8\}\), respectively. The MOEA/Ds and NSGA-III use the same weight vectors and population sizes. In the simplex-lattice design with a fixed H, the number of weight vectors (the population size) N increases combinatorially with the number of objectives m, and its size exceeds commonly used population sizes in evolutionary algorithms. Therefore, this work uses the minimum H generating more than 200 weight vectors (solutions) for each number of objectives m. Recently, the generation of weight vectors, especially for MaOPs, has been recognized as an important issue in decomposition-based MOEAs, and methods that avoid the combinatorial increase of weight vectors (the population size) while maintaining a uniform distribution have been proposed (Tan et al. 2013; Mohammadi et al. 2014). Since the main focus of this work is scalarizing functions in MOEA/D, the simplex-lattice design is simply employed in the following experiments. Note, however, that a more appropriate way to generate weight vectors for MaOPs remains a matter for further study. Also, the neighborhood size in the MOEA/Ds is set to \(T=20\). In the following experiments, we show the average performance over 30 runs.
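The simplex-lattice design enumerates all weight vectors whose components are non-negative multiples of 1/H summing to one, giving \(N=\left( {\begin{array}{c}H+m-1\\ m-1\end{array}}\right) \) vectors. A minimal sketch, reproducing the population sizes stated above:

```python
from math import comb

def simplex_lattice(m, H):
    """All m-dimensional weight vectors with components in {0, 1/H, ..., 1} summing to 1."""
    vectors = []
    def rec(prefix, remaining, dims):
        if dims == 1:                      # last component takes whatever is left
            vectors.append(prefix + (remaining / H,))
            return
        for h in range(remaining + 1):
            rec(prefix + (h / H,), remaining - h, dims - 1)
    rec((), H, m)
    return vectors

# Population sizes N = C(H + m - 1, m - 1), matching the settings in the text:
for m, H in zip((2, 3, 4, 5, 6, 7, 8), (200, 19, 9, 6, 5, 4, 4)):
    assert len(simplex_lattice(m, H)) == comb(H + m - 1, m - 1)
print(simplex_lattice(2, 4))   # [(0.0, 1.0), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1.0, 0.0)]
```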

6 Experimental results and discussion

6.1 Effects of the parameter \(\theta \) in MOKPs

Both PBI and IPBI use the parameter \(\theta \). In this section, we observe the effects of \(\theta \) on the search performance in MOKPs with \(n=500\) items (bits) and \(m=\{2, 4, 6, 8\}\) objectives. Figure 6 shows the HV obtained by PBI and IPBI at the final generation when the parameter \(\theta \) is varied. Since HV changes drastically for small values of \(\theta \), the horizontal \(\theta \)-axis is set to a logarithmic scale. Strictly, the result of \(\theta =0\) cannot be plotted on a logarithmic \(\theta \)-axis. However, since the result of \(\theta =0\) is important for observing the tendency of HV as \(\theta \) is varied, it is exceptionally plotted on the leftmost side of each figure. Also, in each figure, we plot the HV obtained by WS as a horizontal line.

First, we can see that the values of HV obtained by PBI and IPBI using \(\theta =0\) are nearly equal to the one obtained by WS in all MOKPs. This result confirms that IPBI using \(\theta =0\) and WS have a similar effect on the search performance. Next, from the results of PBI, we can see that the optimal parameter maximizing HV is \(\theta ^*=0\) for all numbers of objectives m. Since HV is a comprehensive metric considering both the diversity and the convergence of the obtained POS, both properties affect the value of HV. In the case of PBI, generally, the convergence of solutions toward the Pareto front is emphasized by decreasing \(\theta \), and the diversity of solutions is emphasized by increasing \(\theta \). Also, an appropriate \(\theta \) depends on the shape of the Pareto front. These multiple factors affect the value of HV, producing the HV curves of PBI shown in Fig. 6. In the case of IPBI, HV is improved by increasing \(\theta \) from zero, and IPBI also has optimal parameters, \(\theta ^*=\{0.6, 0.3, 0.2, 0.1\}\) for \(m=\{2,4,6,8\}\) objectives, respectively. Also, we can see that IPBI using \(\theta ^*\) achieves higher HV than WS and than PBI using its \(\theta ^*\).

Fig. 6

HV obtained by PBI and IPBI in MOKPs with \(n=500\) items (bits) when the parameter \(\theta \) is varied. a \(m=2\) objectives, b \(m=4\) objectives, c \(m=6\) objectives, d \(m=8\) objectives

IPBI using \(\theta =0\) has an effect similar to WS, which achieves high search performance especially in MaOPs (Ishibuchi et al. 2013), and the directivity toward each search direction can be strengthened by increasing \(\theta \) from zero. The results of IPBI in Fig. 6 reveal that the search performance is further improved over WS by controlling \(\theta \) and strengthening the directivity.

6.2 Search performance in MOKPs

Here, we compare the search performances of six algorithms in MOKPs. Figure 7 shows results of HV at the final generation in MOKPs with \(n=\{250,500,750\}\) items (bits) when the number of objectives m is varied. For PBI and IPBI, we plot results using the optimal \(\theta ^*\) maximizing HV. For PBI, the optimal parameter is \(\theta ^*=0\) for all MOKPs used in this work. For IPBI, the used optimal parameters \(\theta ^*\) are shown in Table 1. All the results are normalized by the result obtained by TCH.

First, we can see that rTCH achieves higher HV than TCH in MOKPs with a small number of objectives m (\(\le 4\) or 5). However, TCH achieves higher HV than rTCH in MOKPs with a large number of objectives m (\(\ge 5\) or 6). PBI using \(\theta ^*\) and WS achieve higher HV than TCH, and the relative difference in their HV increases with the number of objectives m. This result clearly shows the advantage of the PBI and WS approaches in many-objective knapsack problems. On the other hand, we can see that IPBI using \(\theta ^*\) achieves the highest HV among the six algorithms in all MOKPs used in this work. Also, the relative difference in HV between IPBI and TCH increases with the number of objectives m.

Fig. 7

HV in MOKPs when the number of objectives m is varied. a \(n=250\) items (bits), b \(n=500\) items (bits), c \(n=750\) items (bits)

Table 1 Used \(\theta ^*\) in each MOKP for IPBI
Fig. 8

MS (diversity) in MOKPs when the number of objectives m is varied. a \(n=250\) items (bits), b \(n=500\) items (bits), c \(n=750\) items (bits)

To analyze the factors contributing to the improvement of HV by IPBI, we independently evaluate the diversity and the convergence of the obtained POS using the supplemental metrics. To observe the diversity of the obtained POS in the objective space, Fig. 8 shows the results of MS at the final generation. Also, to observe the convergence of the obtained POS, Fig. 9 shows the results of Norm. First, we compare IPBI with PBI and WS, since IPBI is extended from them. From the results of MS, we can see that IPBI using \(\theta ^*\) shows higher MS than PBI using \(\theta ^*\) and WS in all MOKPs. That is, the spread of solutions in the objective space obtained by IPBI using \(\theta ^*\) is wider than those obtained by PBI and WS. On the other hand, from the results of Norm, we can see that the values of Norm obtained by IPBI using \(\theta ^*\) are lower than those obtained by PBI and WS in MOKPs with \(m\ge 3\) objectives. These results reveal that the improvement of HV by IPBI results from the improvement of the spread of solutions in the objective space. Next, we compare IPBI and rTCH. rTCH achieves higher MS than IPBI in MOKPs with \(n=\{500,750\}\) bits and \(m=\{4,5\}\) objectives. However, in these MOKPs, IPBI shows higher Norm than rTCH, and IPBI achieves higher HV than rTCH. NSGA-III shows higher convergence than IPBI in MOKPs with many objectives. However, the values of MS obtained by NSGA-III are lower than those of IPBI, and consequently its values of HV are also lower than those of IPBI. This is because MOKPs involve the difficulty of obtaining a widely spread POS in the objective space.

Fig. 9

Norm (convergence) in MOKPs when the number of objectives m is varied. a \(n=250\) items (bits), b \(n=500\) items (bits), c \(n=750\) items (bits)

Fig. 10

The obtained populations in MOKP with \(m=2\) objectives and \(n=500\) items

6.3 Obtained population in MOKP

Here, we observe the distribution of the obtained population at the final generation in the MOKP with \(m=2\) objectives and \(n=500\) items (bits). Figure 10 shows the true POS (Zitzler et al. 1999b), four populations obtained by WS and by IPBI with \(\theta =\{0.6,0.8,1.0\}\), and their worst objective vectors \(\varvec{w}\). In the case of the true POS, the worst objective vector is the nadir point \(\varvec{w^n}\). Each population is the result of a single run of each algorithm. Note that the distribution of the population obtained by WS is similar to the one obtained by IPBI using \(\theta =0.0\).

First, we can see that the population obtained by WS cannot cover the entire Pareto front. To approximate the entire Pareto front, the regions A in Fig. 10 must be searched. Also, since the distribution interval of solutions is large, the approximation of the Pareto front is rough. In the case of IPBI, we can see that the spread of the population in the objective space is significantly improved and the elements of \(\varvec{w}\) become smaller as \(\theta \) is increased. The worst objective vectors \(\varvec{w}\) of IPBI with \(\theta =\{0.6, 0.8, 1.0\}\) are smaller than the nadir point \(\varvec{w^n}\), which is the worst objective vector of the true POS. Actually, there are no true POS in the regions B. However, IPBI can obtain POS in the regions A by searching the regions B with a \(\varvec{w}\) smaller than \(\varvec{w^n}\). In this MOKP, the highest HV is achieved by IPBI using \(\theta ^*=0.6\). IPBI using \(\theta >0.6\) searches a wider range of the objective space even if that range is dominated by other solutions in the population. This feature has the advantage of improving the spread of solutions in the objective space, and it is effective especially in problems where approximating a widely spread Pareto front is difficult. However, there is a negative effect: the number of non-dominated solutions in the population decreases as \(\theta \) increases, the approximation granularity of the Pareto front becomes coarse, and HV deteriorates, as shown in Fig. 6a. This work uses \(\varvec{w}\) as the origin point to decompose the objective space. However, the approximation granularity of the Pareto front might be further improved by employing another origin point between the two worst objective vectors of the obtained population and of its non-dominated solutions. Also, in the case of IPBI, since the distribution interval of solutions becomes shorter by increasing \(\theta \) and strengthening the directivity toward each search direction, the approximation of the Pareto front is finer than that of WS.

This result reveals that IPBI is able to approximate a widely spread Pareto front by searching a wide range of the objective space which cannot be searched by WS.

6.4 Effects of the parameter \(\theta \) in WFG4s

Next, we discuss the search performance on WFG4 problems with different numbers of objectives m and problem difficulty parameters k. To briefly illustrate the impact of the problem difficulty parameter k in WFG4 problems, Fig. 11 shows six sets of POS obtained by TCH-based MOEA/D at the final generation in \(m=2\) dimensional WFG4 problems with \(k=\{10,50,100, 200,300,400\}\). From the results, we can see that the difficulty of obtaining a widely spread POS increases with the parameter k in WFG4 problems.

Here, we observe the effects of \(\theta \) on the search performance in WFG4 problems. Figures 12, 13, and 14 show the HV obtained by PBI and IPBI in WFG4 problems with problem difficulties \(k=\{100,200,400\}\) when the parameter \(\theta \) is varied. In each figure, the horizontal \(\theta \)-axis is logarithmic, the result of \(\theta =0\) is exceptionally plotted on the leftmost side, and the HV achieved by WS is also plotted as a horizontal line.

First, we can see that the values of HV obtained by PBI and IPBI using \(\theta =0\) are close to the one obtained by WS. In the case of IPBI, HV is improved by increasing \(\theta \) from zero, and there is an optimal parameter \(\theta ^*\) maximizing HV for each problem. On the other hand, in the case of PBI, HV deteriorates as \(\theta \) is increased from zero to one. A further increase of \(\theta \) beyond one improves HV, and the optimal parameter maximizing HV becomes \(\theta ^*>1\) in almost all problems, except the two problems with \(k=400\) and \(m=\{2, 8\}\). In the problem with \(m=2\) and \(k=400\), as shown in Fig. 14a, although HV is significantly improved by increasing \(\theta \) beyond one, its peak value is lower than in the case of \(\theta =0\). On the other hand, in the problem with \(m=8\) and \(k=400\), as shown in Fig. 14d, no significant improvement of HV by increasing \(\theta \) can be observed. That is, the behavior of the HV obtained by PBI in Fig. 14d differs from the others.

Fig. 11

Six sets of the obtained POS in WFG4 problems with different k

Fig. 12

HV obtained by PBI and IPBI in WFG4 problems with \(k=100\) when the parameter \(\theta \) is varied. a \(m=2\) objectives, b \(m=4\) objectives, c \(m=6\) objectives, d \(m=8\) objectives

Fig. 13

HV obtained by PBI and IPBI in WFG4 problems with \(k=200\) when the parameter \(\theta \) is varied. a \(m=2\) objectives, b \(m=4\) objectives, c \(m=6\) objectives, d \(m=8\) objectives

Fig. 14

HV obtained by PBI and IPBI in WFG4 problems with \(k=400\) when the parameter \(\theta \) is varied. a \(m=2\) objectives, b \(m=4\) objectives, c \(m=6\) objectives, d \(m=8\) objectives

To clarify the different behaviors of HV achieved by PBI, we focus on Fig. 12d in the problem with small \(k=100\) and Fig. 14d in the problem with large \(k=400\), and compare three PBIs using \(\theta =\{0,1,100\}\) marked as A, B and C in each figure.

In both Figs. 12d and 14d, the HV achieved by PBI decreases as \(\theta \) is increased from zero to one (from A to B). When \(\theta \) is increased from one to 100 (from B to C), HV is improved in Fig. 12d but not in Fig. 14d. In the case of Fig. 14d, HV is not improved even when \(\theta \) is increased to a very large value of 1,000. Consequently, in Fig. 14d, the optimal parameter maximizing HV becomes \(\theta ^*=0\), which searches only limited edge areas of the concave Pareto front. These results reveal that sufficient search performance cannot be achieved by PBI in problems with many objectives and the difficulty of obtaining a widely spread Pareto front in the objective space. In the following, we further analyze the solution behaviors of the three cases of PBI (A, B and C) in Figs. 12d and 14d.

We use a metric \(D^z\) to measure the accuracy of the obtained ideal point \(\varvec{z}\). \(D^z\) is the distance between the true ideal point \(\varvec{z}^*=(0,0,\ldots ,0)\) and the obtained ideal point \(\varvec{z}\) in the objective space. \(D^z\) is calculated by

$$\begin{aligned} D^{z}=\sqrt{\sum ^{m}_{j=1}\left( z_j/2j\right) ^2}, \end{aligned}$$
(16)

where \(z_j\) (\(j=1,2,\ldots ,m\)) is the j-th element of the obtained \(\varvec{z}\). As shown in Fig. 11, since the scales of the objective function values differ in WFG4 problems, \(z_j\) is divided by 2j to normalize each objective value. A small \(D^{z}\) indicates that the obtained \(\varvec{z}\) is close to the true \(\varvec{z}^*\) in the objective space, i.e., the accuracy of the obtained \(\varvec{z}\) is high. Figure 15a shows the transitions of \(D^z\) achieved by the three cases of PBI (A using \(\theta =0\), B using \(\theta =1\) and C using \(\theta =100\)) corresponding to Fig. 12d in the WFG4 problem with small \(k=100\). Similarly, Fig. 15b shows \(D^z\) achieved by the three PBIs corresponding to Fig. 14d in the problem with large \(k=400\).
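Equation (16) translates directly into code; the example values below are hypothetical and only illustrate the normalization by the WFG4 scales 2j:

```python
import numpy as np

def d_z(z):
    """Eq. (16): distance of the obtained ideal point z from the true ideal point 0,
    with the j-th objective normalized by its WFG4 scale 2j."""
    z = np.asarray(z, dtype=float)
    j = np.arange(1, len(z) + 1)
    return np.sqrt(np.sum((z / (2 * j)) ** 2))

print(d_z([0.0, 0.0]))         # perfect estimate -> 0.0
print(d_z([2.0, 4.0]))         # both normalized terms equal 1.0 -> sqrt(2)
```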

From the results in both Fig. 15a and b, we can see that PBI using \(\theta =0\) (A) gradually decreases \(D^z\) and improves the accuracy of the obtained ideal point \(\varvec{z}\). This is because the search area of PBI using \(\theta =0\) is limited to the edge areas of the concave Pareto front in WFG4 problems. That is, the HV achieved by PBI using \(\theta =0\) in Figs. 12, 13, and 14 is due to a highly accurate \(\varvec{z}\), and the approximation of the central area of the concave Pareto front cannot be expected. To approximate the entire concave Pareto front with PBI, the directivity of the PBI function should be strengthened by setting \(\theta >0\). However, in the case of PBI using \(\theta =1\) (B), we can see that the decrease of \(D^z\) stops in the early generations. That is, the population is distributed in a specific area of the objective space and the evolution of solutions stagnates. Consequently, as shown in Figs. 12, 13, and 14, the HV obtained by PBI using \(\theta =1\) is low. Next, we observe \(D^z\) of PBI using \(\theta =100\) (C). From the result of Fig. 15a for the problem with small \(k=100\), we can see that the decrease of \(D^z\) stagnates slightly during the period from around generation 20 to generation 600. After that, \(D^z\) gradually decreases. Thus, since the approximation accuracy of \(\varvec{z}\) is high and solutions approximating the central area of the Pareto front can be obtained by setting a large \(\theta \), PBI using \(\theta =100\) (C) achieves higher HV than PBI using \(\theta =\{0,1\}\) (A and B) in Fig. 12d. On the other hand, from the results of Fig. 15b for the problem with large \(k=400\), the \(D^z\) achieved by PBI using \(\theta =100\) (C) does not decrease, and its behavior is similar to that of PBI using \(\theta =1\) (B). Even if a larger \(\theta =1,000\) is used, this tendency does not change. That is, the conventional PBI faces difficulty in achieving a high HV in problems with \(m=8\) objectives and a large k, even when a large \(\theta \) is used.

Fig. 15

Transitions of \(D^z\) obtained by three PBIs in WFG4 problems with \(m=8\) objectives. a WFG4 with small \(k=100\), b WFG4 with large \(k=400\)

6.5 Search performance in WFG4s

Finally, we compare the search performances of the six algorithms in WFG4 problems. Figure 16 shows the results of HV at the final generation when the problem difficulty parameter k is varied. The difficulty to obtain a widely spread POS is increased with k. For PBI and IPBI, we plot the results using the optimal \(\theta ^*\) maximizing HV. The used parameters \(\theta ^*\) for PBI and IPBI are shown in Tables 2 and 3, respectively.

First, we discuss the results in problems with \(m=2\) objectives. We can see that WS shows the lowest HV among the six algorithms except in the case with \(k=400\). This is because solutions evolved by WS are distributed only in the extreme regions of the concave Pareto front and cannot cover its central region. Also, we can see a tendency for the values of HV obtained by the other five algorithms to deteriorate as the problem difficulty k increases. In problems with \(k\ge 50\), IPBI achieves the highest HV among the six algorithms. Thus, IPBI achieves higher search performance than the others in problems where approximating a widely spread Pareto front is difficult.

Fig. 16

Results of HV in WFG4 problems. a \(m=2\) objectives, b \(m=4\) objectives, c \(m=6\) objectives, d \(m=8\) objectives

Table 2 Used \(\theta ^*\) in each WFG4 problem for PBI
Table 3 Used \(\theta ^*\) in each WFG4 problem for IPBI

Next, from the results in \(m=4\) objective problems, we can see that IPBI using \(\theta ^*\) achieves the highest HV among the six algorithms in all problems except the one with \(k=10\). In \(m=6\) objective problems with \(k \le 100\), PBI using \(\theta ^*\) shows the highest HV. However, in problems with \(k > 100\), IPBI using \(\theta ^*\) achieves the highest HV. In the \(m=8\) objective problem with \(k=10\), we can see that NSGA-III shows the highest HV, since it is especially designed for solving MaOPs. In problems with \(k\ge 200\), IPBI using \(\theta ^*\) achieves the highest HV. Thus, IPBI shows high search performance in problems with the difficulty of obtaining a widely spread POS.

6.6 Robustness on Pareto front geometry

Finally, we verify the robustness of each scalarizing function against different shapes of the Pareto front. To visually observe the obtained solutions in the objective space, in this section we use \(m=2\) dimensional problems and focus on concave, linear, convex and discontinuous Pareto fronts. For the concave Pareto front, we use the conventional WFG4. For the linear and convex Pareto fronts, we use a transformed WFG4 problem (tWFG4). The objective function values in tWFG4 are calculated by

$$\begin{aligned} f^\text {tWFG4}_i(\varvec{x})= 2i \cdot \left( \frac{f^\text {WFG4}_i(\varvec{x})}{2i}\right) ^\alpha \quad (i=1,2,\ldots ,m), \end{aligned}$$
(17)

where \(f^\text {WFG4}_i\) is the i-th objective function of the conventional WFG4, and \(\alpha \) is a parameter controlling the shape of the Pareto front. In the case of \(\alpha =1\), the shape of the Pareto front is equivalent to that of the conventional WFG4, i.e., concave. In this section, for the linear and convex Pareto fronts, we use two tWFG4 problems with \(\alpha =2\) and \(\alpha =4\), respectively. For these WFG4-based problems, the difficulty parameter \(k=10\) is used in this section. Also, for the discontinuous Pareto front, we use WFG2.
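Equation (17) can be sketched directly; the transformation rescales each objective to \([0,1]\) by its WFG4 range \(2i\), applies the power \(\alpha \), and scales back:

```python
def twfg4(f_wfg4, alpha):
    """Eq. (17): transform WFG4 objective values to reshape the Pareto front.
    alpha = 1 keeps the concave front; alpha = 2 and 4 yield linear and convex fronts."""
    return [2 * (i + 1) * (f / (2 * (i + 1))) ** alpha for i, f in enumerate(f_wfg4)]

f = [1.0, 2.0]                 # a point with normalized coordinates (0.5, 0.5)
print(twfg4(f, 1))             # unchanged -> [1.0, 2.0]
print(twfg4(f, 2))             # squared in normalized space -> [0.5, 1.0]
```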

Figures 17, 18, 19, and 20 show the sets of non-dominated solutions and the populations obtained by each scalarizing function in the four problems with different shapes of Pareto front. Each figure shows the result of one run, and the true POS are also plotted. Since the distributions of solutions obtained by TCH and rTCH are similar, this section shows only the results obtained by TCH. For all problems, PBI uses \(\theta =5\) and IPBI uses \(\theta =10\). Note that these \(\theta \) are not the best \(\theta ^*\) for all problems, since the appropriate \(\theta ^*\) depends on the problem.

Fig. 17 Solutions obtained by TCH in problems with four different shapes of Pareto front. a Concave, b linear, c convex, d discontinuous

Fig. 18 Solutions obtained by WS in problems with four different shapes of Pareto front. a Concave, b linear, c convex, d discontinuous

Fig. 19 Solutions obtained by PBI (\(\theta = 5\)) in problems with four different shapes of Pareto front. a Concave, b linear, c convex, d discontinuous

Fig. 20 Solutions obtained by IPBI (\(\theta = 10\)) in problems with four different shapes of Pareto front. a Concave, b linear, c convex, d discontinuous

From the results in Fig. 17, we can see that TCH is able to approximate all four shapes of Pareto front. However, TCH cannot obtain solutions in region A of the discontinuous Pareto front in Fig. 17d. Across all 30 runs of MOEA/D using TCH, solutions were never obtained in region A.

Next, Fig. 18 shows the results of WS. It is known that WS cannot approximate an entire concave Pareto front (Zhang et al. 2007). As shown in Fig. 18, although WS can approximate the entire convex Pareto front, it has difficulty in fully approximating the concave, linear and discontinuous Pareto fronts.

Next, from the results in Fig. 19, we can see a tendency for the solutions obtained by PBI to concentrate in the central area of the Pareto front. We can also see that PBI maintains dominated solutions in the population, in regions B of Fig. 19d. However, PBI likewise cannot obtain solutions in region A of the discontinuous Pareto front.

Finally, from the results in Fig. 20, we can see that IPBI using \(\theta =10\) obtains solutions covering all four shapes of Pareto front. Since IPBI using \(\theta =0\) has an effect similar to WS, the solutions it obtains are very similar to those in Fig. 18, and it has difficulty approximating all but the convex Pareto front. However, as shown in Fig. 20, IPBI using \(\theta =10\) can approximate all four shapes of Pareto front by emphasizing the directivity of each search direction. In the case of IPBI, obtained solutions are also distributed outside the true Pareto front. Although this feature is advantageous for finding an undiscovered, widely spread POS, it has the disadvantage that the number of solutions approximating the Pareto front decreases and the approximation granularity becomes coarse. In the problem with the discontinuous Pareto front, TCH, WS and PBI cannot obtain solutions in region A. On the other hand, IPBI obtains solutions in region A as well, since the dominated solutions maintained in region C of Fig. 20d are kept in the population and help to find solutions in region A.
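The contrasting behaviors above follow from how each function scalarizes a point. The sketch below gives the four functions in their commonly stated forms: WS and TCH as in Zhang et al. (2007), PBI measured from the ideal point \(z^*\) and minimized as \(d_1 + \theta d_2\), and IPBI measured from the nadir point \(z^{nad}\) and maximized as \(d_1 - \theta d_2\). The function names and the example point are our own illustration, not code from the paper; rTCH is omitted since its behavior here mirrors TCH.

```python
import math

def ws(f, lam):
    # Weighted sum: minimize sum_i lam_i * f_i.
    return sum(l * fi for l, fi in zip(lam, f))

def tch(f, lam, z_star):
    # Tchebycheff: minimize max_i lam_i * |f_i - z*_i|.
    return max(l * abs(fi - zi) for l, fi, zi in zip(lam, f, z_star))

def pbi(f, lam, z_star, theta):
    # PBI: minimize d1 + theta * d2, measured from the ideal point z*.
    norm = math.sqrt(sum(l * l for l in lam))
    d = [fi - zi for fi, zi in zip(f, z_star)]
    d1 = abs(sum(di * l for di, l in zip(d, lam))) / norm  # distance along lam
    d2 = math.sqrt(sum((di - d1 * l / norm) ** 2            # distance off lam
                       for di, l in zip(d, lam)))
    return d1 + theta * d2

def ipbi(f, lam, z_nad, theta):
    # IPBI: maximize d1 - theta * d2, measured from the nadir point z^nad,
    # so the search pushes solutions away from z^nad along each direction.
    norm = math.sqrt(sum(l * l for l in lam))
    d = [zi - fi for fi, zi in zip(f, z_nad)]
    d1 = abs(sum(di * l for di, l in zip(d, lam))) / norm
    d2 = math.sqrt(sum((di - d1 * l / norm) ** 2
                       for di, l in zip(d, lam)))
    return d1 - theta * d2

# A point lying exactly on the search direction (d2 = 0):
print(pbi([1.0, 1.0], [1.0, 1.0], [0.0, 0.0], theta=5))    # ≈ 1.4142
print(ipbi([1.0, 1.0], [1.0, 1.0], [2.0, 2.0], theta=10))  # ≈ 1.4142
```

Because \(\theta\) multiplies only the off-direction distance \(d_2\), a point such as the ones in region A above, which lies off every search direction, is penalized by PBI but can survive under IPBI when its \(d_1\) from the nadir point is large, which is the mechanism discussed for region C in Fig. 20d.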

These results reveal that IPBI is able to obtain solutions approximating the four representative shapes of Pareto front. In this work, a fixed \(\theta \) is used for all IPBI scalarizing functions. However, to approximate complicated shapes of Pareto front, the appropriate \(\theta \) for each part of the Pareto front may differ. An independent control of \(\theta \) for each search direction is left as future work, not only for IPBI but also for PBI.

7 Conclusions

To approximate the entire Pareto front by improving the spread of solutions in the objective space, and to enhance the search performance of MOEA/D on both multi- and many-objective optimization problems, in this work we have proposed the IPBI scalarizing function, an extension of the PBI and WS functions. We analyzed the differences between IPBI and conventional scalarizing approaches, and compared the search performances of six algorithms on discrete MOKPs and continuous WFG4 problems with 2–8 objectives. The first five algorithms were MOEA/D using TCH, rTCH, WS, PBI and IPBI, respectively; the last was NSGA-III, which employs a decomposition of the objective space similar to that of MOEA/D. The experimental results showed that the IPBI approach with an appropriate \(\theta ^*\) achieved the highest HV among the six algorithms on all MOKPs used in this work. Also, on the WFG4 problems, we showed that IPBI using \(\theta ^*\) achieved the highest search performance on problems in which a widely spread POS is difficult to obtain. IPBI thus improves the applicability of the MOEA/D framework to various kinds of MOPs, and it can serve as an optional scalarizing function, especially for problems with many objectives and for problems in which widely spread solutions are difficult to obtain. In addition, we demonstrated the robustness of MOEA/D using IPBI to Pareto front geometry, using problems with four representative shapes of Pareto front: concave, linear, convex and discontinuous.

As future work, we plan to study alternative origin points for decomposing the objective space, in order to improve the approximation granularity of the Pareto front while maintaining the spread of solutions in the objective space. We will also study an automatic control of the parameter \(\theta \) in IPBI and a method to independently control \(\theta \) for each weight vector \(\varvec{\lambda }\).