1 Introduction

The problem of finding suitable compromises among conflicting objective functions such as performance and cost often arises in real-world engineering design processes. When such conflicting relationships among objective functions are present in a multiobjective design optimization problem, no single solution can simultaneously minimize all objective functions, so the solutions of the optimization problem are obtained as a set of Pareto-optimal solutions.

Among the most commonly used methods to obtain Pareto-optimal solutions are scalarization approaches, in which a multiobjective optimization problem is converted to a single objective optimization problem. The weighting method (Zadeh 1963; Geoffrion 1968) is the most popular of such approaches, but the obtained solutions are greatly affected by the values of the predetermined weighting coefficients, and the need to subsequently adjust these coefficients is a crucial problem. Goal Programming (Charnes and Cooper 1977), which uses target values of the objective functions, and Physical Programming (Messac 1996), in which design engineers state their preferences for the objective functions, can also be classified as scalarization approaches. Such scalarization approaches provide only a single solution from a Pareto-optimal solution set, and the obtained solution is highly dependent on the given parameter values.

Obtaining an entire set of Pareto-optimal solutions is another approach used to tackle the above difficulties in multiobjective optimization. This approach does not require predetermined preference parameter settings, and moreover, the obtained Pareto-optimal solution set can aid the design process because it offers designers a clear picture of the trade-off relationships among the conflicting objective functions. Various multiobjective optimization methods have been proposed to obtain Pareto-optimal solution sets. In particular, multiobjective optimization methods based on metaheuristic techniques such as genetic algorithms (Goldberg 1989; Tamaki et al. 1996; Deb et al. 2002), and particle swarm optimization (Coello et al. 2004; Reyes-Sierra and Coello 2006; Kotinis 2011; Zhao and Suganthan 2011; Mahmoodabadi et al. 2012), have been extensively studied and used (Nourbakhsh et al. 2011; Montazeri-Gh et al. 2012; Sharma et al. 2014) in many applications.

Metaheuristic-based multiobjective optimization techniques use an aggregative searching strategy in which many points in the design space are simultaneously evaluated to obtain points that are used in the next iteration. An aggregative strategy is advantageous for global searches and can provide a Pareto-optimal solution set in a single optimization calculation. However, constraints cannot be explicitly handled in such methods, and the algorithms or objective functions must be modified to implement constraint handling (Cai and Wang 2006; Qu and Suganthan 2011; Cagnina et al. 2011). Furthermore, metaheuristic-based methods are inefficient when searching for fine-tuned solutions once a nearly global optimum is found, since the algorithms do not use design sensitivities. Metaheuristic techniques are also not well suited to large-scale problems that have many design variables, such as topology and shape optimization problems, even in the single-objective case, although some extended schemes have been proposed (Wang and Tai 2005; Wang et al. 2006).

The use of gradient-based methods in multiobjective optimization problems has been discussed (Fliege and Svaiter 2000; Fliege et al. 2009; Qu et al. 2011) and the convergence properties of multiobjective optimization algorithms for unconstrained problems have been investigated. The normal boundary intersection method (Das and Dennis 1998), the normal constraint method (Messac et al. 2003), and the adaptive weighted-sum method (Kim and De Weck 2005) are methods where additional constraints are included in original optimization problems, to convert a multiobjective optimization problem to several single objective optimization problems and obtain Pareto optimal solutions. These methods can employ gradient-based optimization techniques that can be applied to constrained problems, and they can utilize design sensitivities in the optimization process.

The design sensitivity is the gradient of the objective functions, or constraints, with respect to the design variables. Using design sensitivities in the optimization process is advantageous in large-scale constrained problems, since accurate design sensitivities can often be easily obtained in many engineering design optimization problems (Choi and Kim 2004) via semi-analytical computations, by conducting sensitivity analysis prior to the optimization process. This is routinely done when dealing with topology and shape optimization problems. In particular, the adjoint method is an effective technique for obtaining the equations used to calculate design sensitivities. Once these equations are obtained through the sensitivity analysis, the computationally costly finite-difference method does not have to be used to carry out gradient calculations. However, gradient-based methods may converge to local optima in multi-modal problems, since the obtained solutions are highly dependent on the selection of starting points. Therefore, a number of widely distributed starting points are required to obtain global Pareto-optimal solutions.

Therefore, this paper proposes a new gradient-based multiobjective optimization method that employs an aggregative strategy. In this method, the objective functions and constraints are evaluated at multiple points and the design variables at each point are updated using information aggregatively obtained from all other points in the objective function space.

2 Fundamental concept of the aggregative gradient-based multiobjective optimization method

Multiobjective optimization involves the simultaneous optimization of two or more conflicting objectives; a typical nonlinear multiobjective optimization problem can be written as

$$ \text{minimize} \quad \textbf{f}(\textbf{x}) = [f_{1}(\textbf{x}), f_{2}(\textbf{x}), \dots, f_{m}(\textbf{x})]^{\mathrm{T}} $$
(1)

subject to:

$$ \textbf{g}(\textbf{x}) \leq 0 $$
(2)
$$ \textbf{x}_{L} \leq \textbf{x} \leq \textbf{x}_{U} $$
(3)
$$ \textbf{x} = [x_{1}, x_{2}, \dots, x_{n}]^{\mathrm{T}}, $$
(4)

where the objective function vector f is a function of the design variable vector x, and g is an inequality constraint vector. x_L and x_U respectively denote the lower and upper bounds of the design variables. Note that any equality constraint can be represented using a combination of inequality constraints. Therefore, equality constraints are simply omitted from the above formulation.

Figure 1 shows the conceptual scheme of the proposed method for a bi-objective optimization problem. In this scheme, objective functions and constraints at multiple points are simultaneously evaluated during an iteration, and every point is updated in an appropriate direction using a gradient-based search. For example, the black points denoted A, B, ..., G in Fig. 1 indicate the positions of evaluated points in the objective function space, and the arrows indicate their update directions. To efficiently obtain a Pareto-optimal solution set, each point is moved toward the Pareto frontier point that is closest to its current position in the objective function space. That is, since point A has the smallest value of f_2 among all evaluated points, the gradient-based search should move this point in the direction indicated by the arrow, to further minimize f_2. On the other hand, f_1 should be minimized for point D. Furthermore, for points B and C, both f_1 and f_2 should be minimized, to obtain Pareto solutions that represent compromises. Similarly, points E, F, and G should each move toward the corresponding closest Pareto frontier point, in the directions indicated by the white arrows in Fig. 1.

Fig. 1 Aggregative multiobjective optimization scheme of the proposed method

However, the true Pareto frontier is unknown prior to optimization calculations. Therefore, in this paper, the design variables are updated using a weighting method, and the search direction for each point is modified at every iteration by adaptively adjusting the weighting coefficients, based on each point’s position relative to the other points in the objective function space.

3 Adaptive weighting coefficients

In the proposed method, weighting coefficients are determined using a Data Envelopment Analysis (DEA) technique. DEA is a tool originally used in the field of economics to evaluate the relative performance of decision-making units (DMUs) in multi-input and multi-output environments (Charnes et al. 1978). In this implementation, the performance values of a DMU for multiple criteria are converted, by solving a linear programming problem, to a single value, termed the efficiency value, which is then used to evaluate relative performance among multiple DMUs. An important feature of the DEA technique is that it can provide optimal weighting coefficients when the efficiency is calculated. In the proposed method, DEA is conducted for each point to obtain appropriate weighting coefficients.

In the case that all objective functions are to be minimized, the efficiency of the M-th point, θ^M, is calculated by solving the following linear programming problem.

$$ \text{minimize} \quad \theta^{M} = \sum\limits_{i = 1}^{m} {{w_{i}^{M}} {f_{i}^{M}} } \quad \text{w.r.t.} \quad {w_{i}^{M}} $$
(5)

subject to:

$$ \sum\limits_{i = 1}^{m} {{w_{i}^{M}} } {f_{i}^{k}} \ge 1 \quad (\text{for} \quad k=1,2,...,K) $$
(6)
$$ {w_{i}^{M}} \ge 0 \quad (\text{for} \quad i=1,2,...,m), $$
(7)

where K is the total number of points, \({f_{i}^{k}}\) is the k-th point’s i-th objective function value, and \({w_{i}^{M}}\) represents the weighting coefficients. If the M-th point is a non-dominated point among the K points, θ^M equals 1, and for dominated points, θ^M is larger than 1.

The weighting coefficients calculated by the DEA act to minimize θ^M, which is converted to a single objective function by the weighting method. Therefore, if the M-th point has a smaller value of \({f_{1}^{M}}\) and a larger value of \({f_{2}^{M}}\) than the other points in a bi-objective problem, the above linear programming problem returns a larger \({w_{1}^{M}}\) value and a smaller \({w_{2}^{M}}\) value, which increases the importance of the first objective function. The use of such calculated weighting coefficients when the design variables are subsequently updated is the main idea of the proposed method. The detailed procedure of the proposed multiobjective optimization method is explained in the following section.
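As a concrete illustration, the linear program in (5)–(7) can be solved with any off-the-shelf LP solver. The sketch below uses SciPy's `linprog`; the function name `dea_weights` and the sample point data are illustrative assumptions, not part of the method description itself.

```python
import numpy as np
from scipy.optimize import linprog

def dea_weights(F, M):
    """Solve the DEA linear program (5)-(7) for the M-th point.

    F : (K, m) array of objective values f_i^k (all to be minimized).
    Returns (theta_M, w_M): the efficiency of point M and its
    adaptive weighting coefficients w_i^M.
    """
    K, m = F.shape
    c = F[M]              # objective (5): sum_i w_i^M f_i^M
    A_ub = -F             # constraints (6): sum_i w_i^M f_i^k >= 1,
    b_ub = -np.ones(K)    # rewritten as -F w <= -1 for linprog
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, None)] * m)
    return res.fun, res.x
```

For example, with the bi-objective points (1, 3), (2, 2), (3, 1), and a dominated point (3, 3), the non-dominated points obtain θ = 1 while the dominated point obtains θ = 1.5, as described above.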

4 Procedures

The aggregative gradient-based multiobjective optimization procedure is now described in detail. Figure 2 shows the flowchart of the procedure.

  • Step 1: Initialization Generate initial design variables with random values for K points.

  • Step 2: Evaluate objective functions Evaluate objective functions for all K points.

  • Step 3: Calculate weighting coefficients Solve the linear programming problem shown in (5) through (7) for all points and obtain adaptive weighting coefficients \({w_{i}^{M}}\). M is then set to 1.

  • Step 4: Calculate design sensitivities Evaluate sensitivities of the objective functions and constraint functions for the M-th point.

  • Step 5: Update design variables Update the design variables of the M-th point, minimizing the weighted sum of the objective functions using weighting coefficients \({w_{i}^{M}}\). A Sequential Linear Programming (SLP)-based updating scheme is used in Step 5 since it can stably handle a large number of design variables using well-established linear programming solvers. That is, the single objective optimization problem, converted from the multiobjective optimization problem through the use of adaptive weighting coefficients, is linearly approximated. In this step, the following approximated linear programming problem is solved to update the design variables of the point.

    $$ \mathrm{minimize} \quad f^{M} = \sum\limits_{i = 1}^{m} {{w_{i}^{M}} \sum\limits_{j = 1}^{n} {\frac{\partial f_{i} \left(\mathbf{x^{M}}\right)}{\partial x_{j}}} } x_{j} \;\; \mathrm{w.r.t.} \;\; x_{j} $$
    (8)

    subject to:

    $$\begin{array}{@{}rcl@{}} g_{s} \left(\mathbf{x^{M}}\right) + \sum\limits_{j = 1}^{n} {\frac{{\partial g_{s} \left(\mathbf{x^{M}}\right)}}{{\partial x_{j} }}} \left(x_{j} - {x_{j}^{M}}\right) \le 0 \\ \quad\quad (\text{for} \quad s=1,2,...,t) \end{array} $$
    (9)
    $$ \tilde{\mathbf{x}}_{L} \leq \mathbf{x} \leq \tilde{\mathbf{x}}_{U}, $$
    (10)

    where f^M is the weighted sum of the objective functions obtained in Step 3, and x^M is the design variable vector of the M-th point before updating. \(\tilde {\mathbf {x}}_{L}\) and \(\tilde {\mathbf {x}}_{U}\) are respectively the lower and upper bounds for this linear programming problem, which take the move limits of the design variables into account. If M=K, the procedure then advances to Step 6 (Check termination condition), otherwise M=M+1 and the procedure returns to Step 4.

  • Step 6: Check termination condition If the termination condition is satisfied, the procedure ends, otherwise it returns to Step 2.

    Fig. 2 Flowchart

Note that in the above procedure, two linear programming problems are solved for each point during a single iteration: 1) to determine weighting coefficient values, and 2) to update the design variables. The weighting coefficients for each point are therefore adaptively updated at every iteration. We note that although our method utilizes the concept of a weighting method, it does not require setting the weighting coefficients to predetermined values. The proposed method employs multiple starting points, some of which are generated near the Pareto frontier while others are generated near local optima. Since appropriate weighting coefficient values are given to the points close to the Pareto frontier, global optima can be reached. This method, therefore, has a high likelihood of avoiding local optima and obtaining true Pareto-optimal solutions.
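The Step 5 update for a single point can likewise be sketched with an LP solver. Everything below (the function name `slp_update`, the array shapes, and the absolute move limit `delta`) is an illustrative assumption; the examples in the paper use percentage-based move limits instead.

```python
import numpy as np
from scipy.optimize import linprog

def slp_update(x, w, grad_f, g_val, grad_g, x_lo, x_hi, delta=0.1):
    """One SLP update of a point's design variables, eqs. (8)-(10).

    x      : (n,) current design variables of the point
    w      : (m,) adaptive weighting coefficients from the DEA step
    grad_f : (m, n) objective function gradients at x
    g_val  : (t,) constraint values g_s(x)
    grad_g : (t, n) constraint gradients at x
    delta  : absolute move limit (a simplification of the paper's
             percentage-based move limit)
    """
    c = w @ grad_f                    # linearized weighted objective (8)
    b_ub = grad_g @ x - g_val         # (9): grad_g . x' <= grad_g . x - g
    lo = np.maximum(x_lo, x - delta)  # (10): move limits clipped to the
    hi = np.minimum(x_hi, x + delta)  #       side constraints
    res = linprog(c, A_ub=grad_g, b_ub=b_ub, bounds=list(zip(lo, hi)))
    return res.x
```

The aggregative loop then simply alternates the DEA step and this update for each of the K points until the termination condition is met.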

5 Numerical examples

The proposed method is now applied to five numerical examples to demonstrate its performance. In the examples in this section, unless otherwise mentioned, values 5 % larger or smaller than the current design variable values were used as SLP move limits. Computation time is used as the termination condition; that is, when a preassigned computation time is reached, the optimization process is halted. Equations to calculate design sensitivities for these examples can be analytically obtained before starting the optimization processes. Sensitivities at each point were calculated using such equations, in the same manner employed in many conventional structural optimization methods.

In some of the following examples, the results of the proposed method are compared with those of a multiobjective genetic algorithm (MOGA). A genetic algorithm code implemented based on the principles of the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) (Deb et al. 2002) was used for the MOGA calculations. Each design variable is encoded using a real-coding scheme, and simulated binary crossover (SBX) and polynomial mutation were employed. The crossover and mutation rates were set to 1.0 and 1/l, respectively, where l is the number of design variables. The distribution indexes for crossover and mutation were set to 10 and 20, respectively. The MOGA was run for the same computation time as that used for the proposed method. The various solutions are compared in figures, and the quantitative hypervolume (Zitzler and Thiele 1999) and diversity (Deb et al. 2002) metrics are also given.

Note that using computation time to compare optimization efficiency has an implicit limitation, since computation time depends significantly on how each method is implemented, as well as on the complexity of the calculations carried out. In prior research, the number of function calls has frequently been used as a basis for comparison. However, since the proposed method evaluates not only objective function values but also design sensitivities, the number of function calls cannot be used for the comparison. This is why we use computation time, and we note that the following examples only provide simple assessment results from the standpoint of this single aspect.

5.1 Example 1

The first example is a two-objective optimization problem (Jin et al. 2001), formulated as below.

$$ f_{1}=\frac{1}{n}\sum\limits_{i=1}^{n}{x_{i}^{2}} $$
(11)
$$ f_{2}=\frac{1}{n}\sum\limits_{i=1}^{n}(x_{i} -2)^{2} $$
(12)
$$ 0 \le x_{i} \le 1 \quad (\text{for} \quad i=1,2,...,n), $$
(13)

where n is the number of design variables, set here to 10. K denotes the total number of points and was set to 40 in this numerical example. The computation time was set to 60 seconds.
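For reference, the objective functions (11)–(12) and the analytical sensitivities used in Step 4 are straightforward for this example (the function names below are illustrative):

```python
import numpy as np

n = 10  # number of design variables

def f1(x):
    return np.mean(x**2)             # eq. (11)

def f2(x):
    return np.mean((x - 2.0)**2)     # eq. (12)

def grad_f1(x):
    return 2.0 * x / n               # d f1 / d x_j

def grad_f2(x):
    return 2.0 * (x - 2.0) / n       # d f2 / d x_j
```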

In Fig. 3, the eight-point stars indicate the initial point distribution in the objective function space. The design variables of the initial points were generated using a uniform random distribution within the side-constraint range [0, 1]. The black line segments attached to the initial point indicators represent the weighting vectors at the first iteration, calculated by solving the linear programming problem in (5) through (7).

Fig. 3 Distribution of initial points and their weighting vectors in Example 1

In Fig. 4, the “+” symbols indicate the solution points obtained after 66 iterations, and are connected by black lines to the corresponding initial points. This figure illustrates that the proposed method obtains well-distributed Pareto-optimal solutions, although our method does not guarantee this. Note that, in order to conduct optimization for 66 iterations, 66 × 40 = 2,640 function calls, and the same number of design sensitivity calculations, are required.

Fig. 4 Obtained solutions in Example 1

Figure 5 shows a comparison of the non-dominated solutions obtained for this problem when using two different methods, the proposed method and the MOGA, with the same computation-time termination criterion used in both cases. In the MOGA calculation, the population size was set to 40 and computation was conducted for 465 generations. This figure shows all non-dominated solutions obtained during the optimization process, whereas Fig. 4 only shows the position of all points at the very last iteration. Table 1 provides the hypervolume and diversity metric values for both methods. Figure 5 and Table 1 show that the solution distribution obtained by the proposed method is almost equivalent to that of the MOGA. However, the proposed method's solutions are slightly better than those of the MOGA, since the proposed method can conduct local searches near the Pareto frontier.

Fig. 5 Comparison with MOGA

Table 1 Comparison of two optimization methods in Example 1

Figure 6 shows the position of points at the final iteration of the optimization process when different values of K but the same termination criterion are used. This figure illustrates that the proposed method works well for different numbers of starting points, but a larger number of points is preferable for obtaining closely distributed solutions.

Fig. 6 Effect of the number of starting points

5.2 Example 2

Next, the proposed method was applied to a test function (Preuss et al. 2006). The problem is stated as follows.

$$ f_{1}= {x_{1}^{4}} + {x_{2}^{4}} - {x_{1}^{2}} + {x_{2}^{2}} - 10 x_{1} x_{2} + 0.25x_{1} + 20 $$
(14)
$$ f_{2}= (x_{1} - 1)^{2} + {x_{2}^{2}} $$
(15)
$$ -2 \le x_{1},x_{2} \le 2 $$
(16)

In this calculation, the computation time was set to 15 seconds, and the optimization process was conducted until the calculation reached the 26th iteration. Therefore, 40 × 26 = 1,040 function calls and the same number of design sensitivity calculations were conducted.
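The analytical sensitivities of (14)–(15) used in the gradient-based search follow directly from differentiation (the function names below are illustrative):

```python
import numpy as np

def f1(x1, x2):  # eq. (14)
    return x1**4 + x2**4 - x1**2 + x2**2 - 10.0*x1*x2 + 0.25*x1 + 20.0

def f2(x1, x2):  # eq. (15)
    return (x1 - 1.0)**2 + x2**2

def grad_f1(x1, x2):
    return np.array([4.0*x1**3 - 2.0*x1 - 10.0*x2 + 0.25,
                     4.0*x2**3 + 2.0*x2 - 10.0*x1])

def grad_f2(x1, x2):
    return np.array([2.0*(x1 - 1.0), 2.0*x2])
```

A central finite-difference check against these closed-form gradients is a cheap way to validate such hand-derived sensitivities before running the optimization.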

Figure 7 shows the results for this problem, where K was set to 40. The eight-point stars again indicate the initial points and the “+” symbols represent the obtained solutions. The circled “+” marks indicate dominated solutions that represent local optima for this problem. This result illustrates that the proposed method provided true Pareto-optimal solutions, although several points became stuck at local optima. The proposed method largely succeeded in avoiding local optima in this example because it utilizes multiple starting points, and appropriate weighting coefficient values are given to points close to the Pareto frontier, so that global optima can be reached.

Fig. 7 Obtained solutions in Example 2

Figure 8 and Table 2 show a comparison of the non-dominated solutions obtained using the proposed method and the MOGA, in which a population size of 40 was used and computation was conducted for 182 generations. The MOGA is well suited to this kind of problem, since the problem has only two variables and its Pareto frontier includes a large discontinuity in the objective function space. However, the proposed method also provided good solutions.

Fig. 8 Comparison with MOGA in Example 2

Table 2 Comparison of two optimization methods in Example 2

Figure 9 shows the effect of selecting the number of starting points, and this figure also illustrates that good solutions are successfully obtained regardless of the number of starting points.

Fig. 9 Effect of the number of starting points in Example 2

5.3 Example 3

Next, the proposed method is applied to a three-objective optimization problem (Laumanns et al. 2002). The problem formulation is as follows.

$$ f_{1} = 3 - (1+x_{3})\cos(x_{1} \uppi/2)\cos(x_{2} \uppi/2) $$
(17)
$$ f_{2} = 3 - (1+x_{3})\cos(x_{1} \uppi/2)\sin(x_{2} \uppi/2) $$
(18)
$$ f_{3} = 3 - (1+x_{3})\sin(x_{1} \uppi/2) $$
(19)
$$ 0 \le x_{1},x_{2},x_{3} \le 1 $$
(20)

In this problem, the number of starting points K was set to 60, the move limit for the SLP search was set to 0.1, and the computation time was set to 15 seconds.

Figure 10 shows all the non-dominated solutions obtained during the optimization process for 28 iterations. This figure illustrates that the proposed method can effectively provide well-distributed solutions.

Fig. 10 Non-dominated solutions in Example 3

5.4 Example 4

For the fourth example, the 105-bar truss design optimization problem shown in Fig. 11 was solved. In this problem, K was set to 20, and the computation time was set to 1,500 seconds. A 1.0 × 10^5 N force in the −Y direction was applied at node A, at the lower right-hand edge of the structure. The design variables were the cross-sectional areas of the truss elements, with side constraints of [ 1.0 × 10^−6, 1.0 × 10^−3 ]. The objective functions in this problem were the displacement at node A, denoted d_A, and the total volume of the structure, V, and both were to be minimized under compressive and tensile stress constraints. The Young’s modulus of each truss element and the allowable stress, σ_a, were set to 210 GPa and 270 MPa, respectively. This problem was formulated as follows.

$$ f_{1}=V $$
(21)
$$ f_{2}=d_{A} $$
(22)

subject to:

$$ 1.0 \times 10^{-6} \le x_{j} \le 1.0 \times 10^{-3} $$
(23)
$$\begin{array}{@{}rcl@{}} -\sigma_{a} \le \sigma_{j} &\le& \sigma_{a} \\ \qquad (j&=&1,2, \dots, 105) \,, \end{array} $$
(24)

where σ j is the stress at the j-th truss member.

Fig. 11 Configuration of 105-bar truss problem in Example 4

Figure 12 shows a comparison of the non-dominated solutions obtained for this problem using two different methods, i.e., the proposed method and the MOGA. In this MOGA calculation, the population size was set to 100. This figure illustrates that the proposed method obtained better solutions than the NSGA-II-based MOGA, and that its non-dominated solutions were widely distributed. Figure 13 shows the low-f_2 range of the non-dominated solutions for this problem, to more clearly illustrate the distribution of the solutions obtained by the MOGA.

Fig. 12 Non-dominated solutions obtained in Example 4

Fig. 13 Non-dominated solutions in Example 4 (magnified)

As shown in these figures, the MOGA did not provide a wide range of solutions, and its objective function values were slightly poorer than those of the proposed method, especially in the range where the total volume f_1 was low, although many useful solutions were obtained in the range where f_1 was large. On the other hand, the proposed method obtained better solutions in the lower total-volume range because stress constraints are explicitly handled, which enhances the search efficiency in this range.

Table 3 presents comparative data for the proposed and MOGA methods. The number of iterations used in the proposed method was significantly smaller than that in the MOGA, for which the term “generation” is used rather than iteration. The proposed method uses fewer iterations than the MOGA because two linear programming problems must be solved for each point during every iteration, and sensitivity calculations are more computationally costly than plain function calls.

Table 3 Comparison of two optimization methods in Example 4

5.5 Example 5

In this last example, the proposed method is used to solve the topology optimization problem illustrated in Fig. 14, which is called the Messerschmitt-Bölkow-Blohm (MBB) beam problem. The objective functions are the mean compliance and the total volume, and both are minimized. The density method is used in this problem and the design domain is discretized into 60 × 40 elements. This problem is therefore a large-scale problem with 2,400 design variables, and the range of each design variable was set to [0, 1]. We note that a metaheuristics-based multiobjective optimization method could not be applied to such a large-scale problem.

Fig. 14 Topology optimization problem

The multiobjective topology optimization is formulated as follows.

$$ f_{1} = \mathbf{u}^{\mathrm{T}} \mathbf{K}\mathbf{u} $$
(25)
$$ f_{2} = V $$
(26)

subject to:

$$ \mathbf{0} \le \mathbf{x} \le \mathbf{1} $$
(27)
$$ \mathbf{K}\mathbf{u}=\mathbf{f}, $$
(28)

where u and f are respectively the displacement and force vector, K is the stiffness matrix, and V denotes the total volume.

The mean compliance problem is self-adjoint, and the design sensitivity can be calculated using the following equation.

$$ \frac{\partial f_{1}}{\partial x_{i}}= -\mathbf{u}^{\mathrm{T}} \frac{\partial \bf K}{\partial x_{i}}\mathbf{u} $$
(29)

where x_i is the design variable for the i-th element. The sensitivity of the total volume with respect to every design variable is 1, since all elements are square and of equal size in this problem.
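Under a SIMP-style density interpolation K_i = x_i^p k_0 (an assumption made here for illustration; eq. (29) itself is general), the element-wise compliance sensitivity can be sketched as follows. The function name, array shapes, and penalization exponent p are all illustrative.

```python
import numpy as np

def compliance_sensitivity(x, U, edofs, k0, p=3.0):
    """Element-wise sensitivity -u^T (dK/dx_i) u of eq. (29),
    assuming a SIMP interpolation K_i = x_i^p * k0, so that
    dK_i/dx_i = p * x_i^(p-1) * k0.

    x     : (n_el,) element densities
    U     : (n_dof,) global displacement vector
    edofs : list of per-element DOF index arrays
    k0    : element stiffness matrix of a solid element
    """
    dc = np.empty(len(x))
    for i, dofs in enumerate(edofs):
        ue = U[np.asarray(dofs)]
        dc[i] = -p * x[i]**(p - 1) * ue @ k0 @ ue
    return dc
```

Because the problem is self-adjoint, no extra adjoint system needs to be solved; the displacement vector from the equilibrium equation (28) is sufficient.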

In this calculation, the computation time was set to 2,000 seconds, and the optimization process executed 169 iterations; therefore, 60 × 169 = 10,140 function calls and the same number of design sensitivity calculations were conducted. Figure 15 shows the non-dominated solutions obtained by the proposed method when K was set to 60. The configurations corresponding to four selected points, A–D in this figure, are shown in Fig. 16. The illustrated configurations demonstrate that the proposed method can effectively obtain non-dominated solutions for topology optimization problems that include a large number of design variables, since design sensitivities are used when updating the design variables. We note that conventional multiobjective optimization methods employing metaheuristic techniques, classified as direct methods since design sensitivities are not used in the optimization process, are almost always unsuitable for such large-scale problems.

Fig. 15 Non-dominated solutions in Example 5

Fig. 16 Examples of obtained solutions in Example 5

6 Conclusions

This paper proposed a new gradient-based multiobjective optimization method to obtain Pareto-optimal solutions. The proposed method uses a weighting method to convert a multiobjective optimization problem to a single objective optimization problem, and a linear programming problem is solved to determine adaptive weighting coefficients for each point while considering the point’s position relative to all other points in the objective function space. The converted single objective optimization problem is linearly approximated, and the design variables are updated by a linear programming technique. The proposed method was applied to five examples, three test functions and two structural optimization problems, and the results obtained in all examples demonstrated its effectiveness. The proposed method is based on the weighting method, which makes it unsuitable for non-convex problems. In future work, we hope to extend the proposed method for application to non-convex problems.