1 Introduction

As an important area of optimization, multiobjective optimization deals with problems in which several objective functions must be optimized simultaneously. It has attracted attention in many applied sciences that involve trade-off analysis, such as medicine, economics, engineering, transportation, and logistics. In these fields, multiobjective optimization is a natural strategy for reaching a desired decision. For example, the problem of buying a car can be considered as a multiobjective optimization problem (MOP) in which minimizing cost and maximizing comfort are two conflicting objectives. Practical optimization problems typically involve more than one objective. Due to the inherent conflict among the objective functions, there is usually no solution that simultaneously optimizes all objectives. Instead, these problems have a large number of solutions called efficient (Pareto optimal) solutions. An efficient or Pareto optimal solution is a solution at which an improvement in any objective function leads to a decline in at least one other objective function. The value of the objective functions at an efficient solution is called a nondominated point. The sets of all nondominated points and of all efficient solutions are called the nondominated and efficient sets, respectively. As shown in Proposition 2.4 in Ehrgott (2005), each nondominated point belongs to the boundary of the feasible set in the objective space. Thus, the set of all nondominated points is usually called the nondominated frontier. In contrast to single objective optimization, there are many solution concepts in multiobjective optimization. Researchers can find a representation of the efficient solutions (nondominated points) or employ a trade-off analysis to find the most preferred solution from the decision maker's point of view (Sayadi-Bander et al. 2015; Sayadi-bander et al. 2017a, b; Basirzadeh et al. 2014; Eskelinen and Miettinen 2012; Sakawa and Yano 1990; Kuk et al. 1997).

One of the interesting fields in multiobjective optimization is the approximation of the entire nondominated frontier. Computing an approximation of the whole nondominated frontier gives decision makers deep insight into the problem structure and into the trade-offs involved. A trade-off denotes the amount that must be given up in one objective function to obtain an improvement in another. Having a good approximation of the entire nondominated frontier also makes it possible to apply secondary preferences over the frontier so as to choose the most preferred solution. Motivated by this important application, there is nowadays an increasing interest in methods for approximating the whole nondominated frontier, as demonstrated by the many papers published on this topic (Benson and Sayin 1997; Das and Dennis 1998; Fliege 2004, 2006; Basseur 2006; Fliege and Heseler 2002; Segura et al. 2013; Da Silva 2010; Schandl et al. 2001).

One of the most important classes of algorithms for solving an MOP is the class of scalarization methods. Some parameters usually need to be determined to scalarize an MOP, but obtaining a suitable scalarization function and its parameters requires great insight into the problem structure, which is a very difficult and in some cases almost impossible task. The weaknesses of scalarization methods are illustrated explicitly and in detail in Sect. 7 of Fliege et al. (2009). Hence, developing methods that do not need this prior information is of special interest. In this regard, a new class of multiobjective optimization algorithms that does not depend on any parameters has recently been developed (Fliege and Svaiter 2000; Drummond and Iusem 2004; Drummond and Svaiter 2005; Fliege et al. 2009; Qu et al. 2011; Povalej 2014; Morovati et al. 2016; Qu et al. 2013; Villacorta et al. 2014). Indeed, these methods are generalizations of single objective optimization algorithms.

Fliege et al. (2009) proposed a parameter-free Newton-type optimization method for obtaining a point that satisfies the first-order necessary condition for multiobjective optimization. As in the classical Newton method, in each iteration they consider quadratic approximations of the original objective functions and solve a quadratic subproblem to find a search direction.

Newton’s method is a well-known method for solving single objective optimization problems because it uses the Hessian, which provides useful curvature information. However, for many practical problems, computing the Hessian matrices is very expensive or the Hessian is not available analytically.

These difficulties have led to methods that use only the function values and the gradients of the objective functions in ways closely related to Newton’s method. Among these are the quasi-Newton methods, which do not compute the Hessian but instead generate a sequence of Hessian approximations while maintaining a fast rate of convergence (Sun and Yuan 2006). Recently, the BFGS quasi-Newton multiobjective optimization (QNMO) method has been proposed by Qu et al. (2011) and Povalej (2014).

As mentioned, in the quasi-Newton methods an approximation of the Hessian matrix is used instead of the Hessian matrix itself, and there are many methods for computing this approximation. In the QNMO method proposed by Povalej (2014), the BFGS method is used to approximate the Hessian matrix. It is well known that the BFGS method belongs to the Broyden class (Nocedal and Wright 2006). In the quasi-Newton methods proposed in this paper for the MOP, besides the BFGS method, which is the most well-known and important method of the Broyden class, we use the self-scaling BFGS (SS-BFGS) method from the self-scaling Broyden class and the Huang BFGS (H-BFGS) method from the Huang class. We tested the methods on several well-known test problems from two points of view. The first concerns CPU times and computational effort; the second, which deals with the quality of the nondominated frontiers, involves criteria such as the number of nondominated points, the purity metric, the spread metric, and the epsilon indicator. For a fair comparison, an approximated nondominated frontier for each test problem, called the reference frontier, was determined using the numerical results of all proposed methods. Then, the approximated nondominated frontier obtained by each algorithm was compared with the reference frontier. All numerical results are presented in appropriate figures. Each figure contains curves, called performance profiles, corresponding to the methods; these profiles can be used to compare the computational performance of the methods (Dolan and Moré 2002). The number of iterations and function evaluations for each method is also reported.

Unlike the methods of the Broyden and self-scaling Broyden classes, the methods of the Huang class not only use gradient information but also employ function values to approximate the Hessian matrix. For this reason, the methods of the Huang class approximate the second-order curvature of the problem functions with higher accuracy than the methods of the Broyden and self-scaling Broyden classes (see Theorem 2). The numerical results show that the H-BFGS and SS-BFGS methods perform better than the BFGS method from several points of view.

As given in Fliege et al. (2009) and Povalej (2014), in the QNMO methods a scalar quadratic subproblem is required to find the search direction in each iteration. In this paper, first, a quadratic multiobjective subproblem is introduced. Then, comparing the solution sets of this subproblem and the scalar subproblem, some theoretical results are derived. Particularly, using these theoretical results and the known results from the theory of multiobjective optimization, a simple representation of the Karush–Kuhn–Tucker (KKT) conditions is presented.

In the BFGS QNMO method developed by Qu et al. (2011) and Povalej (2014), an Armijo-like rule is used for the line search, as in the single objective case. In this paper, we use the Armijo-like rule to find a step length in a similar manner. When the Armijo-like rule is employed in multiobjective problems, more than one inequality must be satisfied simultaneously. Thus, this rule imposes a large computational effort and tends to yield small step lengths. Consequently, using the Armijo-like rule to find a step length decreases the speed of convergence. Based on this fact, in this paper all methods are considered in two cases: with the Armijo line search and without any line search. Similar to Fliege et al. (2009), Povalej proved the convergence of the QNMO methods in the presence of the Armijo line search (Povalej 2014). In this paper, we extend this proof to the case where no line search is used.

This paper is organized as follows. Section 2 contains some notations, definitions, and preliminaries that are used in the remainder of the paper. Section 3 is devoted to the quasi-Newton optimization methods. First, a brief description of the quasi-Newton directions for single objective optimization problems is presented. Then, the quasi-Newton subproblem for the search direction in the MOP is investigated and some propositions are provided. Section 4 proceeds with the convergence analysis. Section 5 deals with the performance assessment. In this section, we review some well-known standard criteria for comparing the quality of the obtained nondominated frontiers. In Sect. 6, the three mentioned quasi-Newton optimization methods for the MOP are implemented for several well-known test problems. Moreover, in this section the obtained numerical results are explicitly evaluated by some standard criteria. Finally, the paper concludes with some discussions on comparing the QNMO methods.

2 Preliminaries

In this section, some notations that are used in the remainder of the paper are introduced. Throughout the paper, \( \mathbb {R} \), \( {\mathbb { R} _ \ge } \), and \( {\mathbb { R} _ >} \) stand for the sets of real numbers, non-negative real numbers, and strictly positive real numbers, respectively. \(V \subset \mathbb {R}^n \) denotes an open set and \( F:V \rightarrow \mathbb {R}^m \) is an arbitrary vector function. For this vector function, \( \mathbb {R}^n \) and \( \mathbb {R}^m \) are called the decision and objective spaces, respectively. We assume that \( F \in C^2(V,\mathbb R^m)\); i.e. \(F=(F_1,\dots ,F_m)\) is twice continuously differentiable on V. For \(x\in V\), we use the symbols \(JF(x)\in \mathbb R^{m\times n}\), \(\nabla F_j(x)\in \mathbb R^n\), \(HF_j(x)\in \mathbb R^{n\times n}\), and \( B^j(x)\in \mathbb R^{n\times n} \) to denote the Jacobian matrix of F, the gradient of \(F_j\), the Hessian matrix of \(F_j\), and the approximation of the Hessian matrix of \(F_j\) at x, respectively. We denote the range (image space) of a matrix A by S(A). Also, the notation \(B\preccurlyeq C\) (\(B\prec C\)) means that \(C-B\) is a positive semidefinite (definite) matrix.

In this paper, the following unconstrained MOP is considered:

$$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _{x \in V} \,\,F(x). \end{array} \end{aligned}$$
(1)

V and \( Y=F(V) \) are called the feasible set in the decision and objective spaces of Problem (1), respectively. In the following, related to Problem (1), some preliminary definitions, a lemma, and a theorem are presented.

Definition 1

(Ehrgott 2005) A feasible solution \( x^*\in V \) is called an (a weakly) efficient solution of Problem (1), if there is no other \( {x} \in V \) such that \( F(x) \le F({x^*}) \) and \( F(x) \ne F({x^*}) \) (\( F(x) < F({x^*}) \)). If \( x^*\) is an (a weakly) efficient solution, \( F(x^*) \) is called a (weakly) nondominated point.

The sets of all nondominated points and of all efficient solutions are called the nondominated and efficient sets, respectively. As shown in Proposition 2.4 in Ehrgott (2005), each nondominated point belongs to the boundary of the feasible set in the objective space (Y). Thus, the set of all nondominated points is usually called the nondominated frontier. We denote the nondominated frontier or nondominated set by \( Y_N \).

Numerical algorithms usually find a local solution. Definition 2 deals with the locally (weakly) efficient solutions.

Definition 2

(Fliege et al. 2009) A feasible solution \( x^*\in V \) is called a locally (weakly) efficient solution, if there is a neighbourhood \( W \subseteq V \) of \( x^*\) such that \( x^*\) is a (weakly) efficient solution on W. If \( x^*\) is a locally (weakly) efficient solution, \( F(x^*) \) is called a locally (weakly) nondominated point.

Definition 3

(Ehrgott 2005) Let \( V\subset \mathbb {R}^n \) be a convex set. \( F:V \rightarrow \mathbb {R}^m \) is called \({\mathbb { R}}^m\)-convex if F is componentwise-convex.

In convex problems, global optimality can be derived from local optimality, as stated in Lemma 1.

Lemma 1

(Fliege et al. 2009) Let \(V \subset \mathbb {R}^n \) be a convex set and \( F:V \rightarrow {\mathbb { R}}^m \) be an \({\mathbb { R}^m }\)-convex function. Then, any locally efficient solution is an efficient solution.

Under some assumptions, including \( \mathbb R_ \ge ^p \)-compactness, the objective functions satisfy properties that are useful from both theoretical and computational standpoints. \( \mathbb R_ \ge ^p \)-compactness is defined as follows.

Definition 4

(Ehrgott 2005) A set \( Y \subset {\mathbb R^p} \) is called \( \mathbb R_ \ge ^p \)-compact if for all \( y \in Y \) the section \( (y - \mathbb R_ \ge ^p) \cap Y \) is compact.

Definition 5

(Ehrgott 2005) The nondominated set \( Y_N \) is said to be externally stable if for each \( y \in Y\backslash {Y_N} \) there is \( \hat{y} \in {Y_N} \) such that \( y \in \hat{y} + \mathbb {R}_ \ge ^p \).

A necessary condition for a given point \(x\in \mathbb R^n\) to be a locally weakly efficient solution for function \( F:V \rightarrow {\mathbb { R}}^m \) is \(S(JF(x))\cap (-\mathbb R^m_>)=\emptyset \) (Gopfert and Nehse 1990; Luc 1988). Motivated by this fact, a critical point is defined as follows.

Definition 6

(Fliege and Svaiter 2000) Point \(x\in \mathbb R^n\) is said to be a locally critical point of vector function \( F:V \rightarrow {\mathbb { R}}^m \) if \(S(JF(x))\cap (-\mathbb R^m_>)=\emptyset \).

From Definition 6, it is clear that if x is not critical, then there exists \( d \in \mathbb R^n \) such that \(\nabla F_j(x)^T d<0\) for all \(j \in \{1,\dots ,m \}\).
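This criticality test can be carried out numerically: x is non-critical exactly when a direction d exists with \(JF(x)d\) componentwise negative, which can be detected with a small linear program. The sketch below uses SciPy; the function name, the example Jacobians, and the unit box on d (which only bounds the direction, not its sign pattern) are our own illustrative choices, not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def is_critical(JF, tol=1e-9):
    """Check Definition 6 numerically: x is critical iff no direction d
    satisfies JF(x) d < 0 componentwise.  JF is the m-by-n Jacobian at x."""
    m, n = JF.shape
    # Variables z = (d, t).  Minimize t subject to JF d - t <= 0, d in [-1, 1]^n
    # (the box rules out unbounded descent directions; only the sign of t matters).
    c = np.zeros(n + 1)
    c[-1] = 1.0
    A_ub = np.hstack([JF, -np.ones((m, 1))])
    b_ub = np.zeros(m)
    bounds = [(-1.0, 1.0)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.fun >= -tol  # optimal t < 0  <=>  a common descent direction exists

# Illustrative Jacobians: opposing gradients (critical) vs aligned gradients.
JF_crit = np.array([[1.0, 0.0], [-1.0, 0.0]])
JF_noncrit = np.array([[1.0, 0.0], [1.0, 0.0]])
```

For `JF_crit` the two gradients cancel under equal weights, so no common descent direction exists; for `JF_noncrit`, any d with negative first component decreases both objectives.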

In Theorem 1, the relation between a critical point and a (weakly) efficient solution is explained (Fliege et al. 2009).

Theorem 1

(Fliege et al. 2009) Let \( F \in {C^1}(V,{\mathbb { R}^m}) \),

1. If \( {\bar{x}} \) is a locally weakly efficient solution, then \( {\bar{x}} \) is a critical point for F.

2. If V is convex, F is \( {\mathbb { R}^m} \)-convex and \( \bar{x} \in V \) is critical for F, then \( {\bar{x}} \) is a weakly efficient solution.

3. If V is convex, \( \bar{x} \in V \) is a critical point of F, \( F \in {C^2}(V,{\mathbb { R}^m}) \) and \( {HF_j}(x) \succ 0 \) for all \( x \in V \) and \( j \in \{ 1,2,\ldots ,m\}\), then \( \bar{x} \) is an efficient solution.

We refer interested readers to Fliege et al. (2009) for the proof of Theorem 1.

In Sect. 3, first, a brief exposition of the quasi-Newton direction for single objective optimization is presented and then it is extended to multiobjective optimization (Qu et al. 2011; Povalej 2014). Finally, we present quasi-Newton algorithms for multiobjective optimization, both with line search (Qu et al. 2011; Povalej 2014) and without any line search.

3 Quasi-Newton optimization methods

Newton’s method is well known for solving single objective nonlinear optimization problems because it uses the Hessian matrix and thus accounts for curvature. However, computing the Hessian matrix analytically is a difficult or impossible task in many problems. To overcome this difficulty, quasi-Newton methods were introduced, which use only the gradients and values of the objective function. These methods maintain approximations of the Hessian matrices while providing a fast rate of convergence. A quasi-Newton method generates a sequence of approximate Hessian matrices \( \{B_k\} \) and a sequence of related directions \( \{d_k\} \), where \( d_k=-B^{-1}_k \nabla f(x_k) \). The hope is that \( \{B_k\} \) remains positive definite and the algorithm behaves like Newton’s method.

In this section, first, a brief description of the quasi-Newton direction for single objective optimization problems is presented and some of the well-known update formulas are introduced. Then, in the subsequent subsection, these update formulas and the quasi-Newton direction are extended to the multiobjective case.

3.1 Quasi-Newton methods for single objective optimization

Let \( f:{V \subset \mathbb {R}^n} \rightarrow \mathbb {R} \) be a twice continuously differentiable function on an open set \( V \subset {\mathbb {R}^n} \). Suppose we want to solve the problem \( \mathop {\min }\nolimits _{x \in V} \,\,f(x) \) by a quasi-Newton method. The quasi-Newton search direction at a point \( x_k \) (\( d_k:=d(x_k) \)) can be obtained by solving the following quadratic subproblem:

$$\begin{aligned} \mathop {\min }\limits _{d_k \in {\mathbb {R}^n}} \,\,\nabla {f}(x_k)^Td_k + \frac{1}{2}{d^T_k}B_k d_k, \end{aligned}$$
(2)

where \( B_k:=B(x_k) \) is an approximation of the Hessian matrix of f at \( x_k \). An easy computation shows that, for \( B_k \succ 0 \), the unique solution of Subproblem (2) is \( d_k= -B_k^{ - 1}\nabla f({x_k}) \). Thus, the new iterate is obtained as \( x_{k+1}=x_k+ t_k d_k \) with a step length \( t_k \in \mathbb {R}_\ge \).
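As a quick illustration (the helper name is our own, not from the paper), the minimizer of the quadratic model in (2) is computed by solving a linear system rather than forming \( B_k^{-1} \) explicitly:

```python
import numpy as np

def qn_direction(grad, B):
    """Quasi-Newton direction: unique minimizer of the quadratic model
    grad^T d + 0.5 d^T B d for positive definite B, i.e. d = -B^{-1} grad.
    np.linalg.solve avoids forming the inverse explicitly."""
    return -np.linalg.solve(B, grad)

# With B = I the direction reduces to steepest descent:
g = np.array([2.0, -4.0])
d = qn_direction(g, np.eye(2))
```

At the returned d, the model gradient \( \nabla f(x_k) + B_k d \) vanishes, which is the first-order condition for (2).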

Two important classes for the approximation of the Hessian matrices are the Broyden class and the self-scaling Broyden class. These two classes just use gradient information. All approximate matrices obtained from these classes are symmetric and satisfy the following equation, namely, the secant equation or the quasi-Newton equation:

$$\begin{aligned} {B_{k + 1}}{s_k} = {y_k}, \end{aligned}$$
(3)

where \( {s_k} := {x_{k + 1}} - {x_k}=t_k d_k \), \( {y_k} := \nabla f({x_{k + 1}}) - \nabla f({x_k}) \), \( B_{k+1}:=B(x_{k+1}) \), and \( \{x_k\} \) is supposed to be the sequence generated by a quasi-Newton method. Indeed, the secant equation is derived from the gradient information of the objective function at points \( x_k \) and \( x_{k+1} \). In what follows, the updating formulas related to the BFGS method of the Broyden class and the self-scaling BFGS (SS-BFGS) method of the self-scaling Broyden class are explained with their required conditions for preserving the positive definiteness property (Nocedal and Wright 2006; Sun and Yuan 2006). It should be noted that the initial matrix \( B_0 \) for all updating formulas presented in this section is assumed to be the identity matrix.

The updating formula for BFGS is (Nocedal and Wright 2006)

$$\begin{aligned} {B_{k + 1}} = {B_k} - \frac{{{B_k}{s_k}s_k^T{B_k}}}{{s_k^T{B_k}{s_k}}} + \frac{{{y_k}y_k^T}}{{s_k^T{y_k}}}. \end{aligned}$$
(4)
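A direct transcription of (4) can be sketched as follows (`bfgs_update` is a hypothetical helper name); by construction the resulting matrix satisfies the secant equation (3), which the accompanying check confirms.

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS update (4): B+ = B - (B s s^T B)/(s^T B s) + (y y^T)/(s^T y)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)

# One update step from B0 = I with a pair satisfying the curvature condition:
s = np.array([1.0, 0.5])
y = np.array([2.0, 1.5])          # s^T y = 2.75 > 0
B1 = bfgs_update(np.eye(2), s, y)
```

Starting from a positive definite \( B_0 \) and updating only when \( s_k^Ty_k>0 \), every \( B_{k+1} \) stays symmetric positive definite.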

The updating formula for SS-BFGS is (Sun and Yuan 2006)

$$\begin{aligned} {B_{k + 1}} = \frac{{s_k^T{y_k}}}{{s_k^T{B_k}{s_k}}}\left( {{B_k} - \frac{{{B_k}{s_k}s_k^T{B_k}}}{{s_k^T{B_k}{s_k}}}} \right) + \frac{{{y_k}y_k^T}}{{s_k^T{y_k}}}. \end{aligned}$$
(5)

As mentioned before, the matrices \( B_{k+1} \) obtained from the above updating formulas satisfy the secant equation. The following curvature condition is sufficient for \( B_{k+1} \) to preserve positive definiteness:

$$\begin{aligned} s_k^T{y_k} > 0. \end{aligned}$$
(6)
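The self-scaling update (5) can be sketched in the same way (again a hypothetical helper name): the first two BFGS terms are scaled by \( s_k^Ty_k/(s_k^TB_ks_k) \), and the curvature condition (6) is checked before updating. The secant equation still holds, as the check below verifies.

```python
import numpy as np

def ss_bfgs_update(B, s, y):
    """SS-BFGS update (5): the (B - B s s^T B / s^T B s) part of the BFGS
    formula is scaled by s^T y / (s^T B s).  Requires s^T y > 0 (condition (6))
    to preserve positive definiteness."""
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    if sy <= 0:
        raise ValueError("curvature condition s^T y > 0 violated")
    return (sy / sBs) * (B - np.outer(Bs, Bs) / sBs) + np.outer(y, y) / sy

s = np.array([1.0, 0.5])
y = np.array([2.0, 1.5])
B1 = ss_bfgs_update(np.eye(2), s, y)
```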

Unlike the Broyden and the self-scaling Broyden classes that employ only the gradients and ignore the function value information, the Huang class uses both gradient information and function evaluations for approximating the Hessian matrix. All approximate matrices obtained from the Huang class should satisfy the following equation, namely, the Huang quasi-Newton equation instead of the secant equation (3) (Zhang and Xu 2001; Sun and Yuan 2006):

$$\begin{aligned} {B_{k + 1}}{s_k} = {{\hat{y}}_k},\,\,\,\,\,\,{{\hat{y}}_k} = {y_k} + \frac{{{\theta _k}}}{{s_k^Ty_k}}y_k, \end{aligned}$$
(7)

where \( {\theta _k} := 6({f_k} - {f_{k + 1}}) + 3{(\nabla {f_k} + \nabla {f_{k + 1}})^T}{s_k} \). In this case, the Huang BFGS (H-BFGS) updating formula is as follows:

$$\begin{aligned} {B_{k + 1}} = {B_k} - \frac{{{B_k}{s_k}s_k^T{B_k}}}{{s_k^T{B_k}{s_k}}} + \frac{{{{\hat{y}}_k}{{\hat{y}}_k}^T}}{{s_k^T{{\hat{y}}_k}}}. \end{aligned}$$
(8)
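The Huang update (7)-(8) additionally needs the function values and gradients at both iterates. A sketch (our own function name and example data): for a purely quadratic f, \( \theta_k \) vanishes and \( \hat{y}_k = y_k \), so the H-BFGS update coincides with BFGS, which the check below exploits.

```python
import numpy as np

def h_bfgs_update(B, s, y, f_k, f_k1, g_k, g_k1):
    """H-BFGS update (8): the BFGS formula with y replaced by y_hat from the
    Huang quasi-Newton equation (7), theta = 6(f_k - f_{k+1}) + 3(g_k + g_{k+1})^T s."""
    theta = 6.0 * (f_k - f_k1) + 3.0 * (g_k + g_k1) @ s
    y_hat = y + (theta / (s @ y)) * y
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y_hat, y_hat) / (s @ y_hat)

# Quadratic test function f(x) = 0.5 x^T H x, for which theta = 0:
H = np.array([[2.0, 0.0], [0.0, 3.0]])
x0, x1 = np.array([1.0, 1.0]), np.array([1.5, 0.5])
f = lambda x: 0.5 * x @ H @ x
g = lambda x: H @ x
s, y = x1 - x0, g(x1) - g(x0)
B1 = h_bfgs_update(np.eye(2), s, y, f(x0), f(x1), g(x0), g(x1))
```

In general \( B_{k+1}s_k=\hat{y}_k \) holds by construction; here, since f is quadratic, this reduces to the ordinary secant equation \( B_{k+1}s_k=y_k \).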

Theorem 2

(Zhang et al. 1999) Assume that f(.) is a sufficiently smooth function. If \( \left\| {{s_k}} \right\| \) is small enough then we have:

$$\begin{aligned} \begin{gathered} s_k^T\left[ {Hf({x_{k + 1}}){s_k} - {{\hat{y}}_k}} \right] = O({\left\| {{s_k}} \right\| ^4}), \\ s_k^T\left[ {Hf({x_{k + 1}}){s_k} - {{ y}_k}} \right] = O({\left\| {{s_k}} \right\| ^3}). \\ \end{gathered} \end{aligned}$$
(9)

Assume that \( B_{k+1} \) and \( {{\hat{B}}_{k + 1}} \) satisfy Eqs. (3) and (7), respectively. In this case, by the above theorem, the following relations hold:

$$\begin{aligned} s_k^T{{\hat{B}}_{k + 1}}{s_k}= & {} s_k^THf({x_{k + 1}}){s_k} + O({\left\| {{s_k}} \right\| ^4}), \nonumber \\ s_k^T{B_{k + 1}}{s_k}= & {} s_k^THf({x_{k + 1}}){s_k} + O({\left\| {{s_k}} \right\| ^3}). \end{aligned}$$
(10)

The two relations in (10) show that, under the assumptions of Theorem 2, \( s_k^T{{\hat{B}}_{k + 1}}{s_k} \) approximates the second-order curvature \( s_k^THf({x_{k + 1}}){s_k} \) with higher precision than \( s_k^T{B_{k + 1}}{s_k} \).

The following result can be obtained from Relations (10) and Theorem 2.

Corollary 1

(Zhang and Xu 2001) If f is a cubic function then \( s_k^T{{\hat{B}}_{k + 1}}{s_k} = s_k^THf({x_{k + 1}}){s_k} \).

In the Huang class, the following condition, instead of the curvature condition, should be satisfied to preserve positive definiteness:

$$\begin{aligned} s_k^T{{\hat{y}}_k} = s_k^T{y_k} + {\theta _k} > 0. \end{aligned}$$
(11)

Two methods for ensuring that condition (11) holds are suggested in Zhang and Xu (2001): the first uses the Wolfe line search, while the second imposes the following condition on \( {\theta _k} \):

$$\begin{aligned} {\theta _k} := (\varepsilon - 1)s_k^T{y_k}\quad \text {if}\quad {\theta _k} < (\varepsilon - 1)s_k^T{y_k},\quad \varepsilon \in (0,1). \end{aligned}$$
(12)

Clearly, condition (12) implies \( s_k^T{{\hat{y}}_k} \geqslant \varepsilon s_k^T{y_k} \). Thus, under condition (12) and the curvature condition, the updating formula of the Huang class preserves positive definiteness. Since the first method (the Wolfe line search) imposes a large computational effort, in this paper we use the second method to preserve the positive definiteness of the matrix obtained from the updating formula of the Huang class.
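Safeguard (12) amounts to a one-line clamp (sketch; the function name and example data are our own): whenever \( \theta_k \) falls below \( (\varepsilon-1)s_k^Ty_k \) it is reset to that floor, which, by \( s_k^T\hat{y}_k = s_k^Ty_k + \theta_k \), guarantees \( s_k^T\hat{y}_k \ge \varepsilon\, s_k^Ty_k > 0 \) under the curvature condition.

```python
import numpy as np

def safeguard_theta(theta, s, y, eps=0.1):
    """Condition (12): clamp theta from below at (eps - 1) s^T y so that
    s^T y_hat = s^T y + theta >= eps * s^T y > 0 (curvature condition assumed)."""
    return max(theta, (eps - 1.0) * (s @ y))

s = np.array([1.0, 0.0])
y = np.array([2.0, 1.0])   # s^T y = 2, so the floor is (0.1 - 1) * 2 = -1.8
```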

Based on the above statements, if f(.) is a strongly convex function, then the matrix \( B_{k+1} \) obtained from each of the mentioned updating formulas for approximating the Hessian matrix always preserves positive definiteness. Thus, in the rest of this paper, we assume that all components of the vector function F in Problem (1) are strongly convex.

3.2 Quasi-Newton methods for multiobjective optimization

Recently, the BFGS quasi-Newton method for single objective optimization problems has been extended to multiobjective optimization by Qu et al. (2011) and Povalej (2014). In a similar manner, all of the quasi-Newton methods can be generalized to multiobjective problems. In this paper, besides the BFGS QNMO method, we further consider the SS-BFGS and H-BFGS QNMO methods. In this subsection, we briefly explain an extension of the quasi-Newton methods to multiobjective problems.

As seen in the previous subsection, in order to find the search direction at x, we have to solve Subproblem (2). The direct extension of this idea to a vector valued function \( F:V \rightarrow \mathbb {R}^m \) leads to the following multiobjective subproblem for finding a search direction at x:

$$\begin{aligned} \begin{array}{*{20}{l}} {\mathop {\min }\limits _{{d} \in {\mathbb {R}^n}} {} {} {} \nabla {F_1}{{({x})}^T}{d} + \frac{1}{2} d^TB^1(x){d}}\\ {\mathop {\min }\limits _{{d} \in {\mathbb {R}^n}} {} {} {} \nabla {F_2}{{({x})}^T}d + \frac{1}{2} d^TB^2(x){d}}\\ {{} {} {} {} \vdots }\\ {\mathop {\min }\limits _{{d} \in {\mathbb {R}^n}} {} {} {} \nabla {F_m}{{({x})}^T}d + \frac{1}{2} d^TB^m(x){d}}, \end{array} \end{aligned}$$
(13)

where \( B^j(x) \) is an approximation of the Hessian matrix of jth objective function \( F_j \) at x.

Subproblem (13) is a well-behaved quadratic multiobjective optimization subproblem: each of its objective functions is a bounded-below, strictly convex quadratic function, and the properties given in Theorem 3 hold for it.

Theorem 3

Let Y and \( Y_N \) denote the feasible set in the objective space and the set of all nondominated points of Subproblem (13), respectively. Suppose that \( {B^j}(x) \succ 0 \) for all \( j \in \{ 1,2,\ldots ,m\} \) and \( x \in V \). Then Subproblem (13) satisfies the following properties:

1. Y is an \( \mathbb {R}_ \ge ^m \)-compact set.

2. The efficient set of (13) is nonempty.

3. \( Y_N \) is externally stable.

4. \( Y_N \) is connected.

Proof

1. Since \( B^j(x) \) is a positive definite matrix, the jth objective function of (13) is a coercive, strictly convex quadratic function; hence it is bounded below and its level sets are compact. It follows that Y is \( \mathbb {R}_ \ge ^m \)-compact.

2. Since Y is \( \mathbb {R}_ \ge ^m \)-compact, by Corollary 2.15 in Ehrgott (2005), \( {Y_N} \ne \emptyset \).

3. External stability of \( Y_N \) follows from Part (1).

4. This part is a direct consequence of Theorem 3.35 in Ehrgott (2005).

\(\square \)

In what follows, the relation between Subproblem (13) and the criticality of a given point \( x \in V \) is investigated.

Theorem 4

Let \( {B^j}(x) \succ 0 \) for all \( x \in V \) and \( j \in \{1,\ldots , m\} \). Then for \( F:V \rightarrow \mathbb {R}^m \) the following propositions are equivalent:

1. x is a critical point for F;

2. \(\hat{d}=0\) is a weakly efficient solution of Subproblem (13);

3. There is a vector \( \hat{\lambda }\in \mathbb {R}_ \ge ^m \) with \(\sum _{j=1}^m{\hat{\lambda }}_j=1\) such that

$$\begin{aligned} \forall d \in {\mathbb {R}^n} \setminus \{ 0\} ,\quad \sum \limits _{j = 1}^m {{{\hat{\lambda }}_j}} \left( {\nabla {F_j}{{(x)}^T}d + \frac{1}{2}{d^T}{B^j}(x)d} \right) > 0; \end{aligned}$$

4. There is a vector \( \hat{\lambda }\in \mathbb {R}_ \ge ^m \) with \(\sum _{j=1}^m{\hat{\lambda }}_j=1\) such that

$$\begin{aligned} \sum \limits _{j = 1}^m {{{\hat{\lambda }}_j}\nabla {F_j}(x)} = 0. \end{aligned}$$

Proof

(\(1\Longrightarrow 2\)) By contradiction, assume that \(\hat{d}=0\) is not a weakly efficient solution of Subproblem (13). Hence there exists \(d\in \mathbb {R}^n\) such that

$$\begin{aligned} \nabla F_j(x)^Td\le \nabla F_j(x)^Td+ \frac{1}{2}{d^T}B^j(x)d<0,~~\forall j\in \{1,\dots ,m\}. \end{aligned}$$

Thus, x is not critical, which contradicts (1).

(\(2\Longrightarrow 3\)) Since \(\hat{d}=0\) is a weakly efficient solution and (13) is a strictly convex multiobjective subproblem, there is a vector \({\hat{\lambda }} \in \mathbb {R}^m_\ge \) with \(\sum _{j=1}^m{\hat{\lambda }}_j=1\) such that \(\hat{d}=0\) is the unique optimal solution of the single objective problem \( \mathop {\min }\nolimits _{d \in {\mathbb {R}^n}} {f_{\hat{\lambda }}}(d) \), where

$$\begin{aligned} f_{\hat{\lambda }} (d):=\sum \limits _{j = 1}^m {{{\hat{\lambda }}_j}} \left( {\nabla {F_j}{{(x)}^T}d + \frac{1}{2}{d^T}{B^j}(x)d} \right) . \end{aligned}$$

Since \( f_{\hat{\lambda }}(0)=0 \) and \( \hat{d}=0 \) is the unique minimizer, \( f_{\hat{\lambda }}(d)>0 \) for all \( d\ne 0 \), which is precisely Part (3).

(\(3\Longrightarrow 4\)) Part (3) states that \(\hat{d}=0\) is the unique minimizer of the strictly convex function \(f_{\hat{\lambda }}\). Since \(f_{\hat{\lambda }}\) is convex and differentiable, this holds if and only if

$$\begin{aligned} \nabla _d {f_{\hat{\lambda }}}(0)=\sum \limits _{j = 1}^m {{\hat{\lambda }}_j} {\nabla {F_j}(x)} =0. \end{aligned}$$

(\(4\Longrightarrow 1\)) By contradiction, assume that Part (4) holds but x is not critical. Then there exists \(d\in \mathbb {R}^n\) such that \(\nabla F_j(x)^Td <0\) for all \(j \in \{1,\dots ,m\}\). Therefore, for any \( \lambda \in \mathbb {R}^m_\ge \) with \( \sum \nolimits _{j = 1}^m {{\lambda _j}} = 1 \), we have \(\sum \nolimits _{j = 1}^m {{\lambda _j}\nabla {F_j}(x)^Td} < 0\); in particular for \( \lambda = \hat{\lambda } \) this contradicts Part (4).

\(\square \)

As mentioned in Povalej (2014), the quasi-Newton search direction at \( x \in V \) is defined as the unique minimizer of the following minimax subproblem; this minimizer is a weakly efficient solution of the multiobjective optimization Subproblem (13).

$$\begin{aligned} \mathop {\min }\limits _{{d} \in {\mathbb {R}^n}} \,\,\mathop {\max }\limits _{j \in \{ 1,..,m\} } \,\,\nabla F_j(x)^T{d} + \frac{1}{2} d^TB^j(x)d. \end{aligned}$$
(14)

This subproblem is evidently equivalent to the following subproblem:

$$\begin{aligned} \begin{array}{l} \min \,\,h(t,d) = t\\ s.\,t\,\,\left\{ \begin{array}{l} \nabla {F_j}{(x)^T}d + \frac{1}{2}{d^T}B^j(x)d \le t,\,\,\,\,(1 \le j \le m),\\ (t,d) \in \mathbb {R} \times {\mathbb {R}^n}, \end{array} \right. \end{array} \end{aligned}$$
(15)

Subproblem (15) is a convex subproblem with quadratic constraints. From now on, for each \( x \in V \), we denote by d(x) and \( \alpha (x) \) the optimal solution and the optimal value of Subproblem (14), respectively; that is,

$$\begin{aligned} \alpha (x): = \mathop {\inf }\limits _{d \in {\mathbb {R}^n}} \mathop {\max }\limits _{j = 1,\,\ldots ,m} (\nabla {F_j}{(x)^T}d + \frac{1}{2}{d^T}B^j(x)d), \end{aligned}$$
(16)

and

$$\begin{aligned} d(x): = \arg \mathop {\min }\limits _{d \in {\mathbb {R}^n}} \mathop {\max }\limits _{j = 1,\ldots ,m} (\nabla {F_j}{(x)^T}d + \frac{1}{2}{d^T}B^j(x)d). \end{aligned}$$
(17)
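Computing d(x) and \( \alpha(x) \) in practice amounts to solving the convex program (15). A sketch using SciPy's SLSQP solver on the epigraph variables \( (d,t) \) (the solver choice, helper name, and example data are our own assumptions, not from the paper):

```python
import numpy as np
from scipy.optimize import minimize

def qn_mo_direction(grads, Bs):
    """Solve the minimax subproblem (14) via its epigraph form (15):
    min t  s.t.  grad_j^T d + 0.5 d^T B_j d <= t  for all j.
    Returns (d(x), alpha(x)).  grads: list of gradients; Bs: list of
    positive definite Hessian approximations."""
    n = grads[0].size
    cons = [{
        "type": "ineq",  # t - (g_j^T d + 0.5 d^T B_j d) >= 0
        "fun": (lambda z, g=g, B=B:
                z[-1] - (g @ z[:-1] + 0.5 * z[:-1] @ B @ z[:-1])),
    } for g, B in zip(grads, Bs)]
    res = minimize(lambda z: z[-1], np.zeros(n + 1),
                   constraints=cons, method="SLSQP")
    return res.x[:-1], res.x[-1]

# With m = 1 and B = I this reduces to steepest descent: d = -g, alpha = -|g|^2/2.
d_star, alpha_star = qn_mo_direction([np.array([2.0, 0.0])], [np.eye(2)])
```

At a critical point (e.g. two opposing gradients), the same routine returns \( d(x)\approx 0 \) and \( \alpha(x)\approx 0 \), consistent with Theorem 5.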

For completeness, Theorem 5 from Fliege et al. (2009) and Povalej (2014) is given, which states some properties of \( \alpha (x) \) and d(x).

Theorem 5

Consider \( \alpha (x) \) and d(x) as defined by (16) and (17), respectively. Let \( {B^j}(x) \succ 0 \) for all \( x \in V \). Then

1. For each \( x \in V \), \( \alpha (x) \le 0 \).

2. The following conditions are equivalent:

   (a) Point x is not critical,

   (b) \( \alpha (x) < 0 \),

   (c) \( d(x) \ne 0 \).

3. The function \( d\,:\,V \rightarrow {\mathbb { R}^n} \) is bounded on compact sets and \( \alpha :V \rightarrow \mathbb { R} \) is continuous.

Interested readers are referred to Fliege et al. (2009) and Povalej (2014) for the proof of Theorem 5.

Here, we first give a brief review of the Karush–Kuhn–Tucker (KKT) conditions (Fliege et al. 2009; Povalej 2014). At the end of this section, a simple representation of these conditions, based on the multiobjective Subproblem (13), is proposed. Using the Lagrangian function of Subproblem (15), an explicit representation of d(x) in terms of \( \nabla {F_j}(x)\) and \({B^j}(x)\) is derived. To this end, consider the Lagrangian function of Subproblem (15):

$$\begin{aligned} L((t,d),\lambda ) = t + \sum \limits _{j = 1}^m {{\lambda _j}} \left( {\nabla {F_j}{{(x)}^T}d + \frac{1}{2}{d^T}{B^j}(x)d - t} \right) . \end{aligned}$$

Subproblem (15) has the unique optimal solution \( (t^*,d^*): = (\alpha (x), d(x)) \) and a Slater point \( (t,d)=(1,0) \). Hence, there exists a KKT multiplier \( \lambda : =\lambda (x) \) that, for \( d: =d(x) \) and \( t: = \alpha (x) \), satisfies the following conditions:

$$\begin{aligned} \sum \limits _{j = 1}^m {{\lambda _j}} = 1, \sum \limits _{j = 1}^m {{\lambda _j}\left( {\nabla {F_j}(x) + {B^j}(x)d} \right) } = 0, \end{aligned}$$
(18)
$$\begin{aligned} {\lambda _j}\left( {\nabla {F_j}{{(x)}^T}d + \frac{1}{2}{d^T}{B^j}(x)d - t} \right) = 0,\,\,\,\,\,\,\,\,\,\,\,\,(1 \le j \le m), \end{aligned}$$
(19)
$$\begin{aligned} {\lambda _j} \ge 0,\,\,\,\,\,\,\,\,\,\,\,\,\,\nabla {F_j}{(x)^T}d + \frac{1}{2}{d^T}{B^j}(x)d \le t,{} {} {} {} {} {} {} {} {} (1 \le j \le m). \end{aligned}$$
(20)

In particular, from (18) we have

$$\begin{aligned} d(x) = - {} {\left( {\sum \limits _{j = 1}^m {{\lambda _j}(x){B^j}(x)} } \right) ^{ - 1}}\left( {\sum \limits _{j = 1}^m {{\lambda _j}(x)\nabla {F_j}(x)} } \right) . \end{aligned}$$
(21)

The existence of the KKT multipliers for the convex Subproblem (15) implies that there is no duality gap, and so

$$\begin{aligned} \alpha (x)&= \sup _{\lambda \ge 0,\; \sum _{j = 1}^m {\lambda _j} = 1} \; \mathop {\inf }\limits _{d \in {\mathbb {R}^n}} L((t,d),\lambda ) \nonumber \\&= \sup _{\lambda \ge 0,\; \sum _{j = 1}^m {\lambda _j} = 1} \; \mathop {\inf }\limits _{d \in {\mathbb {R}^n}} \sum \limits _{j = 1}^m {{\lambda _j}} \left( {\nabla {F_j}{{(x)}^T}d + \frac{1}{2}{d^T}{B^j}(x)d} \right) , \end{aligned}$$
(22)
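For intuition, formulas (21) and (22) can be evaluated numerically. The snippet below is a minimal illustrative sketch (not the paper's MATLAB/TOMLAB implementation), assuming a single decision variable (n = 1) and two objectives, so the matrices \( B^j(x) \) and gradients \( \nabla F_j(x) \) reduce to scalars; given a multiplier vector it returns the direction from (21) and the weighted-model infimum, and it approximates \( \alpha (x) \) by maximizing the dual (22) over a grid of \( \lambda \in [0,1] \):

```python
def direction(lams, grads, hessians):
    # Formula (21) with n = 1: d = -(sum_j lam_j * B^j)^(-1) * (sum_j lam_j * grad F_j)
    B = sum(l * b for l, b in zip(lams, hessians))
    g = sum(l * gr for l, gr in zip(lams, grads))
    return -g / B

def dual_value(lams, grads, hessians):
    # inf_d sum_j lam_j (grad F_j * d + 0.5 * B^j * d^2), attained at d from (21),
    # equals -g^2 / (2B) with g and B the weighted gradient and Hessian
    B = sum(l * b for l, b in zip(lams, hessians))
    g = sum(l * gr for l, gr in zip(lams, grads))
    return -g * g / (2.0 * B)

def alpha(grads, hessians, steps=10000):
    # Dual (22) for m = 2: maximize over lam = (lam, 1 - lam) by grid search
    return max(dual_value((i / steps, 1 - i / steps), grads, hessians)
               for i in range(steps + 1))
```

For example, for \( F_1(x) = (x-1)^2 \), \( F_2(x) = (x+1)^2 \) at x = 0, the gradients are -2 and 2 and both Hessians equal 2; the dual maximum is attained at \( \lambda = 1/2 \), giving \( \alpha (0) = 0 \), so x = 0 is critical, in agreement with Theorem 5.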

Theorem 6 shows that the KKT conditions (18) can be obtained easily using the multiobjective Subproblem (13).

Theorem 6

Let \(\hat{d}\) be the quasi-Newton direction of function F at x. Then there exists \( {\hat{\lambda }}\in \mathbb {R}^m_\ge \) such that

$$\begin{aligned} \sum \limits _{j = 1}^m {{{\hat{\lambda }}_j}} = 1,\,\,\, \sum \limits _{j = 1}^m {{{\hat{\lambda }}_j}\left( {\nabla {F_j}(x) + {B^j}(x){\hat{d}}} \right) } = 0. \end{aligned}$$
(23)

Proof

Assume that \(\hat{d}\) is the quasi-Newton direction of F at x. Then, there is a scalar \(\hat{t}\) such that \((\hat{t},\hat{d})\) is an optimal solution of Subproblem (15). Thus, \(\hat{d}\) is a weakly efficient solution of the convex multiobjective Subproblem (13). Therefore, there exists \({\hat{\lambda }} \in \mathbb {R}^m_\ge \) with \(\sum \nolimits _{j= 1}^m {{{\hat{\lambda }} _j}} = 1\) such that \(\hat{d}\) is the minimizer of the function \( f_{\hat{\lambda }} (d):=\sum \nolimits _{j = 1}^m {{{\hat{\lambda }}_j}} \left( {\nabla {F_j}{{(x)}^T}d + \frac{1}{2}{d^T}{B^j}(x)d} \right) \). Hence

$$\begin{aligned} \nabla _d {f_{\hat{\lambda }}}(\hat{d}) = \sum \limits _{j = 1}^m {{{\hat{\lambda }}_j}\left( {\nabla {F_j}(x) + {B^j}(x) \hat{d}} \right) } = 0, \end{aligned}$$

and this is precisely the assertion of the theorem. \(\square \)

Assume that \(d(x)\in \mathbb R^n\) is a direction with \(JF(x)d(x)<0\). To compute a step length \( t>0\), an Armijo-like rule is used. Let \( \sigma \in (0,1)\) be a prescribed constant. The condition to accept t is

$$\begin{aligned} F_j(x+t d(x))\le F_j(x)+\sigma t \alpha (x) ,~~\text {for all }j \in \{1,\dots ,m\}. \end{aligned}$$

Start with \(t=1\) and, while this condition is not satisfied, set \(t:=\frac{t}{2}\). Using this line search, we find the appropriate step length in Algorithm 1.
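The backtracking rule above can be sketched as follows. This is an illustrative Python version written by us (the paper's implementation is in MATLAB); `fs` is a list of the objective functions \( F_j \), `alpha` is \( \alpha (x) \) from (16), and the iteration cap is a safeguard we add:

```python
def armijo_step(fs, x, d, alpha, sigma=0.1, max_halvings=50):
    """Armijo-like backtracking: accept t when every objective satisfies
    F_j(x + t*d) <= F_j(x) + sigma * t * alpha."""
    t = 1.0
    fx = [f(x) for f in fs]  # objective values at the current point
    for _ in range(max_halvings):
        if all(f(x + t * d) <= fxj + sigma * t * alpha
               for f, fxj in zip(fs, fx)):
            return t
        t /= 2.0  # condition violated for some objective: halve the step
    return t
```

For a single objective \( F(x) = x^2 \) at x = 1 with d = -2 and \( \alpha = -2 \) (the values produced by \( B = I \)), the rule rejects t = 1 (since 1 > 0.8) and accepts t = 0.5.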

Povalej (2014) and Qu et al. (2011) used the Armijo line search in the implementation of the BFGS quasi-Newton method for multiobjective optimization. The convergence of this method was proven in Povalej (2014) and Qu et al. (2011). In this paper, besides the BFGS quasi-Newton method, some other well-known quasi-Newton methods are implemented for MOP in two cases: with line search (BFGSLS, SS-BFGSLS and H-BFGSLS) and without any line search (BFGS, SS-BFGS and H-BFGS). Related to these two cases, two algorithms are given below.

Algorithm 1. QNMO with Armijo line search

  1. (Initialization)

     (a) Choose \( \varepsilon > 0 \), \( {x_0} \in V \), \( \sigma \in (0,1) \) and set \( B^j_0 =I\) for all \( j \in \{ 1, 2,\ldots ,m\} \).

     (b) Solve Subproblem (15) for \( {x_0} \) to obtain \( {d_0} = d({x_0}) \) and \({\alpha _0} = \alpha ({x_0}) \). If \( \left| {{\alpha _0}} \right| \le \varepsilon \), then stop. Otherwise, set \( {x_1} = {x_0} + {d_0} \), \( k=1 \) and go to the Main loop.

  2. (Main loop)

     (a) Solve Subproblem (15) for \( {x_k} \) to obtain \( {d_k} = d({x_k}) \) and \( {\alpha _k} = \alpha ({x_k}) \).

     (b) If \(\left| {{\alpha _k}} \right| \le \varepsilon \), then stop. Otherwise, find the largest \( {t_k} = \frac{1}{{{2^n}}} \), \( n = 1,\,2,\,3,\ldots \), that satisfies the following conditions

        $$\begin{aligned} \begin{array}{l} {x_k} + t_k{d_k} \in V\\ {F_j}({x_k} + t_k{d_k}) \le {F_j}({x_k}) + \sigma t_k{\alpha _k} \end{array} \end{aligned}$$
        (24)

        and set \( x_{k+1}:=x_k+t_k d_k \).

     (c) Update \( B^j_k \) (for \( j \in \{ 1, 2,\ldots ,m\} \)) using one of the update formulas (4), (5) or (8). Set \( k = k + 1 \) and go to Step 2(a).

Algorithm 2. QNMO without any line search

  1. (Initialization)

     (a) Choose \( \varepsilon > 0 \), \( {x_0} \in V \), \( \sigma \in (0,1) \) and set \( B^j_0 =I\) for all \( j \in \{ 1, 2,\ldots ,m\} \).

     (b) Solve Subproblem (15) for \( {x_0} \) to obtain \( {d_0} = d({x_0}) \) and \({\alpha _0} = \alpha ({x_0}) \). If \( \left| {{\alpha _0}} \right| \le \varepsilon \), then stop. Otherwise, set \( {x_1} = {x_0} + {d_0} \), \( k=1 \) and go to the Main loop.

  2. (Main loop)

     (a) Solve Subproblem (15) for \( {x_k} \) to obtain \( {d_k} = d({x_k}) \) and \( {\alpha _k} = \alpha ({x_k}) \).

     (b) If \(\left| {{\alpha _k}} \right| \le \varepsilon \), then stop. Otherwise, set \( x_{k+1}=x_k+d_k \).

     (c) Update \( B^j_k \) (for \( j \in \{ 1, 2,\ldots ,m\} \)) using one of the update formulas (4), (5) or (8). Set \( k = k + 1 \) and go to Step 2(a).
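As an illustration of Algorithm 2, the following self-contained Python sketch (a toy stand-in for the paper's MATLAB/TOMLAB implementation, with problem data chosen by us) applies the method to the 1-D bi-objective problem \( F_1(x) = (x-1)^2 \), \( F_2(x) = (x+1)^2 \). For simplicity it uses the exact Hessians as \( B^j \) (a Newton-like simplification rather than one of the update formulas (4), (5) or (8)) and solves Subproblem (15) through its dual (22) by a grid search over \( \lambda \in [0,1] \):

```python
def subproblem(x, steps=4000):
    """Solve Subproblem (15) at x via the dual (22); return (d(x), alpha(x))."""
    grads = (2.0 * (x - 1.0), 2.0 * (x + 1.0))   # grad F_1, grad F_2
    hess = (2.0, 2.0)                            # B^1, B^2 (exact Hessians here)
    best = (float("-inf"), 0.0)
    for i in range(steps + 1):
        lam = i / steps
        B = lam * hess[0] + (1 - lam) * hess[1]  # weighted Hessian
        g = lam * grads[0] + (1 - lam) * grads[1]  # weighted gradient
        val = -g * g / (2.0 * B)                 # inf_d of the weighted model
        if val > best[0]:
            best = (val, -g / B)                 # direction from formula (21)
    alpha, d = best
    return d, alpha

def qnmo_no_linesearch(x0, eps=1e-8, max_iter=500):
    """Algorithm 2: iterate x_{k+1} = x_k + d_k until |alpha_k| <= eps."""
    x = x0
    for _ in range(max_iter):
        d, alpha = subproblem(x)
        if abs(alpha) <= eps:
            return x
        x = x + d
    return x
```

Starting from x0 = 3.0, the optimal weight is \( \lambda = 1 \), giving d = -(x-1) = -2 and x1 = 1.0, which is critical; indeed every point of [-1, 1] is critical for this problem.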

The convergence analysis of Algorithm 1 (with Armijo line search) was presented in Povalej (2014), and the proof for Algorithm 2 is similar. However, to facilitate access and make our exposition self-contained, the convergence proof for Algorithm 2 is given in the next section.

4 Convergence analysis

At the beginning of this section, we state some lemmas from Fliege et al. (2009) and Povalej (2014), which are needed to prove the convergence theorem.

Lemma 2

(Fliege et al. 2009) Suppose that \( U \subset V \) is a convex set. Consider \( \varepsilon > 0 \) and \( \delta > 0 \) such that for any \( x,\,y \in U \) with \( \left\| {y - x} \right\| < \delta \),

$$\begin{aligned} \left\| {H{F_j}(y) - H{F_j}(x)} \right\| < \varepsilon \quad \text {for all } j \in \{ 1,\ldots ,m\}. \end{aligned}$$
(25)

Under this assumption, for any \( x,\,y \in U \) such that \( \left\| {y - x} \right\| < \delta \), we have

$$\begin{aligned} \left\| {\nabla {F_j}(y) - [{\nabla {F_j}(x) + H{F_j}(x)(y - x)}]} \right\| < \varepsilon \left\| {y - x} \right\| , \end{aligned}$$
(26)

for all \( j \in \{ 1,\ldots ,m\} \).

Lemma 3

(Povalej 2014) Let \( U \subset V \) be a convex subset and \( \varepsilon ,\,\delta \in {\mathbb {R}_ > } \) be small constants such that, for any \( x,\,y \in U \) with \( \left\| {y - x} \right\| < \delta \), (25) holds. Let \( \{x_k\} \) be a sequence generated by a quasi-Newton method and \( \{ B_k^j\} \,\,(j = 1,\,2,\ldots ,m) \) be the sequences generated by one of the update formulas (4), (5) or (8). Also, assume that there exists \( {k_0} \in \mathbb {N} \) such that for all \( k \ge {k_0} \)

$$\begin{aligned} \frac{{\left\| {\left( {H{F_j}({x_k}) - B_k^j} \right) (y - {x_k})} \right\| }}{{\left\| {y - {x_k}} \right\| }} < \frac{\varepsilon }{2} \quad \text {for all } j = 1,\ldots ,m. \end{aligned}$$
(27)

Under these assumptions, for any \( k \ge {k_0} \), and any \( y \in U \) such that \( \left\| {y - x_k} \right\| < \delta \), we have

$$\begin{aligned} \left\| {\nabla {F_j}(y) - \left( {\nabla {F_j}({x_k}) + B_k^j(y - {x_k})} \right) } \right\| < \varepsilon \left\| {y - {x_k}} \right\| , \end{aligned}$$
(28)

for all \( j \in \{ 1,\ldots ,m \} \).

Lemma 4

(Fliege et al. 2009; Povalej 2014) Let \( x \in V \) and \( a ,\,b \in {\mathbb {R}_ > } \) with \( a \le b \). Also, let \( {{B^j}(x)} \) be an approximation of \( HF_j(x) \). If

$$\begin{aligned} aI \preceq {B^j}(x) \preceq bI \quad \text {for all } j = 1,2,\ldots ,m, \end{aligned}$$
(29)

then the following two inequalities hold,

$$\begin{aligned} \left\{ \begin{array}{l} \mathrm {(i)}\;\; \frac{a}{2}{\left\| {d(x)} \right\| ^2} \le \left| {\alpha (x)} \right| \le \frac{b}{2}{\left\| {d(x)} \right\| ^2},\\ \mathrm {(ii)}\; \left| {\alpha (x)} \right| \le \frac{1}{{2a}}{\left\| {\sum \limits _{j = 1}^m {{\lambda _j}\nabla {F_j}(x)} } \right\| ^2} \quad \text {for all } {\lambda _j} \ge 0,\; j = 1,2,\ldots ,m,\ \text {with } \sum \limits _{j = 1}^m {{\lambda _j}} = 1, \end{array} \right. \end{aligned}$$
(30)

where \( \alpha (x) \) and d(x) are defined in (16) and (17), respectively.

In the following theorem and its corollary, using Lemmas 2–4, we prove the convergence of Algorithm 2.

Theorem 7

Suppose \( \{x_k\} \) is a sequence generated by Algorithm 2 with initial point \( x_0 \in V \). Take \( B_k^j := {B^j}({x_k}) \) as an approximation of the Hessian matrix of the jth objective function of F at \( x_k \). Let \( r>0 \) be such that \( G[{x_0},r] \subset V \), where \( G[{x_0},r] \) is the closed ball with radius r centered at \( x_0 \). If there exist \( a,\,\delta ,\,\varepsilon > 0 \) with \( a > \varepsilon \) such that

  1. \( aI \preceq {B^j}(x) \) and \( aI \preceq H{F_j}(x) \) for all \( j = 1,\,2,\ldots ,m \) and \( x \in G[{x_0},r] \),

  2. \( \left\| {H{F_j}(y) - H{F_j}(x)} \right\| < \varepsilon \) for all \( j = 1,\ldots ,m \) and \( x, y \in G[{x_0},r] \) with \( \left\| {y - x} \right\| < \delta \),

  3. \( \frac{{\left\| {\left( {H{F_j}({x_k}) - B_k^j} \right) (y - {x_k})} \right\| }}{{\left\| {y - {x_k}} \right\| }} < \frac{\varepsilon }{2} \) for all \( j = 1,\ldots ,m \), \( y \in G[{x_0},r] \) and \( k \ge {k_0} \),

  4. \( \left\| {{d_0}} \right\| < \min \{ \delta ,r (1 - \frac{\varepsilon }{a}) \} \),

then, for any \( k \in \mathbb {N} \),

  (a) \( {x_k} \in G[{x_0},r] \),

  (b) \( \left\| {{d_k}} \right\| < {\left( {\frac{\varepsilon }{a}} \right) ^k}\left\| {{d_0}} \right\| . \)

Proof

The proof is by induction. Clearly, the proposition holds for \( k=0 \). Now, assume that it holds for some \( k\in \mathbb {N} \); we show that it also holds for \( k+1 \). Since \( \{x_k\} \) is a sequence generated by Algorithm 2, we have \( x_{k+1}=x_k+d_k \). By the induction hypothesis,

$$\begin{aligned} \left\| {{d_k}} \right\| \le \left\| {{d_0}} \right\| {\left( {\frac{\varepsilon }{a}} \right) ^k} \quad \text {and} \quad {x_k} \in G[{x_0},r]. \end{aligned}$$
(31)

Consequently, by the triangle inequality, the formula for the sum of the first \( k+1 \) terms of a geometric series, and the assumption \( a > \varepsilon \), we conclude that

$$\begin{aligned} \left\| {{x_{k + 1}} - {x_0}} \right\| \le \sum \limits _{i = 0}^k {\left\| {{x_{i + 1}} - {x_i}} \right\| } \le \left\| {{d_0}} \right\| \frac{{1 - {{\left( {\frac{\varepsilon }{a}} \right) }^{k + 1}}}}{{1 - \frac{\varepsilon }{a}}} < r, \end{aligned}$$
(32)

and this completes the proof of Part (a); that is \( {x_{k+1}} \in G[{x_0},r] \).

Moreover, by (31) and Assumption 4, \( \left\| {{x_{k + 1}} - {x_k}} \right\| = \left\| {{d_k}} \right\| \le \left\| {{d_0}} \right\| < \delta \); that is,

$$\begin{aligned} \left\| {{x_{k + 1}} - {x_k}} \right\| < \delta . \end{aligned}$$
(33)

According to (33), Assumptions 2 and 3, and Lemma 3, it follows immediately that

$$\begin{aligned} \left\| {\nabla {F_j}({x_{k + 1}}) - \left( {\nabla {F_j}({x_k}) + B_k^j{d_k}} \right) } \right\| < \varepsilon \left\| {{x_{k + 1}} - {x_k}} \right\| ,\,\,\,\,\,\,\,\,\forall j \in \{ 1,2,\ldots ,m\}. \end{aligned}$$
(34)

Let \( \lambda _k^j \, (j=1,\ldots , m) \) be the KKT multipliers of Subproblem (15) at \( x_k \). Hence, applying (34) yields

$$\begin{aligned} \sum \limits _{j = 1}^m {\lambda _k^j} \left\| {\nabla {F_j}({x_{k + 1}}) - \left( {\nabla {F_j}({x_k}) + B_k^j{d_k}} \right) } \right\| < \sum \limits _{j = 1}^m {\lambda _k^j} \varepsilon \left\| {{d_k}} \right\| . \end{aligned}$$
(35)

Since the KKT multipliers satisfy \( \sum \nolimits _{j = 1}^m {\lambda _k^j} = 1 \) and \( \lambda _k^j \ge 0 \) for all \( j=1,2,\ldots ,m \), from (35) we have

$$\begin{aligned} \sum \limits _{j = 1}^m {\left\| {\lambda _k^j\nabla {F_j}({x_{k + 1}}) - \left( {\lambda _k^j\nabla {F_j}({x_k}) + \lambda _k^jB_k^j{d_k}} \right) } \right\| } < \varepsilon \left\| {{d_k}} \right\| . \end{aligned}$$
(36)

By the triangle inequality, (36) can be rewritten as

$$\begin{aligned} \left\| {\sum \limits _{j = 1}^m {\lambda _{k}^j\nabla {F_j}({x_{k + 1}})} - \left( {\sum \limits _{j = 1}^m {\lambda _k^j\nabla {F_j}({x_k})} + \left( {\sum \limits _{j = 1}^m {\lambda _k^jB_k^j} } \right) {d_k}} \right) } \right\| < \varepsilon \left\| {{d_k}} \right\| . \end{aligned}$$
(37)

From (21), it is clear that

$$\begin{aligned} {d_k} = - {\left( {\sum \limits _{j = 1}^m {\lambda _k^jB_k^j} } \right) ^{ - 1}}\left( {\sum \limits _{j = 1}^m {\lambda _k^j\nabla {F_j}({x_k})} } \right) . \end{aligned}$$
(38)

Substituting (38) into (37) yields

$$\begin{aligned} \left\| {\sum \limits _{j = 1}^m {\lambda _{k}^j\nabla {F_j}({x_{k + 1}})} } \right\| < \varepsilon \left\| {{d_k}} \right\| . \end{aligned}$$
(39)

According to Part (ii) of Lemma 4 and (39), we deduce that

$$\begin{aligned} \left| {\alpha ({x_{k + 1}})} \right| < \frac{{{\varepsilon ^2}}}{2a}{\left\| {{d_k}} \right\| ^2}. \end{aligned}$$
(40)

By (40) and Part (i) of Lemma 4, it is obvious that

$$\begin{aligned} \left\| {{d_{k + 1}}} \right\| < \frac{\varepsilon }{a}\left\| {{d_k}} \right\| , \end{aligned}$$
(41)

and, consequently, we have

$$\begin{aligned} \left\| {{d_{k + 1}}} \right\| < {\left( {\frac{\varepsilon }{a}} \right) ^{k + 1}}\left\| {{d_0}} \right\| , \end{aligned}$$
(42)

which completes the proof. \(\square \)

Corollary 2

Under the assumptions of Theorem 7, the sequence generated by Algorithm 2 converges to a critical point.

Proof

Suppose \( \{ {x_k}\} \) is the sequence obtained from Algorithm 2. Then, from Theorem 7, for all n and m with \( n > m \) we have:

$$\begin{aligned} \left\| {{x_n} - {x_m}} \right\| \le \sum \limits _{i = m}^{n - 1} {\left\| {{x_{i + 1}} - {x_i}} \right\| } = \sum \limits _{i = m}^{n - 1} {\left\| {{d_i}} \right\| } \le \left\| {{d_0}} \right\| \sum \limits _{i = m}^{n - 1} {{{\left( {\frac{\varepsilon }{a}} \right) }^i}}. \end{aligned}$$
(43)

Since \( a>\varepsilon \) (from the assumptions of Theorem 7), the right-hand side of (43) tends to zero as \( m \rightarrow \infty \), so \( \{ {x_k}\} \) is a Cauchy sequence. Thus, \( \{ {x_k}\} \) converges; that is, there exists \( \hat{x} \in {\mathbb {R}^n} \) such that

$$\begin{aligned} \mathop {\lim }\limits _{k \rightarrow \infty } {x_k} = \hat{x}. \end{aligned}$$

Besides, from Part (b) of Theorem 7 it is evident that

$$\begin{aligned} \mathop {\lim }\limits _{k \rightarrow \infty } \,{d_k} = 0. \end{aligned}$$
(44)

By Part (i) of Lemma 4, we get that \( \mathop {\lim }\nolimits _{k \rightarrow \infty } \,\alpha ({x_k}) = 0 \). According to Part (3) of Theorem 5, \( \alpha (x) \) is a continuous function. Hence, \( \alpha (\hat{x}) = 0 \). Thus, from Part (2) of Theorem 5 it is obvious that \( \hat{x} \) is a critical point. \(\square \)

In Corollary 2, we showed that, from every initial point, the sequence generated by Algorithm 2 converges to a critical point. For the relation between critical points and (weakly) efficient solutions, one can use Theorem 1.

5 Performance assessment

Unlike in single objective optimization, comparing the performance of different multiobjective optimization methods is a difficult task. This difficulty stems from the need to evaluate the quality of the approximated nondominated frontier. Several criteria measure this quality. In this paper, we use three of them: the purity metric and the spread metric (Custodio et al. 2011; Bandyopadhyay et al. 2004), and the epsilon indicator (Knowles et al. 2006). Moreover, in order to provide a good visual comparison, the numerical results are depicted using performance profiles (Dolan and More 2002). To this end, in what follows, we briefly describe performance profiles and some performance assessment criteria.

5.1 Performance profiles

In order to compare different algorithms on a set of test problems, one can run them on the given problems and record information such as the number of iterations, the number of function evaluations, the expended CPU time, and quality criteria. Based on these criteria, the performance of the algorithms can be compared in tables. However, if the set of problems or the set of algorithms is large, such tabular comparisons become hard to read. In 2002, Dolan and More introduced performance profiles as a means to evaluate and compare the performance of a set of algorithms S on a set of test problems P (Dolan and More 2002). In what follows, we briefly explain this method.

Let \( {t_{p,s}} \) denote the performance of algorithm \( s \in S \) in solving problem \( p \in P \). This measure must have the property that better performance corresponds to lower values of \( {t_{p,s}} \). If the intended measure (\( \bar{t}_{p,s} \)) does not have this property, we use \({t_{p,s}}=1/\bar{t}_{p,s} \) instead of \( \bar{t}_{p,s} \). For example, one can take \( {t_{p,s}} \) to be the CPU time expended in solving problem p by algorithm s. However, if \(\bar{t}_{p,s}\) is the number of nondominated points obtained in solving problem p by algorithm s, then in order to plot the performance profiles we must use \({t_{p,s}}=1/\bar{t}_{p,s} \) instead of \( \bar{t}_{p,s} \). Accordingly, the performance ratio for algorithm \( s \in S \) in solving problem \( p \in P \) is defined as follows:

$$\begin{aligned} {r_{p,s}} = \frac{{{t_{p,s}}}}{{\min \{ {t_{p,s}}:s \in S\} }}. \end{aligned}$$
(45)

If the denominator in (45) is zero, we instead use the smallest positive value in \( \{ {t_{p,s}}:s \in S\} \). The performance ratio compares the performance of algorithm s on problem p with the best performance achieved by any algorithm on that problem. The performance of algorithm s on any given problem can be of interest, but we would like an overall assessment of the performance of the algorithm. Thus, \( \rho _s (\tau ) \) is defined as follows:

$$\begin{aligned} {\rho _s}(\tau ) = \frac{1}{{\left| P \right| }}\left| {\left\{ p \in P:{r_{p,s}} \le \tau \right\} } \right| . \end{aligned}$$
(46)

\( \rho _s (\tau ) \) is the probability that the performance ratio of algorithm \( s \in S \) lies within the interval \( [1,\tau ] \). Indeed, \( \rho _s \) is the (cumulative) distribution function of the performance ratio. The performance profiles depict the function \( \rho _s (\tau ) \) for each algorithm \( s \in S \).

The value of \( \rho _s (1) \) is the probability that algorithm s wins over the rest of the algorithms. Thus, if we are interested only in the number of wins, we need only compare the values of \( \rho _s (1) \) for all the algorithms. For example, \( \rho _{s_0} (1)=0.6 \) means that algorithm \( s_0 \) has the best performance on 60 percent of the test problems. Moreover, algorithms with the largest probabilities \( \rho _s (\tau ) \) for large values of \( \tau \) perform better than the other algorithms.
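The construction in (45)–(46) is simple enough to sketch directly. The snippet below is an illustrative Python version written by us (the names are ours, not the paper's); it takes a table of performance values \( t_{p,s} \) and evaluates the profile \( \rho _s (\tau ) \):

```python
def perf_ratios(t):
    """t: dict mapping algorithm -> list of performances t_{p,s} over the
    same ordered problem set. Returns dict of ratio lists r_{p,s}, as in (45)."""
    algs = list(t)
    n_prob = len(t[algs[0]])
    best = [min(t[s][p] for s in algs) for p in range(n_prob)]  # best per problem
    return {s: [t[s][p] / best[p] for p in range(n_prob)] for s in algs}

def rho(ratios, s, tau):
    """Formula (46): fraction of problems with r_{p,s} <= tau."""
    r = ratios[s]
    return sum(1 for v in r if v <= tau) / len(r)
```

For two algorithms on two problems with t = {'A': [1.0, 2.0], 'B': [2.0, 1.0]}, each algorithm wins once, so \( \rho _A(1) = \rho _B(1) = 0.5 \), and both profiles reach 1 at \( \tau = 2 \).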

5.2 Purity metric

The purity metric is an important criterion for comparing the nondominated frontiers obtained by different algorithms (Custodio et al. 2011; Bandyopadhyay et al. 2004). The approximated nondominated frontier obtained by algorithm \( s \in S \) for problem \( p \in P \) is denoted by \( F_{p,s} \). After removing the dominated elements from the set \( { \cup _{s \in S}}{F_{p,s}} \), the reference nondominated frontier, denoted by \( F_p \), is obtained. The purity metric for algorithm \( s \in S \) and problem \( p \in P \) is defined as \( {{\bar{t}}_{p,s}} = {{c_{p,s}^{{F_p}}}/{{c_{p,s}}}} \), where \( c_{p,s}^{{F_p}} = \left| {{F_{p,s}} \cap {F_p}} \right| \) and \( {c_{p,s}} = \left| {{F_{p,s}}} \right| \). Higher values of \( \bar{t}_{p,s} \) show that algorithm \( s \in S \) yields a higher percentage of the nondominated points for problem \( p \in P \). Thus, we use \({t_{p,s}}=1/\bar{t}_{p,s} \) instead of \( \bar{t}_{p,s} \) as the performance. Clearly, \( {\bar{t}_{p,s}} = 0 \) (equivalently, \( t_{p,s} =\infty \)) shows that the set of points obtained by algorithm s in solving problem p has no intersection with the reference nondominated frontier \( F_p \).
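The purity computation can be sketched as follows (an illustrative Python version written by us, for minimization): build the reference frontier \( F_p \) by removing dominated points from the union of the frontiers, then take the ratio \( c_{p,s}^{F_p} / c_{p,s} \):

```python
def dominates(z, y):
    """z dominates y (minimization): z <= y componentwise, strictly somewhere."""
    return all(a <= b for a, b in zip(z, y)) and any(a < b for a, b in zip(z, y))

def reference_frontier(frontiers):
    """Union of the given frontiers with dominated points removed (the set F_p)."""
    union = set().union(*frontiers)
    return {y for y in union if not any(dominates(z, y) for z in union if z != y)}

def purity(frontier, reference):
    """|F_{p,s} intersect F_p| / |F_{p,s}|; higher is better."""
    return len(frontier & reference) / len(frontier)
```

For example, with frontiers A = {(1, 3), (3, 1)} and B = {(2, 2), (4, 4)}, the point (4, 4) is dominated by (2, 2), so \( F_p \) = {(1, 3), (3, 1), (2, 2)}; the purity of A is 1.0 and that of B is 0.5.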

5.3 Spread metric

The spread metric is another metric, used to measure the spread of the obtained nondominated frontiers (Custodio et al. 2011; Bandyopadhyay et al. 2004). Naturally, we are interested in computing a set of points that spans the entire exact nondominated frontier. The spread metric is an attempt to measure the largest gap in the obtained nondominated frontiers. In the following, we explain how to compute the value of the spread metric for each algorithm s in solving problem p. First, we compute the extreme points of the reference nondominated frontier \( F_p \) in the objective space of problem p. There is one extreme point corresponding to each component of the objective function; that is, we have m extreme points for a problem with m objectives. The extreme point corresponding to the jth component is a point of the reference nondominated frontier whose jth component has the largest value. Assume that algorithm s has obtained a set of N points, indexed by \( 1,\ldots , \, N \), as the approximated nondominated set. Add the m extreme points to this set, indexed by \( N+1,\ldots , \, N+m \). Let \( f_{i,j}^{p,s} \) denote the jth component of the ith point of this collection; it is a scalar value. Suppose that, for each j, the values \( f_{i,j}^{p,s} \) are sorted in increasing order in i. Then, the spread metric of algorithm s in solving problem p, denoted by \( \Gamma _{p,s} \), is defined as follows:

$$\begin{aligned} {\Gamma _{p,s}} = \mathop {\max }\limits _{j \in \{ 1,\ldots ,m\} } \,\left( {\mathop {\max }\limits _{i \in \{ 1,\ldots ,N + m - 1\} } \{ {\delta _{i,j}^{p,s}}\} } \right) , \end{aligned}$$

where \( {\delta _{i,j}^{p,s}} = ({f_{i + 1,j}^{p,s}} - {f_{i,j}^{p,s}}) \).
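The spread computation can be sketched as follows (an illustrative Python version written by us): after appending the extreme points, sort each objective coordinate and take the largest consecutive gap over all coordinates:

```python
def spread_metric(points):
    """Gamma: max over objectives j of the largest gap between consecutive
    sorted values of coordinate j (extreme points assumed already included)."""
    m = len(points[0])
    gamma = 0.0
    for j in range(m):
        vals = sorted(p[j] for p in points)
        for a, b in zip(vals, vals[1:]):
            gamma = max(gamma, b - a)  # delta_{i,j} = f_{i+1,j} - f_{i,j}
    return gamma
```

For the points (0, 4), (1, 3), (3, 1), (4, 0), the sorted gaps in each coordinate are 1, 2, 1, so the spread metric equals 2.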

Table 1 Description of all problems used in numerical implementations
Table 2 The average number of iterations (Iter) and function evaluations (Feval) for BFGSLS, SS-BFGSLS, and H-BFGSLS QNMO methods in solving some test problems for all initial points are compared
Table 3 Comparing the average number of iterations for the BFGS, SS-BFGS, and H-BFGS QNMO methods in solving all test problems for all initial points (Iter) and initial points that are converged during less than 500 iterations (CIter)

5.4 Epsilon indicator

The epsilon indicator is a criterion for representing the quality of the obtained nondominated frontiers. The epsilon indicator has two versions, multiplicative and additive. In this paper, we use the additive version. The additive epsilon indicator \( {I_{\varepsilon + }}(A) \) is defined with respect to the nondominated reference set \( F_p \) as:

$$\begin{aligned} {I_{\varepsilon + }}(A) = {I_{\varepsilon + }}(A,{F_p}) = \inf \{ \varepsilon |\forall y \in {F_p}{} {} \,\, \exists z \in A:\,z\, {\preceq }_{\varepsilon +} \,y\}, \end{aligned}$$

where A is the obtained nondominated set for the specified algorithm and \( {\preceq }_{\varepsilon +} \) is the additive \( \varepsilon \)-dominance inequality which is defined as follows:

$$\begin{aligned} z^1\, {\preceq }_{\varepsilon +} \, {z^2} \,\, \Leftrightarrow \,\,\forall i \in \{ 1,\ldots ,m\} ,\,\,z_i^1 \le \varepsilon + z_i^2. \end{aligned}$$
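For finite sets, the additive epsilon indicator admits the compact form \( I_{\varepsilon +}(A) = \max _{y \in F_p} \min _{z \in A} \max _i (z_i - y_i) \). A minimal illustrative Python sketch (written by us) is:

```python
def additive_epsilon(A, ref):
    """Smallest eps such that every y in ref is eps-dominated (additively)
    by some z in A; lower values indicate a better approximation."""
    return max(min(max(zi - yi for zi, yi in zip(z, y)) for z in A) for y in ref)
```

For instance, with A = {(1, 1)} and reference set {(0, 0)}, every component of the single point in A must be shifted down by 1 to dominate the reference point, so \( I_{\varepsilon +}(A) = 1 \).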

For simplicity of the notation, from now on, BFGS, self-scaling BFGS, and Huang BFGS quasi-Newton methods for multiobjective optimization without any line search are denoted by BFGS, SS-BFGS, and H-BFGS, respectively. In addition, BFGS, self-scaling BFGS, and Huang BFGS quasi-Newton methods for multiobjective optimization equipped with the Armijo line search are denoted by BFGSLS, SS-BFGSLS, and H-BFGSLS, respectively. In the following, we use the above mentioned performance assessment criteria, the number of iterations, the number of function evaluations, expended CPU time, and some other criteria to compare the BFGS, BFGSLS, SS-BFGS, SS-BFGSLS, H-BFGS, and H-BFGSLS methods.

6 Numerical results

A MATLAB implementation of the proposed QNMO methods was tested on some well-known test problems from the literature. Table 1 describes these test problems. The first column contains the problem names, and the dimensions are given in Column 2. The notations \( x_L \) and \( x_U \) in Columns 2 and 3 stand for the lower and upper bounds of the decision variables. The last column provides a literature reference. All tests were executed in MATLAB R2013a, and each problem was solved 100 times using 100 initial points drawn uniformly at random between the lower and upper bounds given in Table 1. Therefore, all results reported in Tables 2, 3 and 4 and depicted in Figs. 1, 2, 3, 4, 5 and 6 are averages over these 100 initial points. The corresponding approximations of the nondominated frontier for all methods are plotted in Figs. 7, 8, 9 and 10 for some of the test problems. In order to obtain a good approximation of the nondominated frontiers in these figures, the test problems with two and three objective functions were solved for 500 and 300 initial points, respectively. As mentioned before, the initial points for all methods are selected uniformly in the feasible set in the decision space; for instance, the scatter plot in Fig. 7 depicts the 300 initial points selected to obtain the nondominated frontiers of test Problem PNR by the proposed QNMO methods.

Table 4 Comparing the average number of iterations (Iter) and function evaluations (Feval) for BFGSLS, SS-BFGSLS, and H-BFGSLS QNMO methods in solving some test problems only for initial points that are converged during less than 500 iterations
Fig. 1
figure 1

The left figure depicts the performance profiles of the average number of non-convergent points for QNMO methods and the right figure depicts the performance profiles of the average number of nondominated points for QNMO methods

The stopping rule is \(|\alpha (x)|<\epsilon \), where \(\epsilon =5\sqrt{eps}\) and \(eps\cong 2.2204\times 10^{-16}\). The maximum number of iterations was set to 500. To avoid repetition, from now on an initial point that converges within 500 iterations is called a convergent (initial) point, and one that does not converge within 500 iterations is called a non-convergent (initial) point.

As seen in Table 1, each test problem has box constraints of the form \(x_L\le x\le x_U\). These constraints are incorporated into the direction search Subproblem (15) so that the new point lies in the same box, i.e., \(x_L\le x + d \le x_U\) holds. Accordingly, in order to find a new direction at x, we use Subproblem (47) in practice instead of Subproblem (15).

$$\begin{aligned} \begin{array}{*{20}{l}} {\min {} {} h(t,d) = t}\\ {s.{} t{} \,\,{} \left\{ {\begin{array}{*{20}{l}} \begin{array}{l} \nabla {F_j}{(x)^T}d + \frac{1}{2}{d^T}{B^j}(x)d \le t,{} {} {} \,\,{} (1 \le j \le m),\\ {x_L} - x \le d \le {x_U} - x, \end{array}\\ {(t,d) \in \mathbb {R} \times {\mathbb {R}^n}.} \end{array}} \right. } \end{array} \end{aligned}$$
(47)

In this paper, we have used the TOMLAB package for solving Subproblem (47).
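Subproblem (47) is a quadratically constrained program with box constraints, which the paper solves with TOMLAB. Purely for intuition, a crude stand-in (written by us) for the case n = 1 minimizes the max-of-quadratics objective directly by grid search over the box:

```python
def box_subproblem(grads, hess, lo, hi, steps=20000):
    """Minimize max_j (g_j*d + 0.5*b_j*d^2) over d in [lo, hi] by grid search.
    Returns (d, alpha); a toy substitute for a QP solver such as TOMLAB."""
    best_d, best_val = lo, float("inf")
    for i in range(steps + 1):
        d = lo + (hi - lo) * i / steps
        # objective of (47) at step d: the worst-case quadratic model value
        val = max(g * d + 0.5 * b * d * d for g, b in zip(grads, hess))
        if val < best_val:
            best_d, best_val = d, val
    return best_d, best_val
```

For a single objective with gradient 2 and Hessian 2 on the box [-10, 10], the minimizer is d = -1 with \( \alpha = -1 \); shrinking the box to [0, 10] makes the bound active and gives d = 0.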

In this section, all QNMO methods mentioned in Sect. 3 are compared from two points of view: (i) CPU time and computational effort; (ii) the quality of the approximated nondominated frontiers. Tables 2, 3, 4 and Figs. 2, 3, 4 and 5 are devoted to the first point of view, and Figs. 1 and 6, 7, 8, 9, 10 to the second. In Tables 2, 3 and 4, the algorithms with the best performance on each test problem are indicated. The approximated nondominated frontiers obtained by the proposed QNMO methods for some test problems are compared in Figs. 7, 8, 9 and 10.

Fig. 2
figure 2

Comparison of the performance profiles of the average number of iterations for QNMO methods. The left and right figures depict the performance profiles of the average number of iterations for all points and convergent points, respectively

Fig. 3
figure 3

Comparison of the performance profiles of the average number of function evaluations for all QNMO methods. The left and right figures depict the performance profiles of the average number of function evaluations for all points and convergent points, respectively

Fig. 4
figure 4

Comparison of the performance profiles of the average expended CPU times for QNMO methods. For a better comparison, each of these methods, with line search and without any line search, are depicted in individual figures

Fig. 5
figure 5

Performance profiles of the average number of iterations for QNMO methods with line search and without any line search

Fig. 6
figure 6

Comparison of the performance profiles of the purity metric, spread metric, and epsilon indicator for QNMO methods

Fig. 7
figure 7

Comparison between the nondominated frontiers obtained from QNMO methods for Problem PNR

Fig. 8
figure 8

Comparison between the nondominated frontiers obtained from QNMO methods for Problem DD1a

Fig. 9
figure 9

Comparison between the nondominated frontiers obtained from QNMO methods for Problem JOS1a

Fig. 10
figure 10

Comparison between the nondominated frontiers obtained from QNMO methods for Problem FDS with 5 variables, \( - 2 \le {x_i} \le 2 \)

As seen in Figs. 1, 2, 3 and 4, the profiles vary more for small values of \(\tau \) than for large ones; beyond a certain value of \(\tau \), the profiles become nearly horizontal. Thus, in order to allow a better comparison, we zoomed in on the smaller values of \(\tau \). As mentioned in Sect. 5.1, the value of \(\rho _s(1)\) is more important than the other values; hence, to ease comparison among the profiles, this point is marked by a small circle of the same color as the corresponding profile.

Two criteria that may be of interest in comparing QNMO methods are the number of non-convergent points and the number of obtained nondominated points. All proposed QNMO methods are compared with respect to these criteria in Fig. 1. According to this figure, in terms of the number of nondominated points, the methods SS-BFGSLS, SS-BFGS, H-BFGSLS, BFGS, H-BFGS, and BFGSLS perform best, in decreasing order. As seen in Fig. 1, the H-BFGSLS method performs considerably better than the other methods in terms of the number of non-convergent points: the value of \( \rho (\tau ) \) for the H-BFGSLS method is almost equal to one for all values of \( \tau \). Moreover, the worst performance in this regard belongs to the SS-BFGSLS and BFGSLS methods.

Figures 2, 3, 4 and 5 and Tables 2, 3, 4 compare the methods in three aspects: the speed of convergence, the number of iterations, and the number of function evaluations. Looking at Figs. 2, 3, 4 and 5 and Tables 2, 3, 4, it is evident that the H-BFGSLS method performs considerably better than the other methods, while the worst performance in this regard belongs to the BFGSLS method. In the three aspects mentioned above, Figs. 4 and 5 show that, apart from the H-BFGSLS method, the methods perform better in the absence of line search than when the Armijo line search is used.

Figure 6 compares the methods in the purity metric, spread metric, and epsilon indicator. These factors measure the quality of the obtained nondominated frontiers. Based on these factors, the best performance belongs to the SS-BFGS and SS-BFGSLS methods, and the worst performance to the H-BFGSLS, H-BFGS, BFGSLS, and BFGS methods, in decreasing order.
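Of the quality measures above, the purity metric and the additive epsilon indicator admit short standard definitions for minimization problems. The sketch below is illustrative only, not the paper's implementation; the spread metric and the exact metric variants used here are as defined in the paper's metrics section:

```python
import numpy as np

def dominates(a, b):
    """a dominates b (minimization): no worse everywhere, strictly better once."""
    return bool(np.all(a <= b) and np.any(a < b))

def purity(front, reference):
    """Fraction of a solver's points not dominated by any reference point."""
    kept = [p for p in front
            if not any(dominates(r, p) for r in reference)]
    return len(kept) / len(front)

def additive_epsilon(front, reference):
    """Smallest eps such that the front, shifted by eps in every objective,
    weakly dominates every reference point (additive epsilon indicator)."""
    return max(min(float(np.max(a - r)) for a in front) for r in reference)
```

Higher purity and smaller epsilon indicator values mean the computed frontier is closer to the reference frontier.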

To facilitate a visual comparison, the nondominated frontiers obtained for some of the test problems of Table 1 are depicted in Figs. 7, 8, 9 and 10. In addition, in Figs. 7, 8 and 9, the obtained nondominated frontiers are compared with the reference frontier.

7 Discussion and conclusion

In this paper, a new QNMO method in the self-scaling Broyden class and a new QNMO method in the Huang class were presented. Based on the results observed in Sect. 6, these methods have some advantages over the methods in the Broyden class. For a fair comparison, all methods were evaluated both with the Armijo line search rule and without any line search.

A comprehensive comparative study of QNMO methods has been presented. All methods are compared from two general points of view: (1) the computational effort and the speed of convergence, and (2) the quality of the obtained nondominated frontiers. The numerical results show that, from the first point of view, the best methods are the H-BFGSLS method followed by the SS-BFGS method, while the worst performance belongs to the BFGSLS method. For the second point of view, the number of obtained nondominated points, the purity metric, the spread metric, and the epsilon indicator are used; here the SS-BFGSLS method, followed by the SS-BFGS method, has the best performance. Although the SS-BFGSLS method obtains a better nondominated frontier than the SS-BFGS method, it is considerably weaker than the SS-BFGS method regarding the speed of convergence. Moreover, although the H-BFGSLS method is weaker than the other methods in terms of the quality of the obtained nondominated frontier, it has the best performance in the speed of convergence.

The methods of the Broyden and the self-scaling Broyden classes converge faster without any line search (Algorithm 2) than when the Armijo line search is used. The lack of a line search in these methods reduces the quality of the obtained nondominated frontier; however, this weakness is negligible compared with the gain in the speed of convergence. The higher speed of convergence without line search results from the difference between the Armijo line search for single objective and for multiobjective problems. To choose the step length in the Armijo rule, we find the largest number satisfying the Armijo inequality (or inequalities). For single objective problems there is only one Armijo inequality, while in the multiobjective case their number equals the number of objectives of the problem. Thus, in the multiobjective case, the Armijo step length must simultaneously satisfy all of the inequalities. This shrinks the step length and, as a result, significantly decreases the speed of convergence in the multiobjective case.
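The shrinking effect just described can be seen in a minimal backtracking sketch of the multiobjective Armijo rule, assuming minimization and a common descent direction `d`; the parameter names and values (`sigma`, `beta`) are illustrative, not the paper's settings:

```python
import numpy as np

def armijo_step_multi(fs, grads, x, d, sigma=1e-4, beta=0.5, max_halvings=50):
    """Backtracking Armijo step for a multiobjective problem (minimization).

    fs, grads: lists of objective functions and their gradients.
    The step t must satisfy, for EVERY objective j simultaneously,
        f_j(x + t d) <= f_j(x) + sigma * t * grad f_j(x)^T d,
    which is what shrinks t compared with the single-objective rule.
    """
    fx = [f(x) for f in fs]
    slopes = [g(x) @ d for g in grads]  # directional derivatives at x
    t = 1.0
    for _ in range(max_halvings):
        if all(f(x + t * d) <= fj + sigma * t * s
               for f, fj, s in zip(fs, fx, slopes)):
            return t  # all Armijo inequalities hold
        t *= beta  # shrink until every objective accepts the step
    return t

# Two objectives with a common descent direction at x = 2
f1 = lambda x: float(x @ x)                  # x^2
g1 = lambda x: 2.0 * x
f2 = lambda x: float((x - 1.0) @ (x - 1.0))  # (x - 1)^2
g2 = lambda x: 2.0 * (x - 1.0)
x, d = np.array([2.0]), np.array([-4.0])
t_single = armijo_step_multi([f1], [g1], x, d)        # one inequality
t_multi = armijo_step_multi([f1, f2], [g1, g2], x, d)  # both at once
```

Running the example, the step accepted with both objectives is strictly smaller than with the first objective alone, since the step must satisfy both inequalities simultaneously.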

In the methods of the Broyden and the self-scaling Broyden classes, only gradient information is used to approximate the Hessian matrix, while the method of the Huang class also employs function value information. Consequently, according to Theorem 2, the method of the Huang class approximates the Hessian matrix more accurately than the methods of the Broyden and the self-scaling Broyden classes. For this reason, the number of function evaluations in the H-BFGSLS method is less than in the BFGSLS and SS-BFGSLS methods (see Table 2).
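For reference, the gradient-only nature of the Broyden-class updates can be seen in the standard BFGS formula sketched below: only the difference of gradients `y` enters the update, with no function values (the Huang-class formulas, which additionally use function values, are not reproduced here):

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update of the Hessian approximation B.

    s = x_{k+1} - x_k, y = grad f(x_{k+1}) - grad f(x_k):
    only gradient differences appear; no function values are used.
    The updated matrix satisfies the secant equation B_new @ s = y.
    """
    Bs = B @ s
    return (B
            - np.outer(Bs, Bs) / (s @ Bs)
            + np.outer(y, y) / (y @ s))
```

For a quadratic objective with Hessian \(A\) one has \(y = As\), so the updated matrix reproduces the true Hessian's action along the step direction.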

Unlike for the methods of the Broyden and the self-scaling Broyden classes, using the Armijo line search in the method of the Huang class is beneficial, compared with using no line search, in both the speed of convergence and the quality of the obtained nondominated frontier.

Based on the above discussion and the observed results, we conclude that, overall, the H-BFGSLS and SS-BFGS methods perform better than the other methods proposed in this paper.