1 Introduction

The sum-of-ratios problem, which is to minimize (or maximize) a sum of several fractional functions over a convex set, is a non-convex optimization problem that is difficult to solve by traditional optimization methods. The problem has attracted great interest over the past 40 years for its practical and theoretical significance, and it arises in many important applications such as the finite element method, computer graphics, finance, engineering and management [5, 6, 12, 15, 17]. In [12], many problems in projective geometry, such as multi-view triangulation, camera resectioning and homography estimation, were formulated as sum-of-ratios problems, and a branch-and-bound method based on recent developments in fractional programming and the theory of convex underestimators [2,3,4,5, 9, 14, 16, 17, 21, 24,25,26] was employed to find their global solutions. In the method of [12], the number of variables grows by twice the number of fractional functions, and a second-order cone programming problem must be solved in each iteration to obtain a lower bound on the optimal value. In recent years, several algorithms have been developed to compute global solutions of such problems [2, 3, 9, 11, 18, 19, 25]: given any tolerance \(\varepsilon \), if the problem is feasible, these algorithms return a solution that is at most \(\varepsilon \) away from the global optimum. However, the methods based on branch-and-bound procedures are computationally expensive and converge slowly, and it is not easy to find a reasonable branching strategy.

In this paper, we propose an efficient algorithm which transforms the sum-of-ratios problem into a parametric convex programming problem and solves one convex program per iteration. Unlike the methods based on branch-and-bound, our method introduces no new variables or constraints, and it requires no additional, complicated computations of upper and lower bounds for the numerators and denominators of the ratios, so it is easier to implement than the previous methods. This paper is organized as follows. In Section 2, we show that the sum-of-ratios problem is equivalent to a certain parametric convex programming problem. In Section 3, we present a Newton-like algorithm for finding the optimal parameters associated with the optimal solution of the sum-of-ratios problem and analyze its convergence properties in detail. In Section 4, several test problems from previous studies are used to compare our algorithm with other methods, and the experimental results show that our algorithm performs better. Finally, Section 5 provides some discussion.

2 Equivalent parametric convex programming

The sum-of-ratios problem is to minimize the sum of fractional functions subject to convex constraints, which is formulated as follows.

$$\begin{aligned} \min \sum \limits _{i=1}\limits ^{p}F_i(x),\quad \text {s.t.}\quad x\in X \end{aligned}$$
(1)

where \( F_i(x)=\frac{f_i(x)}{h_i(x)},i=1,\cdots ,p,\quad X=\{x\in R^n|g_i(x)\le 0, i=1,\cdots ,m\}\), and \(f_i(x),g_i(x)\) and \(-h_i(x)\) are twice continuously differentiable convex functions.

It is assumed that the feasible set X is bounded, \(f_i(x)\ge 0\) and \(h_i(x)>0\) for every \(x\in X\), and that \(\mathrm{int}\,X=\{x\in R^n|g_i(x)< 0, i=1,\cdots ,m\}\ne \emptyset \) (the Slater condition). Even with these restrictions, the above problem is NP-complete [8].
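For concreteness, a small instance satisfying these assumptions might be encoded as below; this is only an illustrative Python/NumPy sketch, and the functions and bounds are invented for the example rather than taken from the paper.

```python
import numpy as np

# Toy instance of problem (1): two ratios f_i(x)/h_i(x) over a convex set X.
# f_i convex and nonnegative on X, h_i concave (here affine) and positive on X.
fs = [lambda x: x[0]**2 + x[1]**2 + 1.0,      # f_1, convex, >= 1 on X
      lambda x: (x[0] - 1.0)**2 + 2.0]        # f_2, convex, >= 2 on X
hs = [lambda x: x[0] + x[1] + 1.0,            # h_1, affine, > 0 on X
      lambda x: 4.0 - 0.5*x[0]]               # h_2, affine, > 0 on X
# Convex constraints g_i(x) <= 0 describing X = {x : g_i(x) <= 0}.
gs = [lambda x: x[0] + x[1] - 3.0,            # x1 + x2 <= 3
      lambda x: -x[0],                        # x1 >= 0
      lambda x: -x[1]]                        # x2 >= 0

def sum_of_ratios(x):
    """Objective of problem (1): sum_i f_i(x)/h_i(x)."""
    return sum(f(x) / h(x) for f, h in zip(fs, hs))

print(sum_of_ratios(np.array([0.5, 0.5])))    # evaluate at an interior point
```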

It is easy to see that the problem (1) is equivalent to the following problem.

$$\begin{aligned}&\min \sum \limits _{i=1}\limits ^{p}\beta _i,\quad \text{ s.t. }\nonumber \\&\quad F_i(x)\le \beta _i,i=1,\cdots ,p,\quad x\in X,\quad \beta \in R^p\ \end{aligned}$$
(2)

Let \( \beta =(\beta _1, \cdots ,\beta _p)\) and \( \gamma =(\gamma _1, \cdots ,\gamma _p)\).

Theorem 1.

If \(({\bar{x}},{\bar{\beta }})\) is the solution of the problem (2), then there exists \({\bar{\gamma }}\) such that \({\bar{x}}\) is a solution of the following convex programming problem for fixed \(\gamma ={\bar{\gamma }}\) and \(\beta ={\bar{\beta }}\).

$$\begin{aligned} \min \sum \limits _{i=1}\limits ^{p}\gamma _i (f_i(x)-\beta _i h_i(x)),\quad \text {s.t.}\quad x\in X \end{aligned}$$
(3)

Moreover, \({\bar{x}}\) satisfies the following system of equations for \(\gamma ={\bar{\gamma }}\) and \(\beta ={\bar{\beta }}\):

$$\begin{aligned}&\gamma _i=\frac{1}{h_i(x)}, i=1,\cdots ,p \end{aligned}$$
(4)
$$\begin{aligned}&\quad f_i(x)-\beta _i h_i(x)=0, i=1,\cdots ,p \end{aligned}$$
(5)

Proof.

The constraint \(F_i(x)\le \beta _i\) is equivalent to \(f_i(x)-\beta _i h_i(x)\le 0\). Let’s define the following function for the problem (2).

$$\begin{aligned}&L(x,\beta ,w,\gamma ,v)=w\sum \limits _{i=1}\limits ^{p}\beta _i+\sum \limits _{i=1}\limits ^{p}\gamma _i(f_i(x)-\beta _i h_i(x))\\&\quad +\sum \limits _{i=1}\limits ^{m}v_ig_i(x). \end{aligned}$$

By the Fritz John optimality condition (Theorem 4.2.8 of [1]), there exist \({\bar{w}}, {\bar{\gamma }}=(\bar{\gamma }_{1},\cdots ,\bar{\gamma }_{p})\) and \({\bar{v}}=(\bar{v}_{1},\cdots ,\bar{v}_{m})\) such that

$$\begin{aligned}&\frac{\partial L}{\partial x}=\sum \limits _{i=1}\limits ^{p}\bar{\gamma }_{i} (\nabla f_i({\bar{x}})-\bar{\beta }_{i}\nabla h_i({\bar{x}}))+\ \sum \limits _{i=1}\limits ^{m}\bar{v}_{i} \nabla g_i({\bar{x}})=0 \end{aligned}$$
(6)
$$\begin{aligned}&\quad \frac{\partial L}{\partial \beta _i}={\bar{w}}-\bar{\gamma }_{i} h_i({\bar{x}})=0, i=1,\cdots ,p \end{aligned}$$
(7)
$$\begin{aligned}&\quad \bar{\gamma }_{i}\frac{\partial L}{\partial \gamma _i}=\bar{\gamma }_{i}(f_i({\bar{x}})-\bar{\beta }_{i} h_i({\bar{x}}))=0, i=1,\cdots ,p \end{aligned}$$
(8)
$$\begin{aligned}&\quad \bar{v}_{i}\frac{\partial L}{\partial v_i}=\bar{v}_{i} g_i({\bar{x}})=0, i=1,\cdots ,m \end{aligned}$$
(9)
$$\begin{aligned}&\quad g_i({\bar{x}})\le 0,\bar{v}_{i}\ge 0, i=1,\cdots ,m \end{aligned}$$
(10)
$$\begin{aligned}&\quad f_i({\bar{x}})-\bar{\beta }_{i}h_i({\bar{x}})\le 0,\bar{\gamma }_{i}\ge 0, i=1,\cdots ,p \end{aligned}$$
(11)
$$\begin{aligned}&\quad {\bar{w}}\ge 0, ({\bar{w}},{\bar{\gamma }},{\bar{v}})\ne (0,0,0) \end{aligned}$$
(12)

Suppose that \({\bar{w}}=0\). Then, by (7), we have \({\bar{\gamma }}=0\) because \( h_i(x)>0, i=1,\cdots ,p\) for all \(x\in X\). Hence, it follows from (6), (9), (10) and (12) that

$$\begin{aligned}&\sum \limits _{i\in I({\bar{x}})}\bar{v}_{i} \nabla g_i({\bar{x}})=0, \end{aligned}$$
(13)
$$\begin{aligned}&\quad \sum \limits _{i\in I({\bar{x}})}\bar{v}_{i}>0, \bar{v}_{i}\ge 0, i\in I({\bar{x}}), \end{aligned}$$
(14)

where \(I({\bar{x}})=\{i|g_i({\bar{x}})=0, 1\le i\le m \}\). By the Slater condition, there exists a point \(x'\) such that

$$\begin{aligned} g_i(x')< 0, i=1,\cdots ,m. \end{aligned}$$
(15)

Since \(g_i(x),i=1,\cdots ,m\) are convex, it follows from (15) that

$$\begin{aligned} \nabla g_i({\bar{x}})^T(x'-{\bar{x}})\le g_i(x')-g_i({\bar{x}})<0, i\in I({\bar{x}}) \end{aligned}$$
(16)

Letting \(d=x'-{\bar{x}}\), from (16) and (14), we have \(\left(\sum \limits _{i\in I({\bar{x}})}\bar{v}_{i} \nabla g_i({\bar{x}})\right) ^Td<0\), which contradicts (13). Thus, we have \({\bar{w}}>0\).

Denoting \(\frac{{\bar{\gamma }}}{{\bar{w}}}\) and \(\frac{{\bar{v}}}{{\bar{w}}}\) by \({\bar{\gamma }}\) and \({\bar{v}}\), respectively, we see that (7) is equivalent to (4) and so (8) is equivalent to (5) because \(\bar{\gamma }_{i}>0, i=1,\cdots ,p\) by (4).

Given \(\gamma ={\bar{\gamma }}\) and \(\beta ={\bar{\beta }}\), (6), (9) and (10) are just the KKT conditions for the problem (3). Since the problem (3) is a convex program for parameters \(\gamma >0\) and \(\beta \ge 0\), the KKT conditions are also sufficient for optimality, and hence \({\bar{x}}\) is the solution of (3) for \(\gamma ={\bar{\gamma }}\) and \(\beta ={\bar{\beta }}\). \(\square \)

Let \(\alpha =(\beta ,\gamma )\) denote parameter vector and let

$$\begin{aligned} \Omega =\{\alpha =(\beta ,\gamma )\in R^{2p}|0\le \beta \le \beta ^u, \gamma ^l\le \gamma \le \gamma ^u\}, \end{aligned}$$

where

$$\begin{aligned}&\beta ^u=(\beta ^u_1,\cdots ,\beta ^u_p), \gamma ^l=(\gamma ^l_1,\cdots ,\gamma ^l_p), \gamma ^u=(\gamma ^u_1,\cdots ,\gamma ^u_p), \\&\quad \beta ^u_i=\underset{x\in X}{max}\frac{f_i(x)}{h_i(x)}, \gamma ^u_i=\underset{x\in X}{max}\frac{1}{h_i(x)},\\&\quad \gamma ^l_i=\underset{x\in X}{min}\frac{1}{h_i(x)}, i=1,\cdots ,p. \end{aligned}$$

and it is obvious that \(\gamma ^l_i>0, i=1,\cdots ,p.\)

Remark 1.

  1. (i)

    Since the feasible set X is bounded by the assumption, the \(\beta ^u, \gamma ^l\) and \(\gamma ^u\) are well defined, and the set \(\Omega \) is bounded.

  2. (ii)

    The \(\beta ^u, \gamma ^l\) and \(\gamma ^u\) involved in the definition of \(\Omega \) are needed only for the theoretical analysis; they are not used at all in solving the problem (1).

Let \(x(\alpha )\) be the solution of the problem (3) for fixed \(\alpha \in \Omega \) and let

$$\begin{aligned}&\psi _i^1(\alpha )\equiv \phi _i(x(\alpha ),\alpha )= -f_i(x(\alpha ))+\beta _i h_i(x(\alpha )), i=1,\cdots ,p, \end{aligned}$$
(17)
$$\begin{aligned}&\quad \psi _i^2(\alpha )\equiv \phi _{p+i}(x(\alpha ),\alpha )= -1+\gamma _i h_i(x(\alpha )), i=1,\cdots , p \end{aligned}$$
(18)

Let

$$\begin{aligned} {\hat{\Omega }}=\{\alpha \in \Omega |\psi _i^1(\alpha )=0, \psi _i^2(\alpha )=0, i=1,\cdots ,p\}. \end{aligned}$$

and consider the following system of nonlinear equations:

$$\begin{aligned}&\psi _i^1(\alpha )=0 , i=1,\cdots ,p, \end{aligned}$$
(19)
$$\begin{aligned}&\quad \psi _i^2(\alpha )= 0, i=1,\cdots ,p \end{aligned}$$
(20)

Corollary 1.

If \({\bar{x}}\) is the solution for the problem (1), then there exists \({\bar{\alpha }}\in \Omega \) such that \({\bar{x}}=x({\bar{\alpha }})\) and \({\bar{\alpha }}\in {\hat{\Omega }}\).

Proof.

The proof follows from Theorem 1 because (5) and (4) implies (19) and (20), respectively. \(\square \)

Corollary 2.

If the problem (1) has a solution \({\bar{x}}\) , then \({\hat{\Omega }}\ne \emptyset \). And if \({\hat{\Omega }}\) consists of only one point \(\alpha ^*\), i.e. \({\hat{\Omega }}=\{\alpha ^*\}\), then \(x(\alpha ^*)\) is the solution of the problem (1).

Proof.

It follows from Corollary 1 that \({\hat{\Omega }}\ne \emptyset \). If \({\hat{\Omega }}=\{\alpha ^*\}\), then \(x(\alpha ^*)\) is the solution of the problem (1) because \({\bar{x}}=x(\alpha ^*)\) by Corollary 1. \(\square \)

It follows from Corollary 2 that the problem (1) is reduced to finding the solution \(x(\alpha )\) of the problem (3) satisfying (19) and (20) for the parameter vector \(\alpha \). If \({\hat{\Omega }}=\{\alpha ^*\}\), then solving the problem (1) is equivalent to finding the \(\alpha ^*\in {\hat{\Omega }}\).
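For fixed \(\alpha =(\beta ,\gamma )\), the subproblem (3) is a smooth convex program and can be handed to any standard solver. The following sketch (using SciPy's SLSQP on the toy instance introduced earlier; all names are illustrative) solves (3) and evaluates the residuals (19) and (20).

```python
import numpy as np
from scipy.optimize import minimize

# Toy data as before: convex f_i, affine h_i > 0, convex g_i <= 0.
fs = [lambda x: x[0]**2 + x[1]**2 + 1.0, lambda x: (x[0] - 1.0)**2 + 2.0]
hs = [lambda x: x[0] + x[1] + 1.0, lambda x: 4.0 - 0.5*x[0]]
gs = [lambda x: x[0] + x[1] - 3.0, lambda x: -x[0], lambda x: -x[1]]
p = len(fs)

def solve_subproblem(beta, gamma, x0):
    """Solve problem (3): min sum_i gamma_i*(f_i(x) - beta_i*h_i(x)) over X."""
    obj = lambda x: sum(gamma[i]*(fs[i](x) - beta[i]*hs[i](x)) for i in range(p))
    cons = [{'type': 'ineq', 'fun': (lambda x, g=g: -g(x))} for g in gs]  # g(x) <= 0
    return minimize(obj, x0, method='SLSQP', constraints=cons).x

def psi(beta, gamma, x):
    """Residuals (19)-(20); they vanish exactly at the optimal parameter vector."""
    r1 = [-fs[i](x) + beta[i]*hs[i](x) for i in range(p)]   # psi_i^1
    r2 = [-1.0 + gamma[i]*hs[i](x) for i in range(p)]       # psi_i^2
    return np.array(r1 + r2)

x0 = np.array([0.5, 0.5])
beta0 = np.array([fs[i](x0)/hs[i](x0) for i in range(p)])
gamma0 = np.array([1.0/hs[i](x0) for i in range(p)])
x1 = solve_subproblem(beta0, gamma0, x0)
print(x1, psi(beta0, gamma0, x1))
```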

In the next section, we establish the uniqueness of \({\hat{\Omega }}\) under some assumptions and propose an algorithm to find the unique element of \({\hat{\Omega }}\), i.e. the global optimal solution for the problem (1).

3 Algorithm and its convergence

In this section, first, we consider the system of equations to be satisfied by the parameters corresponding to the optimal solution of the problem (1) in Lemma 2 and Lemma 3. We show the existence and uniqueness of the solution to the system (Theorem 2, Corollary 3) and then construct a Newton-like algorithm ((46) or (59)) to solve the system. Second, we prove the global linear and local superlinear/quadratic convergence of the algorithm (Corollary 4, Theorem 3). Finally, we show that the solution of the convex programming problem (3) corresponding to the optimal parameter vector, which is any accumulation point of the sequence generated by the algorithm, is the solution of the original problem (1) under some assumptions (Theorem 5). Let

$$\begin{aligned} \psi _i(\alpha )=\psi _i^1(\alpha ),\quad \psi _{p+i} (\alpha )=\psi _i^2(\alpha ),i=1,\cdots ,p. \end{aligned}$$
(21)

Then (19) and (20) can be rewritten as

$$\begin{aligned} \psi (\alpha )=0 \end{aligned}$$
(22)

Now, let’s derive the Jacobian matrix \(\psi '(\alpha )\) to construct a Newton-like algorithm. If \(x(\alpha )\) is differentiable, we have the following by (17) and (18):

$$\begin{aligned} \frac{\partial \psi _i(\alpha )}{\partial \alpha _j}=\nabla _x\phi _i^T \frac{\partial x(\alpha )}{\partial \alpha _j}+\frac{\partial \phi _i(\alpha )}{\partial \alpha _j}, i=1,\cdots ,2p, j=1,\cdots ,2p \end{aligned}$$
(23)
$$\begin{aligned} \frac{\partial \phi _i(\alpha )}{\partial \alpha _j}= {\left\{ \begin{array}{ll} h_i(x(\alpha ))&{}\hbox { if}\ i=j,\\ 0&{} \hbox { if}\ i\ne j, \end{array}\right. }\end{aligned}$$
(24)
$$\begin{aligned} \frac{\partial \phi _{p+i}(\alpha )}{\partial \alpha _j}= {\left\{ \begin{array}{ll} h_i(x(\alpha ))&{}\hbox { if}\ p+i=j,\\ 0&{} \hbox { if}\ p+i\ne j \end{array}\right. } \end{aligned}$$
(25)

for \(i=1,\cdots ,p, j=1,\cdots ,2p\). Therefore, defining A and B as

$$\begin{aligned}&A=\left(\nabla _x\phi _i^T\frac{\partial x(\alpha )}{\partial \alpha _j}\right) _{i,j=1,\cdots ,2p},\nonumber \\&\quad B=\left(\begin{array}{cccccccc} diag(h_i(x(\alpha ))_{i=1,\ldots ,p} &{} 0 \\ 0 &{} diag(h_i(x(\alpha ))_{i=1,\ldots ,p} \\ \end{array} \right) , \end{aligned}$$
(26)

we have

$$\begin{aligned} \psi '(\alpha )= A+B. \end{aligned}$$
(27)

To compute A, we need to find derivatives of \(x(\alpha )\). Since \(x(\alpha )\) is the optimal solution of the problem (3) for given \(\alpha =(\beta ,\gamma )\), there exists a Lagrange multiplier vector \(v(\alpha )\ge 0\) such that \(x(\alpha )\) and \(v(\alpha )\) satisfy the following equations:

$$\begin{aligned}&\sum \limits _{i=1}\limits ^{p}\gamma _i (\nabla f_i(x(\alpha ))-\beta _i\nabla h_i(x(\alpha )))+\ \sum \limits _{i=1}\limits ^{m}v_i(\alpha ) \nabla g_i(x(\alpha ))=0, \\&\quad v_i(\alpha )g_i(x(\alpha ))=0, i=1,\cdots ,m. \end{aligned}$$

Letting \(F(x,\alpha )=\sum _{i=1}^{p}\gamma _i(f_i(x)-\beta _ih_i(x))\), \(x\in X\), \(\alpha =(\beta ,\gamma )\), and \( \nabla G(x(\alpha ))=(\nabla g_1(x(\alpha )),\cdots ,\nabla g_m(x(\alpha )))\), we have

$$\begin{aligned}&\nabla _x F(x(\alpha ),\alpha )+\nabla G(x(\alpha ))v(\alpha )=0,\end{aligned}$$
(28)
$$\begin{aligned}&\quad v_i(\alpha )g_i(x(\alpha ))=0, i=1,\cdots ,m \end{aligned}$$
(29)

Let

$$\begin{aligned}&I(x)=\{i|g_i(x)=0, 1\le i\le m\}=\{i_1,\cdots ,i_l\}, \\&\quad G^1(x)=(g_{i_1}(x),\cdots ,g_{i_l}(x))^T \text { and } v^1=(v_{i_1},\cdots ,v_{i_l})^T. \end{aligned}$$

Then (28) and (29) can be written as

$$\begin{aligned}&\nabla _x F(x(\alpha ),\alpha )+\nabla G^1(x(\alpha ))v^1(\alpha )=0,\end{aligned}$$
(30)
$$\begin{aligned}&\quad G^1(x(\alpha ))=0. \end{aligned}$$
(31)

By differentiating (30) and (31) with variables \(\alpha _j, j=1,\cdots ,2p\), we have

$$\begin{aligned}&[\nabla _x^2 F+(\nabla ^2 G^1)^T v^1]\frac{\partial x}{\partial \alpha _j}+\nabla G^1\frac{\partial v^1}{\partial \alpha _j}\nonumber \\&\quad =-\frac{\partial ^2 F}{\partial x\partial \alpha _j}-\left(\frac{\partial ^2 G^1}{\partial x\partial \alpha _j}\right) ^T v^1,\end{aligned}$$
(32)
$$\begin{aligned}&\quad (\nabla G^1)^T \frac{\partial x}{\partial \alpha _j}=-\frac{\partial G^1}{\partial \alpha _j}, \end{aligned}$$
(33)

where

$$\begin{aligned} (\nabla ^2 G^1)^T v^1=\sum _{i\in I(x(\alpha ))}v_i^1\nabla _x^2 g_i(x(\alpha )). \end{aligned}$$
(34)

Since \(g_i(x), i=1,\cdots ,m\) don’t involve any parameters, we have

$$\begin{aligned} \frac{\partial G^1}{\partial \alpha _j}=0, \frac{\partial ^2 G^1}{\partial x\partial \alpha _j}=0, j=1,\cdots ,2p. \end{aligned}$$
(35)

Therefore, by denoting

$$\begin{aligned}&y=\left(\begin{array}{cccccccc} x \\ v^1 \\ \end{array}\right) ,\\&\quad \triangle y^j=\left(\begin{array}{cccccccc} \frac{\partial x}{\partial \alpha _j} \\ \frac{\partial v^1}{\partial \alpha _j} \\ \end{array}\right) ,\\&\quad J(\alpha )=\left(\begin{array}{cccccccc} \nabla _x^2F+(\nabla _x^2G^1)^Tv^1 &{} \nabla G^1 \\ (\nabla G^1)^T &{}0 \\ \end{array}\right) ,\\&\quad J_j(\alpha )=-\left(\begin{array}{cccccccc} \frac{\partial ^2F}{\partial x\partial \alpha _j} \\ 0 \\ \end{array}\right) , \end{aligned}$$

we can write (32) as

$$\begin{aligned} J(\alpha )\triangle y^j(\alpha )=J_j(\alpha ), j=1,\cdots ,2p. \end{aligned}$$

Hence, if \(J(\alpha )\) is nonsingular, we have

$$\begin{aligned} \triangle y^j(\alpha )=\left(\begin{array}{cccccccc} \frac{\partial x(\alpha )}{\partial \alpha _j} \\ \frac{\partial v^1(\alpha )}{\partial \alpha _j} \\ \end{array}\right) =J^{-1}(\alpha )J_j(\alpha ), j=1,\cdots ,2p. \end{aligned}$$
(36)

Thus, we have the following result.

Lemma 1.

If \(\nabla _x^2F+(\nabla _x^2G^1)^Tv^1 \) is nonsingular, we have for every \(j=1,\cdots ,2p\)

$$\begin{aligned} \frac{\partial x(\alpha )}{\partial \alpha _j}=[\nabla _x^2F+(\nabla _x^2G^1)^Tv^1]^{-1}\left\{ \nabla G^1M^{-1}(\nabla G^1)^T[\nabla _x^2F+(\nabla _x^2G^1)^Tv^1]^{-1}-I\right\} \frac{\partial ^2 F}{\partial x\partial \alpha _j}, \end{aligned}$$
(37)

where \(M=(\nabla G^1)^T[\nabla _x^2F+(\nabla _x^2G^1)^Tv^1]^{-1}\nabla G^1.\)

Proof.

If \(\nabla _x^2F+(\nabla _x^2G^1)^Tv^1 \) is nonsingular, we have

$$\begin{aligned} \frac{\partial x(\alpha )}{\partial \alpha _j}=-[\nabla _x^2F+(\nabla _x^2G^1)^Tv^1]^{-1}\left[ \nabla G^1\frac{\partial v^1}{\partial \alpha _j}+\frac{\partial ^2 F}{\partial x\partial \alpha _j}\right] \end{aligned}$$
(38)

by (32) and (35). By substituting (38) in (33) and taking (35) into account, we have

$$\begin{aligned} (\nabla G^1)^T[\nabla _x^2F+(\nabla _x^2G^1)^Tv^1]^{-1}\left[ \nabla G^1\frac{\partial v^1}{\partial \alpha _j}+\frac{\partial ^2 F}{\partial x\partial \alpha _j}\right] =0. \end{aligned}$$

Therefore, we have

$$\begin{aligned} \frac{\partial v^1}{\partial \alpha _j}=-M^{-1}(\nabla G^1)^T[\nabla _x^2F+(\nabla _x^2G^1)^Tv^1]^{-1}\frac{\partial ^2 F}{\partial x\partial \alpha _j}. \end{aligned}$$
(39)

By substituting (39) in (38), we have (37). \(\square \)
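Numerically, (36) is just a structured linear system. The sketch below assembles it with placeholder NumPy arrays standing in for \(\nabla _x^2F+(\nabla _x^2G^1)^Tv^1\), \(\nabla G^1\) and \(-\partial ^2F/\partial x\partial \alpha _j\); it only illustrates the linear algebra, not how those blocks are obtained from the current subproblem solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 3, 2                                   # n variables, l active constraints
# Placeholder data standing in for the blocks of (36); in practice they are
# assembled at x(alpha), v^1(alpha) from the problem functions.
Hxx = rng.standard_normal((n, n))
Hxx = Hxx @ Hxx.T + n*np.eye(n)               # ~ Hess_x F + (Hess G^1)^T v^1, SPD
G1 = rng.standard_normal((n, l))              # ~ grad G^1 (columns: active grad g_i)
rhs_xj = rng.standard_normal(n)               # ~ -d^2 F / (dx d alpha_j)

# KKT-type system J(alpha) * dy_j = J_j(alpha), cf. (36).
J = np.block([[Hxx, G1], [G1.T, np.zeros((l, l))]])
Jj = np.concatenate([rhs_xj, np.zeros(l)])
dy = np.linalg.solve(J, Jj)
dx_dalpha_j, dv1_dalpha_j = dy[:n], dy[n:]
print(dx_dalpha_j, dv1_dalpha_j)
```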

Remark 2.

Since \(\gamma >0\) and \(\beta \ge 0\) for every \(\alpha =(\beta ,\gamma )\in \Omega \), if there exists a \(k\in \{1,\cdots ,p\}\) such that \(f_k(x)\) is strictly convex, then \(\nabla _x^2 F\) is positive definite, and \(\nabla ^2g_i(x), i\in I(x)\), are positive semidefinite because of the convexity of \(f_i(x), -h_i(x), i=1,\cdots ,p\) and \(g_i(x), i=1,\cdots ,m\). It follows from (34) that \(\nabla _x^2F+(\nabla _x^2G^1)^Tv^1\) is also positive definite, which shows that (38) is well-defined.

In the above, we exploited the differentiability of \(x(\alpha )\) and \(v(\alpha )\). To guarantee this differentiability, we need the following assumption.

(H1) In the problem (3), for each \(\alpha \in \Omega \), the LICQ (linear independence constraint qualification), the second-order sufficiency condition and the strict complementarity condition are satisfied at \(x(\alpha )\).

Remark 3.

The assumption (H1) is often required when applying the classical implicit function theorem to obtain differentiability of the optimal solution in parametric optimization problems. Under the assumption (H1), the differentiable functions \(x(\alpha )\) and \(v(\alpha )\) satisfying (28) and (29) exist and are unique, and the matrix \(J(\alpha )\) is nonsingular (Theorem 2.1 of [7]). Therefore, the matrix M defined in Lemma 1 is nonsingular.

To simplify the analysis of the Newton-like algorithm presented below, the following assumption is needed.

(H2) \(\nabla G^1\) is nonsingular for each \(\alpha \in \Omega \), i.e. \(|I(x(\alpha ))|=n\) for each \(\alpha \in \Omega \).

Remark 4.

The assumption (H2) implies that \(\frac{\partial x}{\partial \alpha _j}=0, j=1,\cdots ,2p\), by (33) and (35). Therefore, it follows from (26) and (27) that \(\psi '(\alpha )=B\). As we will show below, the assumptions (H1) and (H2) are sufficient conditions under which the system \(\psi (\alpha )=0\) has a unique solution and the Newton method for the system always finds the global minimizer of the initial problem. It may still be possible to apply the Newton method to problems which do not satisfy the assumptions (H1) and (H2).

Lemma 2.

Under the assumptions (H1) and (H2), \(\psi (\alpha )\) is strongly monotone with constant \(\delta >0\) in \(\Omega \), where

$$\begin{aligned} \delta =\underset{i}{min}\delta _i, \delta _i=\underset{x\in X}{min}h_i(x),i=1,\cdots ,p \end{aligned}$$

Proof.

By (26) and Remark 4, the Jacobian matrix of \(\psi (\alpha )\) is as follows.

$$\begin{aligned} \psi '(\alpha )= \left(\begin{array}{cccccccc} h_1(x(\alpha )) &{} 0 &{} \dots &{} 0 &{} 0 &{} 0 &{} \dots &{} 0 \\ 0 &{} h_2(x(\alpha )) &{} \dots &{} 0 &{} 0 &{} 0 &{} \dots &{} 0 \\ \vdots &{} \vdots &{} \dots &{} \vdots &{} \vdots &{} \vdots &{} \dots &{} \vdots \\ 0 &{} 0 &{} \dots &{} h_p(x(\alpha )) &{} 0 &{} 0 &{} \dots &{} 0 \\ 0 &{} 0 &{} \dots &{} 0 &{} h_1(x(\alpha )) &{} 0 &{} \dots &{} 0 \\ 0 &{} 0 &{} \dots &{} 0 &{} 0 &{} h_2(x(\alpha )) &{} \dots &{} 0 \\ \vdots &{} \vdots &{} \dots &{} \vdots &{} \vdots &{} \vdots &{} \dots &{} \vdots \\ 0 &{} 0 &{} \dots &{} 0 &{} 0 &{} 0 &{} \dots &{} h_p(x(\alpha )) \\ \end{array} \right) \end{aligned}$$
(40)

Then \(\psi '(\alpha )\) is positive definite because \(x(\alpha )\in X\) and \(h_i(x(\alpha )) >0,i=1,\cdots ,p\). Therefore, for any \(z\in R^{2p}\), we have

$$\begin{aligned} z^T\psi '(\alpha )z=\sum \limits _{i=1}\limits ^{p}z_i^2h_i(x(\alpha ))+\sum \limits _{i=1}\limits ^{p}z_{i+p}^2h_i(x(\alpha )) \\ =\sum \limits _{i=1}\limits ^{p}(z_i^2+z_{i+p}^2)h_i(x(\alpha ))\ge \sum \limits _{i=1}\limits ^{p}(z_i^2+z_{i+p}^2)\delta _i \\ \ge \delta \sum \limits _{i=1}\limits ^{2p}z_i^2=\delta \Vert z\Vert ^2, \end{aligned}$$

which completes the proof. \(\square \)

Let

$$\begin{aligned}&A_i(\alpha )=\frac{f_i(x(\alpha ))}{h_i(x(\alpha ))},\quad A_{p+i}(\alpha )=\frac{1}{h_i(x(\alpha ))},\quad i=1,\cdots ,p,\\&\quad A(\alpha )=(A_1(\alpha ),\cdots ,A_p(\alpha ),A_{p+1}(\alpha ),\cdots ,A_{2p}(\alpha )). \end{aligned}$$

Lemma 3.

The equation \(\psi (\alpha )=0\) is equivalent to the equation \(\alpha =A(\alpha )\). If the problem (1) has an optimal solution, the equation (22) has at least one solution in \(\Omega \).

Proof.

The first proposition is obvious from the definition of \(A(\alpha )\) and \(\psi (\alpha )\). It follows from Corollary 1 that there is \(\alpha \in \Omega \) such that \(\psi (\alpha )=0\). \(\square \)

Let’s introduce a mapping \(B_\lambda (\alpha )=\pi _\Omega (\alpha -\lambda \psi (\alpha ))\) to establish the existence and uniqueness of the solution of (22), where \(\lambda \) is a positive constant and \(\pi _\Omega (a)\) denotes the projection of a onto \(\Omega \), i.e. \(\pi _\Omega (a)\) is the solution of the following problem:

$$\begin{aligned} \min \Vert x-a\Vert ^2, \quad \text {s.t.} \quad x\in \Omega \end{aligned}$$
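Since \(\Omega \) is a box, \(\pi _\Omega \) reduces to a component-wise clip, and \(B_\lambda \) is correspondingly cheap to evaluate. A minimal NumPy sketch is given below; the bounds and the residual function `psi_fn` are assumed to be supplied by the caller, and the names are illustrative.

```python
import numpy as np

def project_onto_Omega(alpha, beta_u, gamma_l, gamma_u):
    """pi_Omega for the box Omega = [0, beta_u] x [gamma_l, gamma_u]:
    a component-wise clip of the 2p-vector alpha."""
    p = len(beta_u)
    lo = np.concatenate([np.zeros(p), gamma_l])
    hi = np.concatenate([beta_u, gamma_u])
    return np.clip(alpha, lo, hi)

def B_lambda(alpha, lam, psi_fn, beta_u, gamma_l, gamma_u):
    """The mapping B_lambda(alpha) = pi_Omega(alpha - lam*psi(alpha))."""
    return project_onto_Omega(alpha - lam*psi_fn(alpha), beta_u, gamma_l, gamma_u)
```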

Lemma 4.

For all \(a\in R^{2p}\) and for all \(x\in \Omega \),

$$\begin{aligned} (\pi _\Omega (a)-a)^T(x-\pi _\Omega (a))\ge 0,\end{aligned}$$
(41)

and for all \(a, b\in R^{2p}\),

$$\begin{aligned} \Vert \pi _\Omega (a)-\pi _\Omega (b)\Vert \le \Vert a-b\Vert . \end{aligned}$$
(42)

Proof.

See Kinderlehrer and Stampacchia [13]. \(\square \)

Lemma 5.

Assume that \(\psi (\alpha )\) is differentiable, satisfies (40), and satisfies the Lipschitz condition with constant M in \(\Omega \). Then \(B_\lambda :\Omega \rightarrow \Omega \) is a contractive mapping for all \(\lambda \in (0,2\delta /M^2)\).

Proof.

Since \(\psi (\alpha )\) is strongly monotone with the constant \(\delta \) by Lemma 2,

$$\begin{aligned} (\psi (\alpha ')-\psi (\alpha ))^T(\alpha '-\alpha )\ge \delta \Vert \alpha '-\alpha \Vert ^2. \end{aligned}$$
(43)

By (42), (43) and the Lipschitz condition, we have

$$\begin{aligned}&\Vert B_\lambda (\alpha ')-B_\lambda (\alpha )\Vert ^2=\Vert \pi _\Omega (\alpha '-\lambda \psi (\alpha '))\\&\quad -\pi _\Omega (\alpha -\lambda \psi (\alpha ))\Vert ^2\\&\quad \le \Vert \alpha '-\lambda \psi (\alpha ')-(\alpha -\lambda \psi (\alpha ))\Vert ^2=\\&\quad \Vert (\alpha '-\alpha )-\lambda (\psi (\alpha ')-\psi (\alpha ))\Vert ^2\\&\quad =\Vert \alpha '-\alpha \Vert ^2-2\lambda (\alpha '-\alpha )^T(\psi (\alpha ')-\psi (\alpha ))\\&\quad +\lambda ^2\Vert \psi (\alpha ')-\psi (\alpha )\Vert ^2\\&\quad \le \Vert \alpha '-\alpha \Vert ^2-2\lambda \delta \Vert \alpha '-\alpha \Vert ^2\\&\quad +\lambda ^2\Vert \psi (\alpha ')-\psi (\alpha )\Vert ^2\\&\quad \le (1-2\lambda \delta +\lambda ^2M^2)\Vert \alpha '-\alpha \Vert ^2, \end{aligned}$$

which implies that

$$\begin{aligned} \Vert B_\lambda (\alpha ')-B_\lambda (\alpha )\Vert \le q\Vert \alpha '-\alpha \Vert \end{aligned}$$

for all \(\lambda \in (0,2\delta /M^2)\), where \(q=\sqrt{1-2\lambda \delta +\lambda ^2M^2}< 1 \). \(\square \)

Theorem 2.

Assume that the problem (1) has an optimal solution. Suppose that \(\psi (\alpha )\) is differentiable and satisfies (40) and the Lipschitz condition with constant M in \(\Omega \). Then, for every \(\lambda \in (0,2\delta /M^2)\), the equation \(\alpha =B_\lambda (\alpha )\) is equivalent to the equation \(\psi (\alpha )=0\), and the equation (22) has a unique solution.

Proof.

By Lemma 5 and the contractive mapping principle, \(B_\lambda (\alpha )\) has only one fixed point \(\alpha ^*\) in \(\Omega \) for all \(\lambda \in (0,2\delta /M^2)\), i.e. \(B_\lambda (\alpha ^*)=\alpha ^*\). From (41), we have

$$\begin{aligned}&[\pi _\Omega (\alpha ^*-\lambda \psi (\alpha ^*))-(\alpha ^*-\lambda \psi (\alpha ^*))]^T [\alpha -\pi _\Omega (\alpha ^*-\lambda \psi (\alpha ^*))]\\&\quad =[\alpha ^*-(\alpha ^*-\lambda \psi (\alpha ^*))]^T(\alpha -\alpha ^*)\ge 0 \end{aligned}$$

for every \(\alpha \in \Omega \), i.e.

$$\begin{aligned} \psi (\alpha ^*)^T(\alpha -\alpha ^*)\ge 0\quad \text {for every } \alpha \in \Omega . \end{aligned}$$
(44)

Since the problem (1) has an optimal solution, there is a \({\bar{\alpha }}\in \Omega \) such that \(\psi ({\bar{\alpha }})=0\) by Lemma 3. Then, it follows from (44) that

$$\begin{aligned}{}[\psi (\alpha ^*)-\psi ({\bar{\alpha }})]^T({\bar{\alpha }}-\alpha ^*)\ge 0. \end{aligned}$$
(45)

By Lemma 2, \(\psi (\alpha )\) is strongly monotone with the constant \(\delta \) and so we have

$$\begin{aligned}{}[\psi (\alpha ^*)-\psi ({\bar{\alpha }})]^T(\alpha ^*-{\bar{\alpha }})\ge \delta \Vert \alpha ^*-{\bar{\alpha }}\Vert ^2. \end{aligned}$$

This inequality and (45) together imply \(0\le [\psi (\alpha ^*)-\psi ({\bar{\alpha }})]^T({\bar{\alpha }}-\alpha ^*)\le -\delta \Vert \alpha ^*-{\bar{\alpha }}\Vert ^2\). Therefore, \(\Vert \alpha ^*-{\bar{\alpha }}\Vert =0\), i.e. \(\alpha ^*={\bar{\alpha }}\). Thus, \(\psi (\alpha ^*)=0\), which means that \(\alpha ^*\) is a solution of (22).

By the definition of \(B_\lambda (\alpha )\), it is obvious that \(\alpha \in \Omega \) with \(\psi (\alpha )=0\) is also the solution of \(\alpha =B_\lambda (\alpha ).\)

Suppose that \(\alpha '\in \Omega \) is a solution of (22) that is different from \(\alpha ^*\). Then, \(\alpha '=B_\lambda (\alpha ')\), which is a contradiction because \(\alpha ^*\) is the unique fixed point of \(B_\lambda (\alpha )\) in \(\Omega \). Thus, \(\alpha ^*\) is the unique solution of the equation (22). \(\square \)

Corollary 3.

Suppose that the assumptions of Theorem 2 are satisfied. Then the equation \(\alpha =A(\alpha )\) has a unique solution, which is the unique solution of the equation (22).

Proof.

By Lemma 3, \(\alpha =A(\alpha )\) is equivalent to \(\psi (\alpha )=0\) which is in turn equivalent to \(\alpha =B_\lambda (\alpha )\) by Theorem 2. Therefore, it follows from Theorem 2 that equation \(\alpha =A(\alpha )\) has a unique solution which is the unique solution of the equation (22). \(\square \)

As shown in Lemma 3, \(\alpha \) is a fixed point of \(A(\alpha )\) if and only if \(\alpha \) is a root of \(\psi (\alpha )\). We can construct the following fixed-point iteration to find a fixed point of \(A(\alpha )\).

$$\begin{aligned}&\beta ^{k+1}=\left(\frac{f_1(x^k)}{h_1(x^k)},\cdots ,\frac{f_p(x^k)}{h_p(x^k)}\right) ^{T},\nonumber \\&\quad \gamma ^{k+1}=\left(\frac{1}{h_1(x^k)},\cdots , \frac{1}{h_p(x^k)}\right) ^{T} , \end{aligned}$$
(46)

where \(x^{k}=x(\alpha ^{k})\).
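A self-contained sketch of iteration (46) (Algorithm N below) is given next; it repeats the toy instance and SLSQP subproblem solver from the earlier sketch in Section 2, and the instance, tolerance and iteration cap are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

fs = [lambda x: x[0]**2 + x[1]**2 + 1.0, lambda x: (x[0] - 1.0)**2 + 2.0]
hs = [lambda x: x[0] + x[1] + 1.0, lambda x: 4.0 - 0.5*x[0]]
gs = [lambda x: x[0] + x[1] - 3.0, lambda x: -x[0], lambda x: -x[1]]
p = len(fs)

def solve_subproblem(beta, gamma, x0):
    """Convex subproblem (3) for fixed parameters (beta, gamma)."""
    obj = lambda x: sum(gamma[i]*(fs[i](x) - beta[i]*hs[i](x)) for i in range(p))
    cons = [{'type': 'ineq', 'fun': (lambda x, g=g: -g(x))} for g in gs]
    return minimize(obj, x0, method='SLSQP', constraints=cons).x

def psi(beta, gamma, x):
    """Residuals (19)-(20) evaluated at x = x(alpha)."""
    return np.array([-fs[i](x) + beta[i]*hs[i](x) for i in range(p)]
                    + [-1.0 + gamma[i]*hs[i](x) for i in range(p)])

# Algorithm N: fixed-point iteration (46), i.e. undamped Newton on psi(alpha) = 0.
x = np.array([0.5, 0.5])
beta = np.array([fs[i](x)/hs[i](x) for i in range(p)])
gamma = np.array([1.0/hs[i](x) for i in range(p)])
for k in range(50):
    x = solve_subproblem(beta, gamma, x)
    if np.linalg.norm(psi(beta, gamma, x)) < 1e-6:
        break
    beta = np.array([fs[i](x)/hs[i](x) for i in range(p)])   # beta^{k+1}
    gamma = np.array([1.0/hs[i](x) for i in range(p)])       # gamma^{k+1}
print(x, sum(fs[i](x)/hs[i](x) for i in range(p)))
```

Each pass solves one convex program, while the parameter update itself is in closed form; this is the main computational advantage over the branch-and-bound schemes mentioned in the introduction.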

Corollary 4.

Suppose that the assumptions of Theorem 2 are satisfied. Then the algorithm (46) is just the Newton method to solve the equation (22) and its local rate of convergence is superlinear or quadratic.

Proof.

The Newton method for the equation (22) is as follows.

$$\begin{aligned} \alpha ^{k+1}=\alpha ^k-[\psi '(\alpha ^k)]^{-1}\psi (\alpha ^k) \end{aligned}$$
(47)

By (40), (17) and (18), the right-hand side of (47) is equal to \(A(\alpha ^k)\), i.e. the right-hand side of (46). Therefore, (47) means \(\alpha ^{k+1}=A(\alpha ^k)\) , that is, the fixed-point iteration method to find a fixed point of \(A(\alpha )\) is just the Newton method for solving the equation (22). Hence, the algorithm (46) has local superlinear or quadratic convergence rate. \(\square \)

Theorem 3.

Suppose that the assumptions of Theorem 2 are satisfied and that there exists \(L>0\) such that for every \(\alpha , \alpha '\in \Omega \)

$$\begin{aligned}&\Vert \psi '(\alpha )-\psi '(\alpha ')\Vert \le L\Vert \alpha -\alpha '\Vert , \end{aligned}$$
(48)

and that there exists \({\tilde{M}}>0\) such that for every \(\alpha \in \Omega \)

$$\begin{aligned} \Vert [\psi '(\alpha )]^{-1}\Vert \le {\tilde{M}}. \end{aligned}$$
(49)

Let

$$\begin{aligned}&\alpha ^{k+1}=\alpha ^k+\lambda _k d^k,\qquad d^k=-[\psi '(\alpha ^k)]^{-1}\psi (\alpha ^k), \end{aligned}$$
(50)

where \(\lambda _k\) is the greatest \( \xi ^i \) satisfying

$$\begin{aligned} \Vert \psi (\alpha ^k+\xi ^i d^k)\Vert \le (1-\varepsilon \xi ^i)\Vert \psi (\alpha ^k)\Vert \end{aligned}$$
(51)

and \(i\in \{0,1,2,\cdots \},\quad \xi \in (0,1),\quad \varepsilon \in (0,1).\) Then, the modified Newton method defined by (50) and (51) converges to the unique solution \(\alpha ^*\) of \(\psi (\alpha )=0\) at a linear rate for any starting point \(\alpha ^0\in \Omega \), and the rate of convergence in a neighborhood of the solution is quadratic.

Proof.

We have already shown the existence and uniqueness of the solution \(\alpha \in \Omega \) of the equation \(\psi (\alpha )=0\). If there is a k such that \(\psi (\alpha ^k)=0\), then \(\alpha ^k\) is a solution, so it is assumed that \(\psi (\alpha ^k)\ne 0\) for every k. For \(\lambda \in [0,1]\), we have the following by the Newton-Leibniz formula and (48).

$$\begin{aligned} \begin{aligned}&\Vert \psi (\alpha ^k+\lambda d^k)\Vert = \left\| \psi (\alpha ^k)+\lambda \int _0^1\psi '(\alpha ^k+\theta \lambda d^k)d^k d\theta \right\| \\&\quad =\left\| \psi (\alpha ^k)+\lambda \int _0^1\left[ \psi '(\alpha ^k+\theta \lambda d^k)-\psi '(\alpha ^k)\right] d^k d\theta -\lambda \psi (\alpha ^k)\right\| \\&\quad \le (1-\lambda )\Vert \psi (\alpha ^k)\Vert +\lambda ^2L\Vert d^k \Vert ^2, \end{aligned} \end{aligned}$$
(52)

where

$$\begin{aligned} \Vert \psi '(\alpha ^k+\theta \lambda d^k)-\psi '(\alpha ^k)\Vert \le L\theta \lambda \Vert d^k\Vert \end{aligned}$$

by the Lipschitz condition, and \(\psi '(\alpha ^k)d^k=-\psi (\alpha ^k)\) by (50) . In view of (49), it follows from (50) and (52) that

$$\begin{aligned} \Vert \psi (\alpha ^k+\lambda d^k)\Vert \le \left[ 1-\lambda (1-\lambda L{\tilde{M}}^2\Vert \psi (\alpha ^k)\Vert )\right] \Vert \psi (\alpha ^k)\Vert . \end{aligned}$$

Letting \({\bar{\lambda }}_k =\frac{1-\varepsilon }{L{\tilde{M}}^2\Vert \psi (\alpha ^k)\Vert }\), we have

$$\begin{aligned} \Vert \psi (\alpha ^k+\lambda d^k)\Vert \le (1-\varepsilon \lambda )\Vert \psi (\alpha ^k)\Vert \end{aligned}$$
(53)

for every \(\lambda \in (0,\min \{1,\bar{\lambda }_{k}\})\). Then (51), the definition of \(\lambda _k\), implies that

$$\begin{aligned} \lambda _k\ge \min \{1,\xi \bar{\lambda }_{k}\}. \end{aligned}$$
(54)

Since it follows from (53) that

$$\begin{aligned} \Vert \psi (\alpha ^{k+1})\Vert \le (1-\varepsilon {{\lambda }_k})\Vert \psi (\alpha ^k)\Vert , \end{aligned}$$
(55)

\(\{\Vert \psi (\alpha ^k)\Vert \}\) is a monotonically decreasing sequence and so \(\{\bar{\lambda }_{k}\}\) increases monotonically. From (54), it follows that

$$\begin{aligned} 1-\varepsilon \lambda _k\le 1-\varepsilon \min \{1,\xi \bar{\lambda }_{k}\}\le 1-\varepsilon \min \{1,\xi \bar{\lambda }_{0}\}. \end{aligned}$$

Letting \(q=1-\varepsilon \min \{1,\xi \bar{\lambda }_{0}\}\), we have \(q<1\) and, by (55),

$$\begin{aligned} \Vert \psi (\alpha ^{k})\Vert \le q^k\Vert \psi (\alpha ^0)\Vert . \end{aligned}$$
(56)

Therefore, \(\{ \Vert \psi (\alpha ^{k})\Vert \}\) converges to zero with linear rate, which means that \(\{\alpha ^k\}\) converges to a solution of \(\psi (\alpha )=0\) for any starting point \(\alpha ^0\in \Omega \). Since \( \Vert \psi (\alpha ^{k})\Vert \rightarrow 0\) as \(k\rightarrow \infty \), we have \({\bar{\lambda }}_k\rightarrow \infty \) as \(k\rightarrow \infty \) and so, by (54), there exists \(k_0\) such that \(\lambda _k=1\) for every \(k\ge k_0\). Thus, in this case, (50) becomes the Newton method and the rate of convergence in the neighborhood of the solution is quadratic by the Lipschitz property (48). \(\square \)

Let us consider a sufficient condition under which the assumptions of Theorem 3 are satisfied.

Theorem 4.

Suppose that \(\psi (\alpha )\) is differentiable and satisfies (40) in \(\Omega \). If \(h_i(x), i=1,\cdots ,p\) and \(x(\alpha )\) are Lipschitz continuous in the feasible set X and \(\Omega \), respectively, then \(\psi '(\alpha )\) is Lipschitz continuous and \([\psi '(\alpha )]^{-1}\) is bounded in \(\Omega \).

Proof.

By the Lipschitz continuity of \(h_i(x)\) and \(x(\alpha )\), there exist \(L_i>0\) and \(C>0\) such that, for all \(\alpha , \alpha '\in \Omega \),

$$\begin{aligned}&|h_i(x(\alpha ))-h_i(x(\alpha ')) | \le L_i \Vert x(\alpha )-x(\alpha ')\Vert , \end{aligned}$$
(57)
$$\begin{aligned}&\quad \Vert x(\alpha )-x(\alpha ')\Vert \le C \Vert \alpha -\alpha '\Vert \end{aligned}$$
(58)

Since, by (40), we have

$$\begin{aligned} \psi '(\alpha )-\psi '(\alpha ')=\left[ \begin{array}{ll} diag(h_i(x(\alpha ))-h_i(x(\alpha '))) &{} 0 \\ 0 &{} diag(h_i(x(\alpha ))-h_i(x(\alpha '))) \\ \end{array} \right] , \end{aligned}$$

there exists a constant \(L>0\) by (57) and (58) such that \(\Vert \psi '(\alpha )-\psi '(\alpha ')\Vert \le L\Vert \alpha -\alpha '\Vert \), which means the Lipschitz continuity of \(\psi '(\alpha )\).

It is easy to see

$$\begin{aligned}{}[\psi '(\alpha )]^{-1}=\left[ \begin{array}{ll} diag\left(\frac{1}{h_i(x(\alpha ))}\right) &{} 0 \\ 0 &{} diag\left(\frac{1}{h_i(x(\alpha ))}\right) \\ \end{array} \right] . \end{aligned}$$

Thus, from \(\frac{1}{h_i(x(\alpha ))}\le \gamma _i^u,\quad i=1,\cdots ,p\), and Remark 1, it follows that there exists \({\tilde{M}}>0\) such that \(\Vert [\psi '(\alpha )]^{-1}\Vert \le {\tilde{M}}\) for every \(\alpha \in \Omega \), that is, (49) is satisfied. \(\square \)

Under the assumption of Theorem 4, the modified Newton method defined by

$$\begin{aligned} \alpha ^{k+1}=\alpha ^k-\lambda _k [\psi '(\alpha ^k)]^{-1}\psi (\alpha ^k) \end{aligned}$$

can be rewritten component-wise as follows.

$$\begin{aligned} \Bigg \{\begin{array}{c} \beta _i^{k+1}=\beta _i^k-\frac{\lambda _k}{h_i(x^k)}[\beta _i^kh_i(x^k)-f_i(x^k)]=(1-\lambda _k)\beta _i^k+\lambda _k\frac{f_i(x^k)}{h_i(x^k)},\quad i=1,\cdots ,p \\ \\ \gamma _i^{k+1}=\gamma _i^k-\frac{\lambda _k}{h_i(x^k)}[\gamma _i^k h_i(x^k)-1]=(1-\lambda _k)\gamma _i^k+\lambda _k\frac{1}{h_i(x^k)},\quad i=1,\cdots ,p \end{array} \end{aligned}$$
(59)

On the basis of the above considerations, we construct an algorithm to find a global solution of the problem (1) under the assumptions of Theorem 4 as follows.

[Algorithm MN] Step 0. Choose \(\xi \in (0,1),\quad \varepsilon \in (0,1)\) and \(y^0\in X\). Let

$$\begin{aligned}&\beta _i^0=\frac{f_i(y^0)}{h_i(y^0)},\quad \gamma _i^0=\frac{1}{h_i(y^0)},\quad i=1,\cdots ,p,\quad k=0 ,\\&\quad \beta ^k=(\beta _1^k,\cdots ,\beta _p^k),\quad \gamma ^k=(\gamma _1^k,\cdots ,\gamma _p^k),\quad \alpha ^k=(\beta ^k, \gamma ^k). \end{aligned}$$

Step 1. Find a solution \(x^k=x(\alpha ^k)\) of the problem (3) for \(\alpha =\alpha ^k.\)

Step 2. If \(\psi (\alpha ^k)=0\), then \(x^k\) is a global solution, so stop the algorithm. Otherwise, let \(i_k\) denote the smallest integer among \(i\in \{0,1,2,\cdots \}\) satisfying

$$\begin{aligned} \Vert \psi (\alpha ^k+\xi ^i d^k)\Vert \le (1-\varepsilon \xi ^i) \Vert \psi (\alpha ^k)\Vert \end{aligned}$$

and let \(\lambda _k=\xi ^{i_k}\), where

$$\begin{aligned}&d^k=(d_1^k,\cdots ,d_p^k,d_{p+1}^k,\cdots ,d_{2p}^k),\quad d_i^k=-\frac{\psi _i(\alpha ^k)}{h_i(x^k)}, \\&\quad d_{p+i}^k=-\frac{\psi _{p+i}(\alpha ^k)}{h_i(x^k)},i=1,\cdots ,p . \end{aligned}$$

Compute \(\alpha ^{k+1}=\alpha ^k+\lambda _k d^k\), which coincides with (59).

Step 3. Let \(k=k+1\) and go to step 1. \(\square \)
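A sketch of Step 2 as a single function is given below; `solve_sub` stands for any routine returning a solution \(x(\alpha )\) of the subproblem (3), and the default values of `xi` and `eps` are illustrative, not prescribed by the algorithm.

```python
import numpy as np

def mn_step(beta, gamma, x, solve_sub, fs, hs, xi=0.5, eps=0.1, max_backtracks=30):
    """One Step-2 update of Algorithm MN (damped Newton step with rule (51)).

    solve_sub(beta, gamma, x0) must return a minimizer x(alpha) of subproblem (3);
    fs, hs are the lists of numerator/denominator callables.  All names are
    illustrative; xi, eps play the roles of the backtracking constants in (51)."""
    p = len(fs)

    def psi(b, g, xx):
        return np.array([-fs[i](xx) + b[i]*hs[i](xx) for i in range(p)]
                        + [-1.0 + g[i]*hs[i](xx) for i in range(p)])

    r = psi(beta, gamma, x)
    h = np.array([hs[i](x) for i in range(p)])
    d = -r / np.concatenate([h, h])              # Newton direction of Step 2 / (59)
    lam = 1.0                                    # start with xi^0 = 1
    for _ in range(max_backtracks):
        b_new, g_new = beta + lam*d[:p], gamma + lam*d[p:]
        x_new = solve_sub(b_new, g_new, x)       # x(alpha^k + lam d^k)
        if np.linalg.norm(psi(b_new, g_new, x_new)) <= (1.0 - eps*lam)*np.linalg.norm(r):
            return b_new, g_new, x_new           # acceptance rule (51) satisfied
        lam *= xi                                # try the next power xi^{i+1}
    return b_new, g_new, x_new                   # fall back to the last trial step
```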

Remark 5.

Under the assumptions of Theorem 4, Step 2 of Algorithm MN is just the modified Newton method (50), (51). If the stepsize \(\lambda _k=\xi ^{i_k}\) in Step 2 is replaced by \(\lambda _k=1\), Step 2 becomes the Newton method (47), i.e. (46), and we refer to the resulting algorithm as Algorithm N. As shown in Theorem 3, Algorithm MN has a global linear and local quadratic rate of convergence.

Remark 6.

If the assumption (H1) is satisfied, we can construct a Newton-like algorithm similar to Algorithm MN without the assumption (H2). In this case, the Newton direction in Step 2 is determined by \(d^k=-(\psi '(\alpha ^k))^{-1}\psi (\alpha ^k)\) using the formulas (26), (27) and (37).

Even without differentiability of \(\psi (\alpha )\), we obtain the following result.

Theorem 5.

Suppose that \(\lambda _k\equiv 1\) (or \( \lambda _k\ge {\bar{\lambda }}>0\)) for every k. Then any accumulation point \(\alpha ^*\) of the sequence \(\{\alpha ^k\}\) generated by Algorithm N, i.e. (46) (or by Algorithm MN, i.e. (59)), is a solution of the equation (22). Moreover, if the equation (22) has a unique solution, then \(x(\alpha ^*)\) is the solution of the problem (1).

Proof.

As shown in Remark 1, the parameter set \(\Omega \) is bounded. Therefore, the sequence \(\{\alpha ^k\}\) is bounded and has a convergent subsequence by the Bolzano-Weierstrass theorem. Then any accumulation point \(\alpha ^*\) of the sequence is a solution of the equation (22) by (46) or (59). Corollary 2 completes the proof. \(\square \)

4 Numerical experiments

Our Algorithms N and MN were coded in MATLAB, and the numerical experiments were carried out on a personal computer with a Pentium(R) 4 CPU (2 GHz, 0.97 GB RAM) using the Optimization Toolbox of MATLAB R2009a. We set \(\Vert \psi (\alpha ^k)\Vert <10^{-6}\) as the stopping criterion. In the tables below, “iter” is the total number of convex programming problems solved during the algorithm's run, and the rows “N” and “MN” show the average performance of Algorithms N and MN over 100 runs with random initial points.

Example 1

([5, 18]).

$$\begin{aligned}&\max \quad \frac{-x_1^2+3x_1-x_2^2+3x_2+3.5}{x_1+1}\\&\quad +\frac{x_2}{x_1^2-2x_1+x_2^2-8x_2+20}, \\&\quad \text {s.t.}\quad 2x_1+x_2\le 6,\quad 3x_1+x_2\le 8,\\&\quad x_1-x_2\le 1,\quad x_1\ge 1,\quad x_2\ge 1 \end{aligned}$$
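As an independent sanity check (not the paper's algorithm), Example 1 can be negated and passed to a general-purpose local NLP solver; a SciPy sketch from one feasible starting point is shown below, with the starting point chosen arbitrarily.

```python
import numpy as np
from scipy.optimize import minimize

def neg_obj(x):
    """Negated objective of Example 1 (maximization turned into minimization)."""
    x1, x2 = x
    t1 = (-x1**2 + 3*x1 - x2**2 + 3*x2 + 3.5) / (x1 + 1.0)
    t2 = x2 / (x1**2 - 2*x1 + x2**2 - 8*x2 + 20.0)
    return -(t1 + t2)

cons = [{'type': 'ineq', 'fun': lambda x: 6 - 2*x[0] - x[1]},   # 2x1 + x2 <= 6
        {'type': 'ineq', 'fun': lambda x: 8 - 3*x[0] - x[1]},   # 3x1 + x2 <= 8
        {'type': 'ineq', 'fun': lambda x: 1 - x[0] + x[1]}]     # x1 - x2 <= 1
bnds = [(1.0, None), (1.0, None)]                               # x1 >= 1, x2 >= 1

res = minimize(neg_obj, x0=np.array([1.0, 1.5]), method='SLSQP',
               bounds=bnds, constraints=cons)
print(res.x, -res.fun)     # candidate maximizer and objective value
```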

Table 1 shows the computation result, which demonstrates the advantage of our algorithm over the previous algorithms based on the branch-and-bound method.

Table 1 Computation result for Example 1

Example 2

([11, 19, 27]).

$$\begin{aligned}&\max \quad \frac{-x_1^2+3x_1-x_2^2+3x_2+3.5}{x_1+1}\\&\quad +\frac{x_2}{x_1^2-2x_1+x_2^2-8x_2+20}, \\&\text {s.t.}\quad 2x_1+x_2\le 6,\quad 3x_1+x_2\le 8,\\&\quad x_1-x_2\le 1,\quad x_1\ge 1,\quad x_2\ge 2. \end{aligned}$$

The optimal solution and optimal value are (1, 2) and 4.0357143, respectively. The results of our methods and the previous methods [11, 19, 27] are shown in Table 2. This shows that our methods are much better than the previous methods.

Table 2 Computation result for Example 2

Example 3

([10, 14, 21]).

$$\begin{aligned}&\max \quad \frac{3x_1+5x_2+3x_3+50}{3x_1+4x_2+5x_3+50}\\&+\frac{3x_1+4x_2+50}{4x_1+3x_2+2x_3+50}\\&\quad +\frac{4x_1+2x_2+4x_3+50}{5x_1+4x_2+3x_3+50}, \\&\quad \text {s.t.}\quad 6x_1+3x_2+3x_3\le 10,\\&\quad 10x_1+3x_2+8x_3\le 10,\quad x_1, x_2, x_3\ge 0. \end{aligned}$$

The test results are shown in the following table.

Table 3 Computation result for Example 3

Example 4

([14, 21, 22]).

$$\begin{aligned}&\max \quad \frac{4x_1+3x_2+3x_3+50}{3x_2+3x_3+50}\\&\quad +\frac{3x_1+4x_3+50}{4x_1+4x_2+5x_3+50}\\&\quad +\frac{x_1+2x_2+5x_3+50}{x_1+5x_2+5x_3+50}\\&\quad +\frac{x_1+2x_2+4x_3+50}{5x_2+4x_3+50}, \\&\quad \text {s.t.}\quad 2x_1+x_2+5x_3\le 10,\\&\quad \quad x_1+6x_2+3x_3\le 10,\\&\quad 5x_1+9x_2+2x_3\le 10, \\&\quad 9x_1+7x_2+3x_3\le 10,\quad x_1, x_2, x_3\ge 0. \end{aligned}$$

The results of numerical experiments are given in Table 4, which shows the advantage of our methods over the previous methods.

Table 4 Computation results for Example 4

Example 5

([9, 16, 25]).

$$\begin{aligned}&\min \quad \frac{x_1^2+x_2^2+2x_1x_3}{x_3^2+5x_1x_2}\\&\quad +\frac{x_1+1}{x_1^2-2x_1+x_2^2-8x_2+20}, \\&\quad \text {s.t.}\quad x_1^2+x_2^2+x_3\le 5,\quad (x_1-2)^2+x_2^2+x_3^2\le 5, \\&\quad \quad \quad 1\le x_1\le 3,\\&\quad 1\le x_2\le 3,\quad 1\le x_3\le 2. \end{aligned}$$

The numerical experiment results are shown in Table 5. It shows that Algorithm MN is much better than the previous methods. Algorithm N failed to find a global solution for this example.

Table 5 Computation result for Example 5

Example 6

([25]).

$$\begin{aligned}&\min \quad \frac{-x_1^2+3x_1-x_2^2+3x_2+3.5}{x_1+1}\\&\quad +\frac{x_2}{x_1^2+2x_1+x_2^2-8x_2+20}, \\&\quad \text {s.t.}\quad 2x_1+x_2\le 6,\quad 3x_1+x_2\le 8,\\&\quad x_1-x_2\le 1,\quad 0.1\le x_1, x_2\le 3 \end{aligned}$$

The numerical experiment results are shown in Table 6, which shows the advantage of our algorithms over the previous methods.

Table 6 Computation result for Example 6

Example 7

([4, 26]).

$$\begin{aligned}&\min \frac{x_1^2-4x_1+2x_2^2-8x_2+3x_3^2-12x_3-56}{x_1^2-2x_1+x_2^2-2x_2+x_3+20}\\&\quad +\frac{2x_1^2-16x_1+x_2^2-8x_2-2}{2x_1+4x_2+6x_3},\\&\text {s.t.}\quad x_1+x_2+x_3\le 10,\quad -x_1-x_2+x_3\le 4,\\&\quad 1\le x_1, x_2, x_3 \end{aligned}$$

The numerical experiment result is shown in Table 7.

Table 7 Computation result for Example 7

Example 8

([9, 16]).

$$\begin{aligned}&\min \quad \frac{x_2}{x_1^2-2x_1+x_2^2-8x_2+20}\\&\quad -\frac{x_1^2+2x_2^2-3x_1-3x_2-10}{x_1+1}, \\&\quad \text {s.t.}\quad 2x_1+x_2^2-6x_2\le 0,\\&\quad 3x_1+x_2\le 8,\quad x_1^2-x_2-x_1\le 0, \\&\quad \quad \quad 1\le x_1\le 3,\quad 1\le x_2\le 3. \end{aligned}$$

The numerical experiment results are shown in Table 8. Our results are better than those of previous methods.

Table 8 Computation result for Example 8

Example 9

([11, 18, 23]).

$$\begin{aligned}&\min \quad a_1\frac{-x_1^2+3x_1+2x_2^2+3x_2+3.5}{x_1+1}\\&\quad +a_2\frac{x_2}{x_1^2-2x_1+x_2^2-8x_2+20}, \\&\quad \text {s.t.}\quad 3x_1+x_2\le 8,\quad x_1-x_1^{-1}x_2\le 1,\\&\quad 2x_1x_2^{-1}+x_2\le 6,\quad 1\le x_1, x_2\le 3, \end{aligned}$$

where \(a_1=0.25\) and \(a_2=-1.75.\) The numerical experiment results are shown in Table 9.

Table 9 Computation result for Example 9

Example 10

([11, 20]).

$$\begin{aligned}&\min \quad \frac{2x_1+x_2}{x_1+10}+\frac{2}{x_2+10},\quad \text {s.t.}\\&\quad -x_1^2-x_2^2+3\le 0,\quad -x_1^2-x_2^2+8x_2-14\le 0, \\&\quad 3x_1+x_2\le 8,\quad 2x_1+x_2\le 6,\quad x_1-x_2\le 1,\\&\quad 1\le x_1\le 3,\quad 1\le x_2\le 4, \end{aligned}$$

The numerical experiment results are shown in Table 10.

Table 10 Computation result for Example 10

Remark 7.

Examples 5, 6, 8 and 9 do not satisfy the assumption of the problem (1) that the numerators are convex and the denominators are concave in all ratios. In addition, Examples 5, 8 and 10 have nonconvex constraints. The solutions reported in [11] and [20] are infeasible. For those problems, Algorithm MN obtained considerably better optimal solutions than the existing methods based on branch-and-bound. Algorithm MN found a global solution from every starting point in all of Examples 1-10, and Algorithm N gave the same good results for all the examples except Example 5, which does not satisfy the assumptions on the objective and constraint functions. This shows that our algorithms can find accurate global solutions for a wider class of nonlinear sum-of-ratios problems.

Example 11.

We carried out numerical experiments on randomly generated minimization problems as follows. The numerator and denominator of each term in the objective function are

$$\begin{aligned}&f_i(x)=\frac{1}{2} x^T A_{0i}x+q_{0i}^T x \quad \text {and}\\&\quad h_i(x)=c_i^T x,\quad i=1,\cdots ,p, \end{aligned}$$

respectively, where

$$\begin{aligned}&A_{0i}=U_iD_{0i}U_i^T,\quad D_{0i}=diag(d), \quad \\&\quad U_i=Q_1Q_2Q_3,\quad d=i-i\cdot rand(n,1), \\&\quad Q_j=I-2\frac{w_jw_j^T}{\Vert w_j\Vert ^2},\\&\quad j=1,2,3,\quad w_1=-i+rand(n,1),\\&\quad w_2=-2i+2rand(n,1),\\&\quad w_3=-3i+3rand(n,1),\quad c_i=i-i\cdot rand(n,1),\\&\quad q_{0i}=i+i\cdot rand(n,1), \quad i=1,\cdots ,p. \end{aligned}$$

The feasible set is given by

$$\begin{aligned} X=\{x\in R^n|Ax\le b,\quad 1\le x_i\le 5,\quad i=1,\cdots ,n\}, \end{aligned}$$

where

$$\begin{aligned} A=-1+2\cdot rand(5,n),\quad b=2+3\cdot rand(5,1). \end{aligned}$$
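The MATLAB-style recipe above translates directly to NumPy; in the sketch below, `rand(n,1)` becomes `rng.random(n)` and all function and variable names are illustrative.

```python
import numpy as np

def random_instance(n, p, m=5, seed=0):
    """Generate one random instance of Example 11 (NumPy transcription of the
    MATLAB-style recipe above)."""
    rng = np.random.default_rng(seed)

    def householder(w):
        # Q = I - 2*w*w^T/||w||^2, an orthogonal reflector
        return np.eye(n) - 2.0*np.outer(w, w)/np.dot(w, w)

    fs_data, hs_data = [], []
    for i in range(1, p + 1):
        d = i - i*rng.random(n)                        # eigenvalues of D_0i, in (0, i]
        U = (householder(-i + rng.random(n))
             @ householder(-2*i + 2*rng.random(n))
             @ householder(-3*i + 3*rng.random(n)))    # U_i = Q_1 Q_2 Q_3
        A0 = U @ np.diag(d) @ U.T                      # A_0i = U_i D_0i U_i^T (positive definite)
        q0 = i + i*rng.random(n)
        c = i - i*rng.random(n)
        fs_data.append((A0, q0))                       # f_i(x) = 0.5 x^T A0 x + q0^T x
        hs_data.append(c)                              # h_i(x) = c^T x
    A = -1.0 + 2.0*rng.random((m, n))                  # linear constraints A x <= b
    b = 2.0 + 3.0*rng.random(m)
    return fs_data, hs_data, A, b                      # the box 1 <= x_i <= 5 completes X

fs_data, hs_data, A, b = random_instance(n=20, p=5)
```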

Starting from randomly generated points in \([1,5]^n\), we carried out 50 runs of Algorithm N and Algorithm MN for each fixed n and p. Both Algorithms N and MN were successful on all the problems, as shown in Tables 11 and 12.

Table 11 Computation result of Algorithm N, i.e. (47) or (46), for Example 11
Table 12 Computation result of Algorithm MN, i.e. (50), (51) or (59), for Example 11

As shown in the above tables, the number of iterations of Algorithm N and Algorithm MN was independent of the numbers of variables and fractional functions, and the run time grew in proportion to the number of variables.

Remark 8.

It is worth mentioning that both \(\Vert \psi (\alpha )\Vert \) and the objective function decreased monotonically at each iteration of Algorithms N and MN in all numerical experiments for Examples 1-11.

5 Discussions

In this paper, we presented a parametric optimization algorithm for the sum-of-ratios problem, a non-convex fractional programming problem that occurs in many fields, including computer graphics and management. Our approach transforms the sum-of-ratios problem into a parametric convex programming problem and applies a Newton-like algorithm to update the parameters. The algorithm has a global linear and local superlinear/quadratic rate of convergence. Through numerical experiments, we demonstrated that our method always found the global optimum with a fast convergence rate; in fact, the experiments showed that the optimal point is usually obtained within the first few iterations, and that the algorithm can find global solutions for a larger class of sum-of-ratios problems than those considered in this paper. Although there is no theoretical guarantee that our method always finds a global optimal solution, we expect that our approach will successfully solve many linear and nonlinear sum-of-ratios problems in practice.