Introduction

It is well known that SCQP problems arise in a wide variety of scientific and engineering applications, including regression analysis, image and signal processing, parameter estimation, filter design, robot control, etc. [5]. In many engineering and military applications, real-time SCQP solutions are often required, for example in classification in complex electromagnetic environments, recognition in medical diagnostics, and radar object recognition in strong background clutter [16]. Since the computing time required to solve a quadratic optimization problem depends heavily on the dimension and structure of the problem, conventional numerical methods [147] are usually less effective in real-time applications. One promising approach to handling such high-dimensional, densely structured optimization problems is to employ an artificial neural network [3146] based on circuit implementation. It is therefore appropriate to utilize neural networks for efficiently solving SCQP problems.

The dynamic system approach is one of the important methods for solving optimization problems; it was first proposed by Pyne in the late 1950s [31]. Artificial recurrent neural networks for solving constrained optimization problems can be regarded as a tool for transforming an optimization problem into a specific dynamic system of first-order differential equations. It is expected that, for a given initial point, the dynamic system approaches a static state (or equilibrium point) that corresponds to the solution of the underlying optimization problem. An important requirement is that the energy function decreases monotonically as the dynamic system approaches an equilibrium point. Because of their dynamic nature and potential for electronic implementation, neural networks can be realized physically by designated hardware such as application-specific integrated circuits, where the computational procedure is truly distributed and parallel. The neural network approach can therefore solve optimization problems orders of magnitude faster than the most popular optimization algorithms executed on general-purpose digital computers. It is thus of great practical interest to develop neural network models that provide a real-time online solution to the SCQP problem.

Motivated by the above discussion, in this paper a gradient neural network model based on the Fischer-Burmeister (FB) function for solving the SCQP problem is presented. According to the saddle point theorem, the equilibrium point of the proposed neural network is proved to be equivalent to the Karush-Kuhn-Tucker (KKT) point of the SCQP problem. The existence and uniqueness of an equilibrium point of the proposed neural network are analyzed. By constructing a suitable Lyapunov function, a sufficient condition is derived that ensures global stability, in the sense of Lyapunov, of the unique equilibrium point of the neural network.

The rest of the paper is organized as follows. In the "Problem formulation" section, an NCP-function is used to reformulate the SCQP problem as an unconstrained minimization problem with an objective energy function. In the "A gradient neurodynamic model" section, a gradient neural network is constructed to solve the SCQP problem, and the stability properties of the proposed neural network are investigated. Simulation results evaluating the effectiveness of the proposed neural network are discussed in the "Computer simulations" section. Finally, the "Conclusion" section concludes the paper.

Problem Formulation

Consider the following SCQP problem:

$$ \hbox{minimize} \;\; \frac{1}{2}x^{T}Qx+d^{T}x $$
(1)

subject to

$$ Ax-b\leq 0, $$
(2)
$$ Gx-f=0, $$
(3)

where \({Q\in \mathbb{R}^{n\times n}}\) is a symmetric positive definite matrix, \({d\in \mathbb{R}^{n}}\), \({A\in \mathbb{R}^{m\times n}}\), \({b\in \mathbb{R}^{m}}\), \({G\in \mathbb{R}^{l\times n}}\), \({f\in \mathbb{R}^{l}}\), \(x \in \mathbb{R}^{n}\) and \(\hbox{rank}(A,G)=m+l\; (0\leq m,l<n)\).

For the convenience of later discussions, we introduce a few notations, definitions, two lemmas and two theorems. In what follows, \(\|\cdot\|\) denotes the \(l_2\)-norm on \(\mathbb{R}^n\), the superscript \(T\) denotes the transpose and \(x = (x_1, x_2, \ldots, x_n)^T\). If \({\mathcal{H}}:\mathbb{R}^{n}\rightarrow \mathbb{R}\) is a differentiable function, then \(\nabla {\mathcal{H}}\in \mathbb{R}^n\) stands for its gradient. For any differentiable mapping \({\mathcal{H}}=({\mathcal{H}}_1,\ldots,{\mathcal{H}}_m)^T:\mathbb{R}^n\rightarrow\mathbb{R}^m\), \(\nabla{\mathcal{H}}(x)=[\nabla{\mathcal{H}}_1(x),\ldots,\nabla{\mathcal{H}}_m(x)]\in \mathbb{R}^{n\times m}\) denotes the transposed Jacobian of \({\mathcal{H}}\) at \(x\).

Definition 2.1

Let \(\Upomega\subseteq \mathbb{R}^{n}\) be an open neighborhood of \(\bar{x}\). A continuously differentiable function \(\xi:\mathbb{R}^{n}\rightarrow \mathbb{R}\) is said to be a Lyapunov function at the state \(\bar{x}\) (over the set \(\Upomega\)) for a system \(x^{\prime} = {\mathcal{H}}(x)\), if

$$ \left\{\begin{array}{l} \xi(\bar{x})=0,\; \xi(x)>0,\;\; \forall x\in \Upomega\backslash\{\bar{x}\}, \\ \frac{\hbox{d}\xi(x(t))}{\hbox{d}t}=[\nabla\xi(x(t))]^T{\mathcal{H}}(x(t))\leq 0,\;\; \forall \; x\in \Upomega. \end{array} \right . $$

Lemma 2.2

[24]: (a) An isolated equilibrium point \( x^*\) of a system \({x^{\prime} = {\mathcal{H}}(x)}\) is Lyapunov stable if there exists a Lyapunov function over some neighborhood \(\Upomega^*\) of \( x^*.\)

(b) An isolated equilibrium point \( x^*\) of a system \({x^{\prime} = {\mathcal{H}}(x)}\) is asymptotically stable if there is a Lyapunov function over some neighborhood \(\Upomega^*\) of \( x^*\) such that \(\frac{\hbox{d}\xi(x(t))}{\hbox{d}t}<0 ,\;\;\forall\; x\in \Upomega^*\backslash \{x^*\}\).

Definition 2.3

Let x(t) be a solution trajectory of a system \({x^{\prime} = {\mathcal{H}}(x)}\), and let \(X^*\) denote the set of equilibrium points of this system. The solution trajectory of the system is said to be globally convergent to the set \(X^*\) if x(t) satisfies

$$ \lim\limits_{t\rightarrow\infty} \hbox{dist} (x(t),X^*)=0, $$

where dist\((x(t),X^*)=\inf_{y\in X^{*}}\|x(t)-y\|\). In particular, if the set \(X^*\) has only one point \(x^*\), then \(\lim\nolimits_{t\rightarrow\infty} x(t) = x^*\), and the system is said to be globally asymptotically stable at \(x^*\) if the system is also stable at \(x^*\) in the sense of Lyapunov.

Theorem 2.4

[5] \({x\in\mathbb{R}^n}\) is an optimal solution of (1)–(3) if and only if there exist \({u\in\mathbb{R}^m}\) and \({v\in\mathbb{R}^l}\) such that \((x^{T}, u^{T}, v^{T})^{T}\) satisfies the following KKT system

$$ \left\{ \begin{array}{l} u\geq0,\; \; Ax-b\leq0,\; \; {u}^T(Ax-b)=0,\\ Qx+d+G^Tv+A^Tu=0,\; \; \\ Gx-f=0. \end{array} \right. $$
(4)

\(x^*\) is called a KKT point of (1)–(3) and a pair \( (u^{*T}, v^{*T})^T \) is called the Lagrangian multiplier vector corresponding to \( x^*.\)

Theorem 2.5

[5] If Q is a positive definite matrix, then \( x^*\) is an optimal solution of (1)–(3), if and only if \( x^*\) is a KKT point of (1)–(3).

Lemma 2.6

[32] If A is an n × n nonsingular matrix, then the homogeneous system AX = 0 has only the trivial solution X = 0.

A Gradient Neurodynamic Model

We can establish the relationship between the solution of problem (1)–(3) and the solution of an equivalent unconstrained minimization problem via a merit function [13]. A merit function is a function whose global minimizers coincide with the solutions of the nonlinear complementarity problem (NCP). The class of NCP-functions defined below is used to construct a merit function.

Definition 3.1

• A function \({\phi:\mathbb{R}^2\rightarrow\mathbb{R}}\), is called an NCP-function if it satisfies

$$ \phi(a,b)=0\Longleftrightarrow a\geq 0,\; b\geq 0,\; ab=0. $$

• A popular NCP-function is the FB function, which is strongly semismooth [30, 33] and is defined as

$$ \phi_{\rm FB}(a,b)=(a+b)-\sqrt{a^2+b^2}. $$

The FB merit function \({\psi_{\rm FB}:\mathbb{R}\times\mathbb{R}\rightarrow \mathbb{R}_{+}}\) is obtained by taking half the square of ϕFB, i.e.,

$$ \psi_{\rm FB}(a,b)=\frac{1}{2}\|\phi_{\rm FB}(a,b)\|^2. $$

The perturbed FB function is also given by

$$ \phi_{\rm FB}^{\varepsilon}(a,b)=(a+b)-\sqrt{a^2+b^2+\varepsilon},\;\;\varepsilon\rightarrow 0_{+}. $$

The important property of \(\phi_{\rm FB}^{\varepsilon}\) can be stated in the following result.

Proposition 3.2

[10] For every \({\varepsilon>0}\), we have

$$ \phi_{\rm FB}^{\varepsilon}(a,b)=0 \Leftrightarrow a > 0, b > 0, ab = \frac{\varepsilon}{2}. $$

The above proposition is obvious, and the proof is omitted. It is noted that \(\phi_{\rm FB}^{\varepsilon}(a,b)\) is smooth with respect to a, b for \(\varepsilon > 0\).
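
To make the role of these functions concrete, the short Python sketch below (an illustration added here; the simulations reported later in the paper are carried out in Matlab) evaluates \(\phi_{\rm FB}\), \(\psi_{\rm FB}\) and the perturbed \(\phi_{\rm FB}^{\varepsilon}\), and checks the characterization of Proposition 3.2 at a point with \(ab=\varepsilon/2\).

```python
import numpy as np

def phi_fb(a, b):
    """Fischer-Burmeister NCP-function: phi(a, b) = a + b - sqrt(a^2 + b^2)."""
    return a + b - np.sqrt(a**2 + b**2)

def phi_fb_eps(a, b, eps):
    """Perturbed (smoothed) FB function with parameter eps > 0."""
    return a + b - np.sqrt(a**2 + b**2 + eps)

def psi_fb(a, b):
    """FB merit function: half the square of phi_fb."""
    return 0.5 * phi_fb(a, b)**2

# phi_fb vanishes exactly on the complementarity set {a >= 0, b >= 0, ab = 0}
print(phi_fb(0.0, 3.0), phi_fb(2.0, 0.0), psi_fb(0.0, 3.0))   # all 0.0
# Proposition 3.2: for eps > 0, phi_fb_eps(a, b) = 0  iff  a > 0, b > 0, ab = eps/2
eps, a = 1e-2, 0.5
b = eps / (2 * a)                 # chosen so that ab = eps/2
print(phi_fb_eps(a, b, eps))      # approximately 0
```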

Lemma 3.3

\( y^*=(x^{*T}, u^{*T}, \nu^{*T})^{T}\) satisfies the following equation

$$ \eta(y)=\left[ \begin{array}{c} Qx+d+G^Tv+A^Tu \\ f-Gx\\ \phi_{\rm FB}^{\varepsilon}\left(u,b-Ax\right)\\ \end{array} \right]=0, $$
(5)

if and only if \( x^*\) is a KKT point of (1)–(3) for every \(\varepsilon \rightarrow 0_{+}\).

Proof

Let η(\( y^*\)) = 0. Then

$$ Qx^*+d+G^Tv^*+A^Tu^*=0, $$
(6)
$$ f-Gx^*=0, $$
(7)
$$ \phi_{\rm FB}^{\varepsilon}\left(u^*,b-Ax^*\right)=0,\;\; \hbox{for} \; \varepsilon\rightarrow 0_{+}. $$
(8)

From proposition 3.2, it follows easily that \(\phi_{\rm FB}^{\varepsilon}\left(u^*,b-Ax^*\right)=0\) if and only if

$$ u^*\geq0,\;\;\; Ax^*-b\leq0,\;\;\; u{^*{^T}}(Ax^*-b)=0,\;\; \hbox{for} \; \varepsilon\rightarrow 0_{+}. $$
(9)

From (6), (7) and (9), it is seen that \( y^*=(x^{*T}, u^{*T}, \nu^{*T})^{T}\) satisfies the KKT conditions (4).

The converse is straightforward.□

Now by Definition 3.1 and Lemma 3.3, we can easily verify that the KKT system (4) is equivalent to the following unconstrained smooth minimization problem:

$$ \hbox{minimize} \;E(y)=\frac{1}{2}\|\eta(y)\|^2. $$
(10)

It is clear that E(y) is a smooth merit function for the KKT system (4). The merit function E in (10) is continuously differentiable for all \({ y\in\mathbb{R}^{n+l+m}}\).

Now let x(·), u(·) and v(·) be time-dependent variables. Our aim is to design a neural network that settles down to an equilibrium point, which is also a stationary point of the energy function E(y). Hence, we use the steepest descent method to construct the following neural network model for solving problem (1)–(3):

$$ {\frac{\hbox{d}y(t)}{\hbox{d}t}}=-k\nabla E(y(t)),\;\; k > 0, $$
(11)
$$ y(0)=y_0, $$
(12)

where k is a scaling factor that controls the convergence rate of the neural network (11) and (12). An indication of how the neural network (11) and (12) can be implemented in hardware is provided in Fig. 1.
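
For concreteness, the following Python/NumPy sketch shows one way to assemble η(y) from (5), the gradient ∇E(y) = (∇η(y))^T η(y) of the merit function (10), and to integrate the dynamics (11) and (12) with an off-the-shelf ODE solver. It is a minimal illustrative reimplementation, not the Matlab code used for the simulations below; the small test problem (Q, d, A, b, G, f), the value of ε and the solver tolerances are assumptions made for the example.

```python
import numpy as np
from scipy.integrate import solve_ivp

def make_network(Q, d, A, b, G, f, eps=1e-6, k=10.0):
    """Right-hand side dy/dt = -k * grad E(y) of the FB-based gradient network (11)-(12).

    The state is y = (x, u, v): x in R^n, u in R^m (inequality multipliers),
    v in R^l (equality multipliers)."""
    n, m, l = Q.shape[0], A.shape[0], G.shape[0]

    def eta(y):
        x, u, v = y[:n], y[n:n + m], y[n + m:]
        s = b - A @ x
        phi = u + s - np.sqrt(u**2 + s**2 + eps)       # perturbed FB function, componentwise
        return np.concatenate([Q @ x + d + A.T @ u + G.T @ v, f - G @ x, phi])

    def grad_E(y):
        x, u = y[:n], y[n:n + m]
        s = b - A @ x
        r = np.sqrt(u**2 + s**2 + eps)
        J = np.zeros((n + l + m, n + m + l))           # Jacobian of eta w.r.t. (x, u, v)
        J[:n, :n], J[:n, n:n + m], J[:n, n + m:] = Q, A.T, G.T
        J[n:n + l, :n] = -G
        J[n + l:, :n] = -np.diag(1.0 - s / r) @ A
        J[n + l:, n:n + m] = np.diag(1.0 - u / r)
        return J.T @ eta(y)                            # grad E(y) = (Jacobian of eta)^T eta(y)

    return lambda t, y: -k * grad_E(y)

# A small illustrative SCQP:  minimize 0.5 x'Qx + d'x  s.t.  x1 + x2 <= 1,  x1 - x2 = 0
Q = np.array([[4.0, 1.0], [1.0, 3.0]]); d = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]]);             b = np.array([1.0])
G = np.array([[1.0, -1.0]]);            f = np.array([0.0])

sol = solve_ivp(make_network(Q, d, A, b, G, f), (0.0, 50.0), np.zeros(2 + 1 + 1),
                method="BDF", rtol=1e-8, atol=1e-10)
print("x(t) settles near", sol.y[:2, -1])   # should be close to (-2/9, -2/9) for these data
```

Replacing the toy data with the matrices of the SCQP at hand is all that is needed to reuse this routine.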

In order to see how well the presented neural network (11) and (12) can be applied to solve SCQP problems, we compare it with some existing neural network models. First, let us consider the following problem

$$ \hbox{minimize} \;\;\;\; \frac{1}{2}x^{T}A_{0}x+a^{T}x $$
(13)

subject to

$$ x\in \Upomega_0, $$
(14)
$$ Dx=b, $$
(15)

Fig. 1

A simplified block diagram for the neural network (11) and (12)

where \(A_{0}\in\mathbb{R}^{m\times m}\) is a symmetric positive semidefinite matrix, \(D\) is an \(n\times m\) matrix, \({\Upomega_0\subset\mathbb{R}^m}\) is a closed convex set and \(x = (x_1, x_2, \ldots, x_m)^T\). In [36], the projection neural network for solving (13)–(15) is given by

$$ \frac{\hbox{d}x}{\hbox{d}t}=P_0(x-A_0x+D^Ty-a)-x, $$
(16)
$$ \frac{\hbox{d}y}{\hbox{d}t}=-DP_0(x-A_0x+D^Ty-a)+b, $$
(17)

where \(P_0\) is the projection operator [40] onto \(\Upomega_0\). The neural network model (16)–(17) has not been proved to converge in finite time. If the network in [39] is used to solve problem (13)–(15), the dimension of the network is higher; moreover, it has not been proved to converge in finite time either.

Another kind of neural network model is the gradient model. To use a gradient neural network model, a constrained optimization problem is first approximated by an unconstrained one, whose energy function is constructed by the penalty function method. The gradient neural network model has the advantage that the model can be defined directly in terms of the derivatives of the energy function. Its shortcoming is that convergence is not guaranteed, especially in the case of unbounded solution sets [46]. Moreover, a gradient neural network based on the penalty function requires an adjustable parameter, called the penalty parameter. For instance, a gradient neural network model for solving (1)–(3) is given by

$$ \frac{\hbox{d}x}{\hbox{d}t}=-\left(Qx+d+\gamma[A^T(Ax-b)^++G^T(Gx-f)]\right), $$
(18)

where γ is a penalty parameter. The system in (18) is referred to as Kennedy and Chua's neural network model [17]. This network is not capable of finding an exact optimal solution for any finite penalty parameter and is difficult to implement when the penalty parameter is very large [18]. Thus, this network only converges to an approximate solution of (1)–(3) for any given finite penalty parameter. It can also be shown that the Kennedy–Chua neural network (18) is not globally convergent to an exact optimal solution of some quadratic programming problems; see, for instance, Example 5.1 below.
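
For comparison, a minimal sketch of the penalty-based dynamics (18) is given below, instantiated with the data of Example 5.1 (stated in the "Computer simulations" section) and γ = 50. This is our own illustration of the Kennedy–Chua model, not the implementation used in [22].

```python
import numpy as np
from scipy.integrate import solve_ivp

def kennedy_chua_rhs(Q, d, A, b, G, f, gamma):
    """Right-hand side of the penalty-based gradient model (18)."""
    def rhs(t, x):
        viol = np.maximum(A @ x - b, 0.0)          # (Ax - b)^+, componentwise positive part
        return -(Q @ x + d + gamma * (A.T @ viol + G.T @ (G @ x - f)))
    return rhs

# Data of Example 5.1 (no equality constraints, so G and f are empty), gamma = 50
Q = np.array([[2.0, 1.0], [1.0, 2.0]]); d = np.array([-30.0, -30.0])
A = np.array([[5/12, -1.0], [5/2, 1.0], [-1.0, 0.0], [0.0, 1.0]])
b = np.array([35/12, 35/2, 5.0, 5.0])
G = np.zeros((0, 2)); f = np.zeros(0)

sol = solve_ivp(kennedy_chua_rhs(Q, d, A, b, G, f, gamma=50.0), (0.0, 10.0), np.zeros(2))
print(sol.y[:, -1])   # settles at an approximate, slightly infeasible point (cf. Example 5.1)
```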

Stability and Convergence Properties

In order to study the stability of the neural network in (11) and (12), we first prove the following lemma.

Lemma 4.1

The Jacobian matrix ∇η(y) of the mapping η defined in (5) is nonsingular.

Proof

With a simple calculation and using Proposition 3.2 in [35] and Proposition 3.3 in [34], it is clearly shown that

$$ \nabla \eta(y)=\left[ \begin{array}{ccc} Q & G^T & A^T \\ -G&{O_{l\times l}}&{O_{l\times m}}\\ -\left(\hbox{diag}\{\nabla_{(b-Ax)_{k}}\phi_{\rm FB}^{\varepsilon}(u_{k},(b-Ax)_{k})\}_{k=1}^m\right)A&{ O_{m\times l}}& \hbox{diag}\{\nabla_{u_{k}}\phi_{\rm FB}^{\varepsilon}(u_{k},(b-Ax)_{k})\}_{k=1}^m \end{array} \right]. $$

Since Q is a positive definite matrix and \(\hbox{rank}(A,G)=m+l\), from Theorem 3.1 in [34] we see that ∇η(y) is nonsingular. This completes the proof.□

We now investigate the relationships between an equilibrium point of (11) and (12) and a solution to problem (1)–(3).

Theorem 4.2

Let \(x^*\) be a KKT point of (1)–(3). Then \(y^*\) is an equilibrium point of the neural network (11) and (12). Conversely, if \(y^*=(x^{*T}, u^{*T}, \nu^{*T})^{T}\) is an equilibrium point of the neural network (11) and (12) and the Jacobian matrix of η(y) in (5) is nonsingular, then \(x^*\) is a KKT point of problem (1)–(3).

Proof

Suppose that \( x^*\) is an optimal solution for (1)–(3). From Lemma 3.3, it is clear that η(\( y^*\)) = 0. With a simple calculation, it is clearly shown that

$$ \nabla E(y)=(\nabla\eta(y))^T\eta(y), $$
(19)

where ∇η(y) is the Jacobian matrix of η(y). Using (19) we get ∇E(\(y^*\)) = 0, i.e., \(y^*\) is an equilibrium point of the dynamic model (11) and (12). From Lemma 4.1 and Lemma 2.6, the converse is straightforward.□

Lemma 4.3

The equilibrium point of the proposed neural network model (11) and (12) is unique.

Proof

Since problem (1)–(3) has a unique optimal solution \(x^*\), the necessary and sufficient KKT conditions (4) have a unique solution \(y^*=(x^{*T}, u^{*T}, \nu^{*T})^{T}\). Moreover, from Theorem 4.2 we see that the solution of the KKT system (4) is an equilibrium point of the proposed neural network model (11) and (12). Thus the equilibrium point of the network (11) and (12) is unique.□

Now we state the main result of this section.

Theorem 4.4

Let \( y^*\) be an isolated equilibrium point of (11) and (12). Then \( y^*\) is asymptotically stable for (11) and (12).

Proof

First, notice that E(y) ≥ 0 and E(\( y^*\)) = 0. In addition, since \( y^*\) is an isolated equilibrium point of (11) and (12), there exists a neighborhood \({\Upomega_{*}\subseteq{\mathbb{R}^{n+m+l}}}\) of \( y^*\) such that

$$ \nabla E(y^*)=0,\;\hbox{and}\; \nabla E(y)\neq 0,\; \forall y\in\Upomega_{*}\backslash \{y^*\}. $$

We claim that E(y) > 0 for any \(y\in \Upomega_{*}\backslash \{y^*\}\). Otherwise, if there were a \(y\in \Upomega_*\backslash \{y^*\}\) satisfying E(y) = 0, then by (10) and (19) we would have ∇E(y) = 0, i.e., y would also be an equilibrium point of (11), which contradicts the assumption that \(y^*\) is an isolated equilibrium point in \(\Upomega_{*}\). Moreover,

$$ \frac{\hbox{d}E(y(t))}{\hbox{d}t}=[\nabla E(y(t))]^T { {\hbox{d}y(t )}\over {\hbox{d}t}}= -k\|\nabla E(y(t))\|^2\;\leq \;0, $$
(20)

and

$$ \frac{\hbox{d}E(y(t))}{\hbox{d}t}<0,\;\; \forall y(t)\in \Upomega_*\;\; \hbox{and}\; \; y(t)\neq y^*. $$

By Lemma 2.2(b), this implies that \(y^*\) is asymptotically stable.□

Theorem 4.5

Suppose that \(y = y(t,y_0)\) is a trajectory of (11) and (12) with initial point \(y_0 = y(0,y_0)\), and that the level set \({L(y_0)=\{y\in \mathbb{R}^{n+m+l}:\; E(y)\leq E(y_0)\}}\) is bounded. Then

(a) \(\gamma^+(y_0)=\{y(t,y_0)\mid t\geq 0\}\) is bounded.

(b) There exists \(\bar{y}\) such that \(\lim\nolimits_{t\rightarrow\infty}y(t,y_0)=\bar{y}\).

Proof

(a) Suppose \(y^*\) is an equilibrium point of the network in (11) and (12). Calculating the derivative of E(y) along the trajectory \(y(t, y_0)\), \(t \geq 0\), one has

$$ \frac{\hbox{d}E(y(t))}{\hbox{d}t}=[\nabla E(y(t))]^T { \frac{\hbox{d}y}{\hbox{d}t}}= -k\|\nabla E(y(t))\|^2\leq0. $$
(21)

Thus E(y) is monotone nonincreasing along the trajectory \(y = y(t, y_0)\), \(t \geq 0\). Therefore \(\gamma^+(y_0)\subseteq L(y_0)\), that is, \(\gamma^+(y_0)=\{y(t,y_0)\mid t\geq 0\}\) is bounded.

(b) By (a), \(\gamma^+(y_0)=\{y(t,y_0)\mid t\geq 0\}\) is a bounded set of points. Take a strictly monotonically increasing sequence \(\{\bar{t}_n\}\), \(0\leq\bar{t}_1\leq\cdots\leq\bar{t}_n\leq\cdots\), \(\bar{t}_n\rightarrow+\infty\); then \(\{y(\bar{t}_n, y_0)\}\) is a bounded sequence composed of infinitely many points. Thus there exists a limit point \(\bar{y}\), that is, there exists a subsequence \(\{t_n\}\subseteq\{\bar{t}_n\}\), \(t_n\rightarrow+\infty\), such that

$$ \lim\limits_{n\rightarrow+\infty}y(t_n,y_0)=\bar{y}, $$

where \(\bar{y}\) satisfies

$$ \frac{\hbox{d}E(y(t))}{\hbox{d}t}=0, $$

which indicates that \(\bar{y}\) is an ω-limit point of \(\gamma^+(y_0)\). Using the LaSalle Invariant Set Theorem [12], one has that \(y(t,y_0)\rightarrow \bar{y}\in M\) as \(t\rightarrow\infty\), where M is the largest invariant set in \(K=\{y(t,y_0)\mid\frac{\hbox{d}E(y(t,y_0))}{\hbox{d}t}=0\}\). From (11), (12) and (21), it follows that \(\frac{\hbox{d}x}{\hbox{d}t}=0\), \(\frac{\hbox{d}u}{\hbox{d}t}=0\) and \(\frac{\hbox{d}v}{\hbox{d}t}=0\) if and only if \(\frac{\hbox{d}E(y(t))}{\hbox{d}t}=0\); thus \(\bar{y}\in D^*\), since \(M\subseteq K\subseteq D^*\). Therefore, from any initial state \(y_0\), the trajectory \(y(t, y_0)\) of (11) and (12) tends to \(\bar{y}\). The proof is complete.□

As an immediate corollary of Theorems 4.4 and 4.5, we can get the following result.

Corollary 4.6

If \(D^*=\{(x^{*T}, u^{*T}, \nu^{*T})^{T}\}\), then the neural network (11) and (12) for solving (1)–(3) is globally asymptotically stable at the unique equilibrium point \(y^*=(x^{*T}, u^{*T}, \nu^{*T})^{T}\), where \(D^*\) denotes the optimal point set of (1)–(3).

Computer Simulations

In order to demonstrate the effectiveness of the proposed neural network (11) and (12), in this section we test several examples. For each test problem, we compare the numerical performance of the proposed neural network for various values of k and various initial states y(0). We also provide application examples in engineering: support vector machine learning for regression, a constrained least-squares approximation problem and an image fusion problem. The simulations are conducted in Matlab 7; the ODE solver used is ode45s.

Example 5.1

$$ \hbox{minimize} \;\;\; f(x)=x_1^2+x_2^2+x_1x_2-30x_1-30x_2 $$
$$ \hbox{subject to} \left\{ \begin{array}{l} \frac{5}{12}x_1-x_2\leq\frac{35}{12},\\ \frac{5}{2}x_1+x_2\leq\frac{35}{2},\\ -x_1\leq5,\\ x_2\leq5. \end{array} \right. $$

The problem and its dual have a unique solution \((x^{*T},u^{*T})^{T}=(5,5,0,6,0,9)^{T}\). Theorems 4.4 and 4.5 and Corollary 4.6 guarantee that the model in (11) and (12) converges globally to \(x^{*} = (5, 5)^T\). Figure 2 depicts the phase diagram of the state variables \((x_1(t), x_2(t))^T\) for 5 different initial points and scaling factor k = 10, where S stands for the feasible region. All these trajectories converge globally to the exact solution \(x^{*}\).
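
The reported solution can also be checked directly against the KKT system (4). The short sketch below (added here for illustration) verifies stationarity, primal and dual feasibility, and complementary slackness for the reported pair \((x^{*T},u^{*T})^{T}=(5,5,0,6,0,9)^{T}\).

```python
import numpy as np

Q = np.array([[2.0, 1.0], [1.0, 2.0]]); d = np.array([-30.0, -30.0])
A = np.array([[5/12, -1.0], [5/2, 1.0], [-1.0, 0.0], [0.0, 1.0]])
b = np.array([35/12, 35/2, 5.0, 5.0])
x = np.array([5.0, 5.0]); u = np.array([0.0, 6.0, 0.0, 9.0])

print(Q @ x + d + A.T @ u)    # stationarity: the zero vector
print(A @ x - b)              # primal feasibility: every entry <= 0
print(u @ (A @ x - b))        # complementary slackness: 0
print(np.all(u >= 0))         # dual feasibility: True
```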

Fig. 2

Transient behavior of the neural network (11) and (12) with five different initial points and k = 10 in Example 5.1

We test the influence of the parameter k on the value of \(\|y(t)-y^*\|^2\). From Fig. 3, we see that with k = 0.1 the neural network (11) and (12) yields the slowest decrease of \(\|y(t)-y^*\|^2\), whereas with k = 10 it yields the fastest decrease, for the initial state \(y_0 = (2, -2, 2, -2, 2, -2)^T\). Figure 4 also shows the convergence behavior of \(\|y(t)-y^*\|^2\) for 10 different initial states and k = 5.

Fig. 3

Convergence behavior of \(\|y(t)-y^*\|^2\) with y 0 = (2,  −2, 2,  −2, 2,  −2)T in Example 5.1

Fig. 4

Convergence behavior of \(\|y(t)-y^*\|^2\) with 10 different initial points and k = 5 in Example 5.1

For comparison, the above problem is solved using Kennedy and Chua's neural network (18) with γ = 50, as in [22]. This network converges to the equilibrium point \(\tilde{x} =(4.97778, 5.1745)^T\), which can be viewed as an approximate solution of the above problem; clearly, this equilibrium point is not feasible. Thus, we can conclude that the proposed network (11) and (12) yields a feasible solution and has good stability performance.

Example 5.2

$$ \hbox{minimize}\; f(x)=x_1^2+x_2^2+x_3^2 $$
$$ \hbox{subject to}\left\{ \begin{array}{l} 2x_1+x_2-5\leq0,\\ x_1+x_3-2\leq 0,\\ -x_1+1\leq 0,\\ -x_2+2\leq0,\\ -x_3\leq0. \end{array} \right. $$

The optimal solution is \(y^{*} = (1, 2, 0, 0, 0, 2, 4, 0)^T\). Figure 5 shows the local convergence behavior of the error \(\|y(t)-y^*\|^2\) for different k and the initial point \(y_0 = (1, -1, 1, -1, 1, -1, 1, -1)^T\). It is clear that a larger k yields a faster convergence of the error \(\|y(t)-y^*\|^2\). Figure 6 shows that the trajectories of the proposed neural network (11) and (12), started from 6 random initial points with k = 5, converge to the optimal solution of this problem. Whether the initial point is taken inside or outside the feasible region, the proposed network always converges to the theoretical optimal solution \(x^{*} = (1, 2, 0)^T\).

Fig. 5

Convergence behavior of \(\|y(t)-y^*\|^2\) with y 0 = (1,  −1, 1,  −1, 1,  −1, 1,  −1)T in Example 5.2

Fig. 6

Transient behavior of x(t) with six different initial points and k = 5 in Example 5.2

Example 5.3

$$ \hbox{minimize}\; f(x)=x^2_1+x^2_2+x^2_3-2x_1-3x_4 $$
$$ \hbox{subject to}\left\{ \begin{array}{l} -x_1\leq0,\\ -x_2\leq0,\\ -x_3\leq0,\\ -x_4\leq0,\\ 2x_1+x_2+x_3+4x_4=7,\\ x_1+x_2+2x_3+x_4=6. \end{array} \right. $$

The optimal solution of this example is \(x^{*} = (1.1233, 0.6507, 1.8288, 0.5685)^T\), \(u^{*} = (0, 0, 0, 0)^T\) and \(v^* = (1.0548, -2.3562)^T\). Figures 7 and 8 display the transient behavior of the proposed network for 10 different initial points and k = 1. All trajectories of the network converge to the optimal solution \(y^{*}=(x^{*T}, u^{*T}, \nu^{*T})^{T}\). Moreover, when the initial point is chosen as an infeasible point, the trajectory of the network still converges to \(y^{*}\). Figure 9 shows the convergence behavior of \(\|y(t)-y^*\|^2\) for 10 different initial states and k = 1.

Fig. 7

Transient behavior x(t) of the proposed neural network with 10 different initial points and k = 1 in Example 5.3

Fig. 8

Transient behavior of \((u(t)^T, v(t)^T)^T\) of the proposed neural network with 10 different initial points and k = 1 in Example 5.3

Fig. 9

Convergence behavior of \(\|y(t)-y^*\|^2\) with 10 different initial points and k = 1 in Example 5.3

Example 5.4

Consider the regression problem of approximating a set of data

$$ \{(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)\}, $$
(22)

with a regression function as

$$ \phi(x)=\sum\limits_{i=1}^N \alpha_i\Upphi_i(x)+\beta, $$
(23)

where \(\Upphi_i(x)\;(i=1,\ldots,N)\) are called feature functions defined in a high-dimensional space and \(\alpha_i\;(i = 1, \ldots, N)\) and \(\beta\) are parameters of the model to be estimated. Here, we can use the recurrent neural network in (11) and (12) to estimate these parameters. By utilizing the Huber loss function [41], the regression function defined in (23) can be represented as

$$ \phi(x)=\sum\limits_{i=1}^N \xi_iK(x,x_i)+\beta, $$

where K(xy) is a kernel function satisfying \(K(x_i,x_j)=\Upphi(x_i)^T\Upphi(x_j)\). According to the problem formulation in [20], \(\xi_i \; (i=1,\ldots,N)\) is the optimal solution of the following quadratic programming problem:

$$ \hbox{minimize}\;\frac{1}{2}\sum\limits_{i=1}^N\sum\limits_{j=1}^N\xi_i\xi_jK(x_i,x_j)-\sum\limits_{i=1}^N\xi_iy_i+\frac{\zeta}{2\varrho} \sum\limits_{i=1}^N\xi_i^2 $$
(24)
$$ \hbox{subject to } \left\{ \begin{array}{ll} \sum\nolimits_{i=1}^N\xi_i=0,\\ -\varrho\leq \xi_i\leq \varrho,\;\;i=1,\ldots,N, \end{array} \right. $$
(25)

where \(\zeta>0\) is an accuracy parameter required for the approximation and ϱ > 0 is a prespecified parameter. It is also shown in [3] that \(\beta = v^*\), where \(v^*\) is the multiplier of the equality constraint at the equilibrium point of the proposed neural network model applied to the above quadratic optimization problem.
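
One way to feed this problem to the network (11) and (12) is to rewrite (24)–(25) in the standard form (1)–(3): the quadratic term is \(K+(\zeta/\varrho)I\), the linear term is \(-y\), the single equality row enforces \(\sum\nolimits_i\xi_i=0\), and the box constraints become pairs of one-sided inequalities. The sketch below illustrates this mapping; the helper names and the toy data are our own, and the regression data of Table 1 would be used in their place.

```python
import numpy as np

def rbf_kernel(X, delta=1.0):
    """Gaussian (RBF) kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 delta^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    return np.exp(-sq / (2.0 * delta**2))

def svr_qp_data(X, y, zeta, rho, delta=1.0):
    """Rewrite the SVR problem (24)-(25) in the standard form (1)-(3)."""
    N = len(y)
    Q = rbf_kernel(X, delta) + (zeta / rho) * np.eye(N)   # quadratic term of (24)
    d = -np.asarray(y, dtype=float)                       # linear term of (24)
    G = np.ones((1, N)); f = np.zeros(1)                  # equality: sum of xi_i = 0
    A = np.vstack([np.eye(N), -np.eye(N)])                # box: -rho <= xi_i <= rho
    b = rho * np.ones(2 * N)
    return Q, d, A, b, G, f

# hypothetical 1-D data; the regression data of Table 1 would be used in practice
X = np.linspace(0.0, 3.0, 9).reshape(-1, 1)
y = np.sin(X).ravel()
Q, d, A, b, G, f = svr_qp_data(X, y, zeta=0.1, rho=100.0, delta=1.0)
# Q, d, A, b, G, f can now be passed to the gradient network sketch given earlier.
```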

As an example, we consider the regression data in Table 1. We use the proposed neural network (11) and (12) with an RBF kernel to train a support vector machine for the regression problem. We choose the following Gaussian function [21]

$$ K(x,y)=\exp\left({-\frac{\|x-y\|^2}{2\delta^2}}\right). $$

Figure 10 depicts the regression results with ϱ = 100, δ = 1 and three different values of \(\zeta\). It is clear that as \(\zeta\) tends to zero, the approximation becomes more accurate. Figure 11 also shows the convergence of the error norm between the output vector and the optimal solution \(x^*\) for the initial point \(x_0 = (1, -1, 1, -1, 1, -1, 1, -1, 1)^T\). The regression results illustrate the good performance of the proposed neural network.

Fig. 10

Results of support vector regression using the proposed neural network (11) and (12) with an RBF kernel where ϱ = 100, δ = 1 and three different \(\zeta\) parameters in Example 5.4

Fig. 11

Convergence behavior of \(\|x(t)-x^*\|^2\) with x 0 = (1,  −1, 1,  −1, 1,  −1, 1,  −1, 1)T and k = 10 in Example 5.4

Example 5.5

Let us consider a constrained least-squares approximation problem [19]: find the parameters of the combination of exponential and polynomial functions \(y(x) = a_4 e^{x} + a_3 x^3 + a_2 x^2 + a_1 x + a_0\) which fits the data given in Table 2, subject to the constraints 8.1 ≤ y(1.3) ≤ 8.3, 3.4 ≤ y(2.8) ≤ 3.5 and 2.25 ≤ y(4.2) ≤ 2.26. This problem can be formulated as follows:

$$ \hbox{minimize} \; \|Cx-d\|^2 $$

subject to

$$ Ax\in \Upomega, $$

where \(x = (x_1, x_2, x_3, x_4, x_5)^T = (a_4, a_3, a_2, a_1, a_0)^T\),

$$ C=\left[ \begin{array}{cccccccccc} 1& 1.649& 2.718& 4.482& 7.389& 12.183& 20.086& 33.116& 54.598& 90.017\\ 0& 0.125& 1 & 3.375& 8& 15.625& 27& 42.875& 64 & 91.125\\ 0& 0.25& 1 & 2.25& 4& 6.25 & 9& 12.25 & 16 & 20.25\\ 0& 0.5& 1 & 1.5& 2& 2.5 & 3& 3.5 & 4 & 4.5\\ 1& 1& 1 & 1& 1& 1 & 1 & 1 &1 &1 \end{array} \right]^T, $$
$$ d=\left[ \begin{array}{cccccccccc} 7.6& 7.2& 7.9& 8& 6.2& 6.2& 3& 0.8& 1.2& 5.8\\ \end{array} \right]^T, $$
$$ A=\left[ \begin{array}{ccccc} 3.669& 2.197& 1.69& 1.3& 1\\ 16.445& 21.952& 7.84& 2.8& 1\\ 66.686& 74.088& 17.64& 4.2& 1\\ \end{array} \right], $$

and \({\Upomega=\{z\in \mathbb{R}^{3}: 8.1\leq z_1 \leq 8.3, 3.4\leq z_2\leq 3.5, 2.25 \leq z_3\leq 2.26\}}\). Figure 12 depicts the approximation results using the proposed neural network. The \(l_2\)-norm error between \(x(t)\) and \(x^*\) for the initial point \(x_0 = (2, -2, 2, -2, 2)^T\) is shown in Fig. 13.
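
To apply the network (11) and (12), the problem can be rewritten in the form (1)–(3) using \(\|Cx-d\|^2 = x^T(C^TC)x - 2d^TCx + \hbox{const}\) and splitting each two-sided bound into two inequalities. The sketch below shows this reformulation; it assumes the data abscissae are t = 0, 0.5, …, 4.5, which is consistent with the C and A matrices listed above, and the variable names are ours.

```python
import numpy as np

def basis(t):
    """Rows [e^t, t^3, t^2, t, 1] of the model y(t) = a4 e^t + a3 t^3 + a2 t^2 + a1 t + a0."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.exp(t), t**3, t**2, t, np.ones_like(t)])

# abscissae assumed to be t = 0, 0.5, ..., 4.5, consistent with the matrix C listed above
t_data = np.arange(0.0, 5.0, 0.5)
C = basis(t_data)
dvec = np.array([7.6, 7.2, 7.9, 8.0, 6.2, 6.2, 3.0, 0.8, 1.2, 5.8])
A = basis([1.3, 2.8, 4.2])
lo = np.array([8.1, 3.4, 2.25]); hi = np.array([8.3, 3.5, 2.26])

# standard form (1)-(3): 0.5 x'Qx + q'x,  A_ineq x <= b_ineq,  no equality constraints
Q = 2.0 * C.T @ C                    # positive definite, since C has full column rank
q = -2.0 * C.T @ dvec                # the constant term ||dvec||^2 is dropped
A_ineq = np.vstack([A, -A])          # lo <= Ax <= hi split into two one-sided inequalities
b_ineq = np.concatenate([hi, -lo])
G = np.zeros((0, 5)); f = np.zeros(0)
# these matrices can now be passed to the gradient network sketch given earlier
```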

Table 1 Regression data for Example 5.4
Table 2 Approximation data for Example 5.5
Fig. 12

Approximation results of the neural network (11) and (12) in Example 5.5, where the circles represent the approximation data and the curve is the approximation function

Fig. 13

Convergence behavior of \(\|x(t)-x^*\|^2\) with x 0 = (2,  −2, 2,  −2, 2)T and k = 7 in Example 5.5

Example 5.6

We apply the proposed artificial neural network (11) and (12) to increase the useful information content of images and improve the quality of noisy images. Consider an array of n sensors. Let \(I_l(k)\) denote the received M × N two-dimensional gray-level image from the l-th sensor, whose amplitude is denoted by \(f_l(i, j)\). We assume

$$ I_l ((i- 1)N+ j)= f_l (i, j),\quad i= 1,\ldots,M;\; j= 1,\ldots , N. $$

Then, the n- dimensional vector of information received from n sensors is given by

$$ I (k)= a s(k) + \hat{n}(k), $$

where \(a = [a_1, \ldots, a_n]^T\), \(I(k) = [I_1(k), \ldots, I_n(k)]^T\) and \(\hat{n}(k)=[\hat{n}_1(k),\ldots,\hat{n}_n(k)]^T\); \(a_l\) is a scaling coefficient, s(k) denotes the signal and \(\hat{n}_l(k)\) represents additive Gaussian noise with zero mean at the l-th sensor. Moreover, s(k) and \(\hat{n}_l(k)\) are mutually independent random processes. According to the results discussed in [23], the aim of the image fusion addressed here is to seek an optimal fusion such that the uncertainty of the fused information is minimized. The optimal fused information is given by \(s^*(k)=\sum\limits_{l=1}^n{x_{l}^{*}}I_l(k)\), where the optimal fusion vector \(x^*\) is chosen according to the following deterministic quadratic programming problem:

$$ \hbox{minimize} \; x^{T}Rx $$
(26)
$$ \hbox{subject to}\; \left\{ \begin{array}{l} a^Tx=1,\\ x\geq 0, \end{array} \right. $$
(27)

where \({a=[1 ,\ldots,1 ]^T\in \mathbb{R}^n}\) and \(R=\frac{1}{MN}\sum\limits_{k=1}^{MN}I(k)I(k)^T\). The artificial neural network (11) and (12) converges to the optimal fusion vector \( x^*\). In order to see the fused image we convert \( s^*\) to \( f^*\) as

$$ f^* (i, j)=s^* ((i- 1)N+ j), \quad i= 1,\ldots ,M;\; j= 1,\ldots, N. $$

Finally, we use the function "imshow(\(f^*\))" in Matlab. The quality of the fused images produced by the proposed neural image fusion algorithm can be improved by increasing the number of sensors.
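
The fusion problem (26)–(27) also fits the form (1)–(3), with quadratic term 2R, no linear term, the single equality \(a^Tx=1\) and \(x\geq 0\) written as \(-x\leq 0\). The Python sketch below illustrates this construction on synthetic stand-in images; the function names and the demo data are ours, not part of the experiment reported next.

```python
import numpy as np

def fusion_qp_data(images):
    """Rewrite the image-fusion problem (26)-(27) in the form (1)-(3).

    `images` is a list of n noisy M x N observations of the same gray-level scene."""
    n = len(images)
    I = np.stack([img.ravel().astype(float) for img in images])  # I[l, k] = I_l(k), shape n x MN
    R = (I @ I.T) / I.shape[1]               # R = (1/MN) sum_k I(k) I(k)^T
    Q = 2.0 * R                              # so that 0.5 x'Qx = x'Rx
    d = np.zeros(n)
    G = np.ones((1, n)); f = np.ones(1)      # a^T x = 1 with a = (1, ..., 1)^T
    A = -np.eye(n); b = np.zeros(n)          # x >= 0 written as -x <= 0
    return Q, d, A, b, G, f

def fuse(images, x):
    """Fused image s*(k) = sum_l x_l I_l(k), reshaped back to M x N."""
    M, N = images[0].shape
    I = np.stack([img.ravel().astype(float) for img in images])
    return (x @ I).reshape(M, N)

# tiny synthetic demo with random stand-in "sensor" images
rng = np.random.default_rng(0)
base = rng.random((8, 8))
images = [base + 0.1 * rng.standard_normal((8, 8)) for _ in range(5)]
Q, d, A, b, G, f = fusion_qp_data(images)
# run the gradient network sketch on (Q, d, A, b, G, f) to get x*, then display fuse(images, x*)
```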

As an example, we illustrate the performance of the proposed neural image fusion algorithm on gray-level Lena images, as shown in Fig. 14. These Lena images are eight-bit gray-level images with 206 × 245 pixels. Figure 14a displays a typical noisy measurement of the Lena image at one sensor, where the SNR is 10 dB. Figure 14b–d show the images fused by the proposed algorithm for n = 10, 20 and 30 sensors, respectively. It is easy to verify that the quality of the images improves as the number of sensors increases.

Fig. 14

Lena image fusion using the neural image fusion algorithm. a The noisy image. b–d The fused images with n = 10, 20 and 30 sensors in Example 5.6

To end this section, we answer a natural question: what are the advantages of the proposed neural network compared to existing ones? To answer this, we summarize our observations from the numerical experiments and theoretical results below.

  • Compared with traditional numerical optimization algorithms, the neural network approach has several potential advantages in real-time applications. First, the structure of a neural network can be implemented effectively using very large scale integration and optical technologies. Second, neural networks can solve many optimization problems with time-varying parameters. Third, the dynamical techniques and the numerical ODE techniques can be applied directly to the continuous-time neural network for solving constrained optimization problems effectively.

  • We compare our neural network model with some existing models that also handle quadratic optimization problems, for instance, those in (16), (17) and (18). At first glance, these neural network models appear to have lower complexity. However, in tests on several quadratic optimization problems, we observe that the difference in numerical performance is marginal.

  • Changing the initial point has little effect on our neural network model, whereas it does affect some existing models. The reason is that our model is globally convergent to the optimal solution of the problem.

  • Three examples concerning support vector machines for regression, a constrained least-squares approximation problem and an image fusion algorithm for gray-level images are presented to indicate the real-time applications of the proposed neural network.

Conclusion

In this paper, we have proposed a new neural network for solving SCQP problems. Based on the duality theory of convex programming, the FB function, the KKT optimality conditions, convex analysis, Lyapunov stability theory and the LaSalle invariance principle, the constructed network can find the optimal solutions of the primal and dual problems simultaneously. Compared with the available gradient-based networks, the structure of the proposed network is reliable and efficient. Further advantages of the proposed neural network are that it can be implemented without a penalty parameter and that it converges to an exact solution of the SCQP problem with general linear constraints. Moreover, the transformation method proposed in this article allows us to transform general constrained SCQP problems into unconstrained problems easily and efficiently. We have also analyzed the influence of the parameter k on the convergence rate of the trajectory and on the convergence behavior of \(\|y(t)-y^*\|^2\), and observed that a larger k leads to a faster convergence rate. The simulation results have demonstrated the global convergence behavior and characteristics of the proposed neural network for solving several SCQP problems.