
1 Introduction

There are many studies of recurrent neural networks in the fields of signal processing [1, 2], pattern classification [3, 4], robotics [5, 6], optimization [7], and so on. In particular, since the invention of the Hopfield network [8], which was designed specifically for solving optimization problems online, recurrent neural networks have become a popular research branch in online optimization: they offer powerful parallelism and online solving capability, and they have made great advances in both theory and application. A recurrent neural network [9] was developed for nonlinear programming problems, where a penalty term is introduced for the equality and inequality constraints; it converges only to an approximate optimal solution. A switched-capacitor neural network was proposed [10] for solving nonlinear convex programming problems; however, the model may be unstable when the optimal solution lies outside the feasible region. A neural network was proposed for solving linear and quadratic programming problems [11] and is proven to converge globally to the optimal solution; however, slack variables are introduced into the problem, which makes the dimension of the model large. A dual neural network was proposed to reduce the dimension: it is composed of a single layer of neurons, and its dimension equals the number of its neurons. This model and its modifications [13, 14] have been applied to the kinematic control of robots [12, 15]. A simplified dual neural network was proposed in [16]; it greatly reduces the complexity while its convergence property remains sound. The model was applied to the KWTA problem in real time [17] with just a single neuron; however, it only deals with quadratic programming problems with a simple square quadratic term in the cost function and box constraints. In this paper, a recurrent neural network for solving general quadratic programming problems is proposed. It uses fewer neurons, and the dimension of the model is greatly reduced while sound accuracy and efficiency are retained.

The remainder of this paper is organized as follows. In Sect. 2, a neural network model is presented for solving quadratic programming problems. In Sect. 3, the convergence of the neural network is analyzed and it is proven to be globally convergent to the optimal solution of the quadratic programming problem. In Sect. 4, a discrete-time model for solving the same problem and an alternative neural network model for solving the quadratic programming problem under irredundant equality constraints are studied. In Sect. 5, numerical examples are given to demonstrate the effectiveness of our method. Section 6 concludes the paper.

In this paper, \( {\mathbb{R}} \) denotes the real number field, \( A^{T} \) denotes the transpose of a matrix \( A \), and \( I \) denotes the identity matrix.

2 Mathematical Model

The following general quadratic programming problem is studied:

$$ \begin{aligned} \hbox{min} (\frac{1}{2}x^{T} Wx + c^{T} x) \hfill \\ {\text{s}} . {\text{t}} .\,\,\,\,\,\,\,\begin{array}{*{20}c} {Ax = b} \\ {Ex \le e} \\ \end{array} \hfill \\ \end{aligned} $$
(1)

where \( W \in {\mathbb{R}}^{n \times n} \) is a positive definite matrix, \( x \in {\mathbb{R}}^{n} \) is the decision variable, \( c \in {\mathbb{R}}^{n} \), \( A \in {\mathbb{R}}^{m \times n} \), \( b \in {\mathbb{R}}^{m} \), \( e \in {\mathbb{R}}^{q} \) and \( E \in {\mathbb{R}}^{q \times n} \). The equality constraint \( Ax = b \) is equivalently transformed into two inequalities, \( -Ax \le - b \) and \( Ax \le b \), so problem (1) becomes a quadratic programming problem subject only to inequality constraints:

$$ \begin{aligned} \hbox{min} (\frac{1}{2}x^{T} Wx + c^{T} x) \hfill \\ {\text{s}} . {\text{t}} .\,\,\,\,\,Bx \le d \hfill \\ \end{aligned} $$
(2)
$$ B = \left[ \begin{aligned} A \hfill \\ - A \hfill \\ E \hfill \\ \end{aligned} \right],\;d = \left[ \begin{aligned} b \hfill \\ - b \hfill \\ e \hfill \\ \end{aligned} \right]\text{ } $$
(3)

where \( B \in {\mathbb{R}}^{p \times n} \), \( d \in {\mathbb{R}}^{p} \) and \( p = 2m + q \). The inequality constraint of (2) can be written as \( \hbox{max} (Bx - d) \le 0 \), where \( \hbox{max} (x) \) denotes the largest element of the vector \( x \); problem (1) is therefore equivalent to problem (4) below.
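As an aside, the stacking in (3) can be sketched in a few lines of Python (a minimal sketch assuming NumPy arrays; the function name is illustrative, not part of the proposed method):

```python
import numpy as np

def stack_constraints(A, b, E, e):
    """Rewrite Ax = b, Ex <= e as a single inequality system Bx <= d as in (3).

    The equality Ax = b is split into Ax <= b and -Ax <= -b,
    so B has p = 2*m + q rows.
    """
    B = np.vstack((A, -A, E))
    d = np.concatenate((b, -b, e))
    return B, d
```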

$$ \begin{aligned} \hbox{min} (\frac{1}{2}x^{T} Wx + c^{T} x) \hfill \\ {\text{s}} . {\text{t}} .\,\,\,\,B_{\sigma }^{T} x - d_{\sigma } \le 0 \hfill \\ \end{aligned} $$
(4)

In (4), \( \sigma \) denotes the index of the largest element of \( Bx - d \), \( B_{\sigma }^{T} \) denotes the \( \sigma \)th row of \( B \), and \( d_{\sigma } \) denotes the \( \sigma \)th element of \( d \). According to the KKT conditions, the solution of problem (4) satisfies:

$$ Wx + c - \mu B_{\sigma } = 0 $$
$$ \left\{ \begin{aligned} B_{\sigma }^{T} x - d_{\sigma } = 0\quad {\text{if}}\;\mu < 0 \hfill \\ B_{\sigma }^{T} x - d_{\sigma } \le 0\quad {\text{if}}\;\mu = 0 \hfill \\ \end{aligned} \right. $$
(5)

Here \( \mu \in {\mathbb{R}} \) denotes the dual variable of the inequality constraint in (4). Equation (5) can be written compactly with an upper saturation function as follows:

$$ \begin{array}{*{20}l} {Wx + c - \mu B_{\sigma } = 0} \hfill \\ {B_{\sigma }^{T} x - d_{\sigma } = g(B_{\sigma }^{T} x - d_{\sigma } - \mu )} \hfill \\ \end{array} $$
(6)

where the upper saturation function \( g(\cdot) \) is defined as:

$$ g(x) = \left\{ {\begin{array}{*{20}c} 0 & {x > 0} \\ x & {x \le 0} \\ \end{array} } \right. $$
(7)

Since \( W \) is positive definite, \( x \) can be solved explicitly in terms of \( \mu \) from the first equality in (6):

$$ x = \mu W^{ - 1} B_{\sigma } - W^{ - 1} c $$
(8)

A single dynamic neuron is used to solve for \( \mu \) in (6):

$$ \varepsilon \dot{\mu } = g(B_{\sigma }^{T} x - d_{\sigma } - \mu ) - B_{\sigma }^{T} x + d_{\sigma } $$
(9)

where \( \varepsilon > 0 \) is a scaling parameter. Substituting (8) into (9) yields the neural network dynamics with the following state equation and output equation.

State equation:

$$ \varepsilon \dot{\mu } = g( - \mu + B_{\sigma }^{T} W^{ - 1} B_{\sigma } \mu - B_{\sigma }^{T} W^{ - 1} c - d_{\sigma } ) - B_{\sigma }^{T} W^{ - 1} B_{\sigma } \mu + B_{\sigma }^{T} W^{ - 1} c + d_{\sigma } $$
(10)

Output equation:

$$ x = \mu W^{ - 1} B_{\sigma } - W^{ - 1} c $$
(11)

where \( \sigma \) is the index of the largest element of \( Bx - d \), and \( B \) and \( d \) are given in (3).
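To make the switched dynamics concrete, the following is a minimal forward-Euler sketch of (10) and (11) in Python, assuming NumPy; the function name, step size and iteration count are illustrative choices, not prescribed by the model.

```python
import numpy as np

def g(u):
    """Upper saturation function (7): g(u) = 0 if u > 0, u otherwise."""
    return min(u, 0.0)

def simulate_network(W, c, B, d, mu0=0.0, eps=1e-8, dt=1e-11, steps=5000):
    """Forward-Euler sketch of the single-neuron network (10)-(11)."""
    Winv = np.linalg.inv(W)
    mu = float(mu0)
    sigma = 0                                    # arbitrary initial switch
    for _ in range(steps):
        x = mu * Winv @ B[sigma] - Winv @ c      # output equation (11)
        sigma = int(np.argmax(B @ x - d))        # WTA switching signal
        Bs, ds = B[sigma], d[sigma]
        q = Bs @ Winv @ Bs                       # B_sigma^T W^{-1} B_sigma
        v = -mu + q * mu - Bs @ Winv @ c - ds
        mu += dt * (g(v) - q * mu + Bs @ Winv @ c + ds) / eps   # state eq (10)
    return mu, mu * Winv @ B[sigma] - Winv @ c
```

The explicit Euler step is used only for illustration; any standard ODE solver can integrate (10), as long as the switching index is re-evaluated at each step.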

Remark 2.1:

Only one dynamic neuron is required in the neural network (10), independent of the dimensions of problem (1). In contrast, the recurrent neural networks in [13,14,15,16, 18] require at least \( q \) dynamic neurons to solve a general quadratic programming problem, where \( q \) is the number of inequality constraints of problem (1). The proposed model uses just a single dynamic neuron, which greatly reduces the number of neurons and the computational complexity.

Remark 2.2:

The neural network dynamics modeled by (10) form a switched dynamic system: it switches within a family of dynamic systems under the endogenous switching signal \( \sigma \), as described by (12) (the signal flow in the neural network is plotted in Fig. 1).

Fig. 1. Signal flow of the simplified neural network.

$$ \begin{array}{*{20}l} { \varepsilon \dot{\mu } = f_{\sigma } (\mu ),\;\sigma \in S = \left\{ {1,2, \ldots ,p} \right\}} \hfill \\ {\sigma :\mu \to S} \hfill \\ \end{array} $$
(12)

where \( f_{\sigma } (\mu ) = g(B_{\sigma }^{T} x - d_{\sigma } - \mu ) - B_{\sigma }^{T} x + d_{\sigma } \).

Remark 2.3:

The proposed model is amenable to hardware implementation. Compared with [16], the only new part of the hardware implementation is the switching of the proposed network (10). The switching signal \( \sigma \) equals the index of the largest element of \( Bx - d \), i.e., the winner of \( Bx - d \); it can be obtained with a winner-take-all (WTA) circuit [19] and drives the selection of \( B_{\sigma } \) and \( d_{\sigma } \) in (10). WTA circuits have been studied extensively and are easily implemented in CMOS [20]; a WTA can also be implemented by a recurrent neural network with just a single dynamic neuron [19]. In total, the resulting circuit contains only two dynamic neurons (the state neuron of (10) plus the WTA neuron), which is much fewer than in the solutions of [13,14,15,16,17,18].

3 Convergence Analysis

A feasible way to prove the global convergence of the proposed system is to construct a common Lyapunov function; however, choosing such a common Lyapunov function is difficult. Results from contraction theory [21, 22] greatly simplify the proof by working with the virtual dynamics of the system. In this paper, contraction analysis is used to prove that the proposed model (10) is convergent. The proof is based on one definition and two lemmas:

Definition 3.1

([22]): Given the system equations \( \dot{x} = f(x,t) \), a region of the state space is called a contraction region with respect to a uniformly positive definite metric \( M(x,t) = \Uptheta^{T} \Uptheta \) if \( \frac{{\partial f^{T} }}{\partial x}M + M\frac{\partial f}{\partial x} + \dot{M} \le - \beta M \) (with \( \beta > 0) \) in that region.

Lemma 3.1

[22]: Given the system equations

\( \dot{x} = f(x, t) \), any trajectory which starts in a ball of constant radius with respect to the metric \( M(x, t) \), centered at a given trajectory and contained at all times in a contraction region with respect to \( M(x, t) \), remains in that ball and converges exponentially to this trajectory. Furthermore, global exponential convergence to the given trajectory is guaranteed if the whole state space is a contraction region with respect to the metric \( M(x, t) \).

Remark 3.1:

As pointed out in [21], Lemma 3.1 also holds for switched systems provided that all subsystems in the switched family are contracting with respect to the same metric \( M(x,t) \).

Lemma 3.2:

The solution of equation set (6) (denoted by \( \mu^{*} \), \( x^{*} \)) is the optimal solution of problem (1). At the same time, \( \mu^{*} \) is an equilibrium point of the state equation of the proposed model, i.e.,

$$ g( - \mu + B_{\sigma }^{T} W^{ - 1} B_{\sigma } \mu - B_{\sigma }^{T} W^{ - 1} c - d_{\sigma } ) - B_{\sigma }^{T} W^{ - 1} B_{\sigma } \mu + B_{\sigma }^{T} W^{ - 1} c + d_{\sigma } = 0 $$

The optimal solution also satisfies (11): \( x^{*} = \mu^{*} W^{ - 1} B_{\sigma } - W^{ - 1} c. \)

Proof:

According to the KKT conditions, the solution of equation set (5) is the optimal solution of problem (4). Since equation set (6) is equivalent to equation set (5), the solution of equation set (6) is the optimal solution of problem (1). It satisfies:

$$ g( - \mu^{*} + B_{\sigma }^{T} W^{ - 1} B_{\sigma } \mu^{*} - B_{\sigma }^{T} W^{ - 1} c - d_{\sigma } ) - B_{\sigma }^{T} W^{ - 1} B_{\sigma } \mu^{*} + B_{\sigma }^{T} W^{ - 1} c + d_{\sigma } = 0 $$

where \( x^{*} = \mu^{*} W^{ - 1} B_{\sigma } - W^{ - 1} c \). Comparing these equations with (10) and (11) shows that \( \mu^{*} \) is an equilibrium point of (10), which completes the proof.

We are now ready to state the convergence result of the proposed model.

Theorem 3.1:

The proposed neural network (10) converges to its equilibrium point from any initial point \( \mu \in {\mathbb{R}} \), and the corresponding output (11) is the optimal solution of problem (1).

Proof:

Contraction theory is applied to analyze the convergence of (10). Rewrite (10) as

$$ \dot{\mu } = f(\mu ,t) = \frac{1}{\varepsilon }(g( - \mu + B_{\sigma }^{T} W^{ - 1} B_{\sigma } \mu - B_{\sigma }^{T} W^{ - 1} c - d_{\sigma } ) - B_{\sigma }^{T} W^{ - 1} B_{\sigma } \mu + B_{\sigma }^{T} W^{ - 1} c + d_{\sigma } ) $$
(13)

The partial derivative \( \frac{\partial f}{\partial \mu } \) is:

$$ \frac{\partial f}{\partial \mu } = \left\{ {\begin{array}{*{20}l} { - \frac{{B_{\sigma }^{T} W^{ - 1} B_{\sigma } }}{\varepsilon }} \hfill & {{\text{if}}\;\upsilon \ge 0} \hfill \\ { - \frac{1}{\varepsilon }} \hfill & {{\text{if}}\;\upsilon < 0} \hfill \\ \end{array} } \right. $$
(14)

where \( \upsilon = - \mu + B_{\sigma }^{T} W^{ - 1} B_{\sigma } \mu - B_{\sigma }^{T} W^{ - 1} c - d_{\sigma} \). Note that the derivative \( g^{\prime}(x) = 0 \) if \( x \ge 0 \) and \( g^{\prime}(x) = 1 \) if \( x < 0 \) (more precisely, the upper right-hand derivative, since \( g(x) \) is non-smooth at \( x = 0 \)) is used in the derivation of (14). Choosing \( M = 1 \), we have

$$ \frac{{\partial f^{T} }}{\partial \mu }M + M\frac{\partial f}{\partial \mu } + \dot{M} = 2\frac{\partial f}{\partial \mu } \le - \frac{{2\hbox{min} (1,B_{\sigma }^{T} W^{ - 1} B_{\sigma } )}}{\varepsilon }\text{ } $$
(15)

Since \( W^{ - 1} \) is positive definite and \( B_{\sigma } \ne 0 \) (otherwise the corresponding inequality in (2) would not involve \( x \)), we conclude that \( B_{\sigma }^{T} W^{ - 1} B_{\sigma } > 0 \) and \( \frac{{\hbox{min} (1,B_{\sigma }^{T} W^{ - 1} B_{\sigma } )}}{\varepsilon } > 0. \)

Defining \( \beta = \frac{{\hbox{min} \{ 1,\min_{i} B_{i}^{T} W^{ - 1} B_{i} \} }}{\varepsilon } \), with \( B_{i}^{T} \) denoting the \( i{\text{th}} \) row of \( B \) and \( {\mathbf{S}} = \{ 1,2, \ldots ,p\} \), we have the following inequality for all \( \mu \in {\mathbb{R}} \) and all \( \sigma \in {\mathbf{S}} \) under the same metric \( M \),

$$ \frac{{\partial f^{T} }}{\partial \mu }M + M\frac{\partial f}{\partial \mu } + \dot{M} \le - \beta M $$
(16)

Hence the whole state space of \( \mu \) is a contraction region. By Lemma 3.2, \( \mu^{*} \) (with output \( x^{*} \)) is an equilibrium point of (10) and its output is the optimal solution of problem (1). Taking this equilibrium as the given trajectory, Lemma 3.1 and Remark 3.1 guarantee that (10) converges exponentially to \( \mu^{*} \) from any initial point, which completes the proof.

4 Extensions

4.1 Discrete-Time Model

In this part, we propose a discrete-time model to solve problem (1) and give conditions for its global convergence.

Replacing \( \mu \) and \( \dot{\mu } \) with \( \mu_{n} \) and \( \frac{{\mu_{n + 1} - \mu_{n} }}{\Delta t} \) in (10), respectively, and denoting \( \gamma = \frac{\Delta t}{\varepsilon } \), we obtain, after some trivial manipulations, the following discrete model. State equation:

$$ \mu_{n + 1} = \gamma g( - \mu_{n} + B_{\sigma }^{T} W^{ - 1} B_{\sigma } \mu_{n} - B_{\sigma }^{T} W^{ - 1} c - d_{\sigma } ) + (1 - \gamma B_{\sigma }^{T} W^{ - 1} B_{\sigma } )\mu_{n} + \gamma B_{\sigma }^{T} W^{ - 1} c + \gamma d_{\sigma } $$
(17)

Output equation:

$$ x_{n} = \mu_{n} W^{ - 1} B_{\sigma } - W^{ - 1} c $$
(18)

where \( \sigma \) is the index of the largest element of \( Bx_{n} - d \), and \( B \) and \( d \) are given in (3).
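A minimal Python sketch of the iteration (17)-(18) is given below, assuming NumPy and the saturation function \( g(u) = \min (u,0) \); the function name and default arguments are illustrative, not part of the proposed method.

```python
import numpy as np

def discrete_iteration(W, c, B, d, gamma=0.05, mu0=0.0, iters=2000):
    """Sketch of the discrete-time model (17)-(18).

    gamma = dt / eps should satisfy condition (19) below for convergence.
    """
    g = lambda u: min(u, 0.0)                    # upper saturation function (7)
    Winv = np.linalg.inv(W)
    mu = float(mu0)
    sigma = 0
    for _ in range(iters):
        x = mu * Winv @ B[sigma] - Winv @ c      # output equation (18)
        sigma = int(np.argmax(B @ x - d))        # switching signal
        Bs, ds = B[sigma], d[sigma]
        q = Bs @ Winv @ Bs
        v = -mu + q * mu - Bs @ Winv @ c - ds
        mu = gamma * g(v) + (1 - gamma * q) * mu + gamma * (Bs @ Winv @ c + ds)  # (17)
    return mu, mu * Winv @ B[sigma] - Winv @ c
```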

In the discrete-time case, contraction theory is an extension of the well-known contraction mapping theorem. We still use contraction theory to analyze the convergence. For discrete-time systems, the definition of a contraction region and the condition for contraction are stated below.

Definition 4.1

([22]): Given the discrete-time system \( x_{n + 1} = f_{n} (x_{n} ,n) \), a region of the state space is called a contraction region with respect to a uniformly positive definite metric \( M_{n} (x_{n} ,n) = \Uptheta_{n}^{T} \Uptheta_{n} \) if there exists \( \beta > 0 \) such that \( F_{n}^{T} F_{n} - I \le - \beta I < 0 \) in that region, where \( F_{n} = \Uptheta_{n + 1} \frac{{\partial f_{n} }}{{\partial x_{n} }}\Uptheta_{n}^{ - 1} \).

Lemma 4.1

([22]): If the whole state space is a contraction region, global convergence to the given trajectory is guaranteed.

Theorem 4.1:

The discrete model (17) converges to its equilibrium point from any initialization \( \mu \in {\mathbb{R}} \), and the output at the equilibrium point, given by (18), is the optimal solution of problem (1), provided that:

$$ 0 < \gamma < 2,\quad \gamma < \frac{2}{{B_{i}^{T} W^{ - 1} B_{i} }}\;{\text{for all}}\;i \in {\mathbf{S}} $$
(19)

where \( B_{i}^{T} \) denotes the \( i{\text{th}} \) row of \( B \) and \( {\mathbf{S}} = \{ 1,2, \ldots ,p\} \).

Proof:

The proof proceeds in two steps: first, solve for the equilibrium point of the discrete model (17) and obtain its output via (18); second, prove global contraction to the equilibrium point.

Step 1: Comparing the discrete-time model (17) and its output (18) with the continuous-time model (10) and its output (11), we find that they have the same equilibrium and the same output at this equilibrium. According to Lemma 3.2, the solution of equation set (6) is also an equilibrium of the discrete model (17), and the output at this point is the optimal solution of problem (1).

Step 2: For the discrete model (17),

$$ \begin{array}{*{20}l} {\mu_{n + 1} = f_{n} (\mu_{n} ,n)} \hfill \\ {\quad \quad = \gamma g( - \mu_{n} + B_{\sigma }^{T} W^{ - 1} B_{\sigma } \mu_{n} - B_{\sigma }^{T} W^{ - 1} c - d_{\sigma } ) + (1 - \gamma B_{\sigma }^{T} W^{ - 1} B_{\sigma } )\mu_{n} + \gamma B_{\sigma }^{T} W^{ - 1} c + \gamma d_{\sigma } } \hfill \\ \end{array} $$
(20)

Calculate \( \frac{{\partial f_{n} }}{{\partial \mu_{n} }} \) as follows:

$$ \frac{{\partial f_{n} }}{{\partial \mu_{n} }} = \left\{ {\begin{array}{*{20}l} {1 - \gamma B_{\sigma }^{T} W^{ - 1} B_{\sigma } } \hfill & {{\text{if}}\;v_{n} \ge 0} \hfill \\ {1 - \gamma } \hfill & {{\text{if}}\;v_{n} < 0} \hfill \\ \end{array} } \right. $$
(21)

where \( v_{n} = - \mu_{n} + B_{\sigma }^{T} W^{ - 1} B_{\sigma } \mu_{n} - B_{\sigma }^{T} W^{ - 1} c - d_{\sigma } \). Choosing \( \Uptheta_{n} = 1 \) for all \( n \), we have

$$ F_{n} = \Uptheta_{n + 1} \frac{{\partial f_{n} }}{{\partial \mu_{n} }}\Uptheta_{n}^{ - 1} = \frac{{\partial f_{n} }}{{\partial \mu_{n} }} $$
(22)

Based on (21) and (22),

$$ \begin{aligned} F_{n}^{T} F_{n} & \le (\hbox{max} \{ |1 - \gamma |,|1 - \gamma B_{\sigma }^{T} W^{ - 1} B_{\sigma } |\} )^{2} \\ & \le (\hbox{max} \{ |1 - \gamma |,\hbox{max}_{{i \in {\mathbf{S}}}} |1 - \gamma B_{i}^{T} W^{ - 1} B_{i} |\} )^{2} \end{aligned} $$
(23)

Set \( \beta = 1 - (\hbox{max} \{ |1 - \gamma |,\hbox{max}_{{i \in {\mathbf{S}}}} |1 - \gamma B_{i}^{T} W^{ - 1} B_{i} |\} )^{2} \). For all \( \mu_{n} \in {\mathbb{R}} \) and \( \sigma \in {\mathbf{S}} \), the following inequality (24) holds:

$$ F_{n}^{T} F_{n} - I \le - \beta I $$
(24)

Since \( 0 < \gamma < 2 \) and \( 0 < \gamma B_{i}^{T} W^{ - 1} B_{i} < 2 \) for all \( i \in {\mathbf{S}} \) by (19), we have \( |1 - \gamma | < 1 \) and \( |1 - \gamma B_{i}^{T} W^{ - 1} B_{i} | < 1 \) for all \( i \in {\mathbf{S}} \). Therefore, \( (\hbox{max} \{ |1 - \gamma |,\hbox{max}_{{i \in {\mathbf{S}}}} |1 - \gamma B_{i}^{T} W^{ - 1} B_{i} |\} )^{2} < 1 \) and \( \beta > 0 \).

Hence the whole state space is a contraction region. Taking the equilibrium point of (17) as the given trajectory, Lemma 4.1 implies that the discrete model (17) converges to this equilibrium point.
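Condition (19) is easy to check numerically. A small helper is sketched below (assuming NumPy; the function name is illustrative): it returns the supremum of admissible step sizes, and any \( \gamma \) strictly below this value satisfies (19).

```python
import numpy as np

def gamma_upper_bound(W, B):
    """Supremum of step sizes allowed by condition (19):
    gamma must satisfy gamma < 2 and gamma < 2 / (B_i^T W^{-1} B_i) for all i."""
    Winv = np.linalg.inv(W)
    q_max = max(float(row @ Winv @ row) for row in B)
    return min(2.0, 2.0 / q_max)
```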

4.2 Irredundant Equality Constraint

When the equality constraints are irredundant, i.e., the matrix \( A \in {\mathbb{R}}^{m \times n} \) in (1) satisfies \( {\text{rank}}(A) = m \) with \( m < n \), problem (1) can be tackled in another way.

Replacing \( Ex \le e \) in problem (1) with \( \hbox{max}(Ex - e) \le 0 \) gives

$$ \begin{aligned} & \hbox{min} (\frac{1}{2}x^{T} Wx + c^{T} x) \hfill \\ & {\text{s}} . {\text{t}} .\;Ax = b \hfill \\ & \quad \;\;\,E_{\sigma }^{T} x - e_{\sigma } \le 0 \hfill \\ \end{aligned} $$
(25)

Based on the KKT conditions and the upper saturation function \( g(\cdot) \), the solution of problem (25) satisfies

$$ \begin{array}{*{20}c} {Wx + c - A^{T} y - \mu E_{\sigma } = 0} \\ {Ax = b} \\ {E_{\sigma }^{T} x - e_{\sigma } = g(E_{\sigma }^{T} x - e_{\sigma } - \mu )} \\ \end{array} $$
(26)

where \( y \in {\mathbb{R}}^{m} \) and \( \mu \in {\mathbb{R}} \). From the above three equations, \( x \) and \( y \) can be solved explicitly in terms of \( \mu \):

$$ \begin{array}{*{20}l} {x = \mu PE_{\sigma } + s} \hfill \\ {y = (AW^{ - 1} A^{T} )^{ - 1} ( - AW^{ - 1} E_{\sigma } \mu + AW^{ - 1} c + b)} \hfill \\ \end{array} $$
(27)

where \( P = - W^{ - 1} A^{T} (AW^{ - 1} A^{T} )^{ - 1} AW^{ - 1} + W^{ - 1} \in {\mathbb{R}}^{n \times n} \) and \( s = W^{ - 1} (A^{T} (AW^{ - 1} A^{T} )^{ - 1} (AW^{ - 1} c + b) - c) \). The third equation in (26) is solved with a dynamic neuron:

$$ \varepsilon \dot{\mu } = g(E_{\sigma }^{T} x - e_{\sigma } - \mu ) \;{ - }\;E_{\sigma }^{T} x + e_{\sigma } $$
(28)

where \( \varepsilon > 0 \) is a scaling parameter. Substituting (27) into (28) gives the state equation of the neural network:

$$ \varepsilon \dot{\mu } = g(E_{\sigma }^{T} PE_{\sigma } \mu - \mu + E_{\sigma }^{T} s - e_{\sigma } ) \;{ - }\;E_{\sigma }^{T} PE_{\sigma } \mu - E_{\sigma }^{T} s + e_{\sigma } $$
(29)
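A minimal Python sketch of this alternative network, computing \( P \) and \( s \) from (27) and integrating (29) with forward Euler, is given below (assuming NumPy and \( {\text{rank}}(A) = m \); the function name and step sizes are illustrative choices).

```python
import numpy as np

def simulate_equality_network(W, c, A, b, E, e, mu0=0.0,
                              eps=1e-8, dt=1e-11, steps=5000):
    """Forward-Euler sketch of the network (29) with output (27)."""
    g = lambda u: min(u, 0.0)                            # saturation function (7)
    Winv = np.linalg.inv(W)
    M = np.linalg.inv(A @ Winv @ A.T)                    # (A W^{-1} A^T)^{-1}
    P = Winv - Winv @ A.T @ M @ A @ Winv                 # matrix P in (27)
    s = Winv @ (A.T @ M @ (A @ Winv @ c + b) - c)        # vector s in (27)
    mu = float(mu0)
    sigma = 0
    for _ in range(steps):
        x = mu * P @ E[sigma] + s                        # output equation (27)
        sigma = int(np.argmax(E @ x - e))                # switching signal
        Es, es = E[sigma], e[sigma]
        q = Es @ P @ Es
        mu += dt * (g(q * mu - mu + Es @ s - es)
                    - q * mu - Es @ s + es) / eps        # state equation (29)
    x = mu * P @ E[sigma] + s
    y = M @ (-A @ Winv @ E[sigma] * mu + A @ Winv @ c + b)   # dual variable y in (27)
    return mu, x, y
```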

Before stating the convergence result about the neural network (29), the following lemma is presented, which is used in the convergence proof.

Lemma 4.2:

The symmetric matrix \( P = - W^{ - 1} A^{T} (AW^{ - 1} A^{T} )^{ - 1} AW^{ - 1} + W^{ - 1} \in {\mathbb{R}}^{n \times n} \) is positive semi-definite, i.e., \( z^{T} Pz \ge 0 \) for all \( z \in {\mathbb{R}}^{n} \); moreover, \( z^{T} Pz > 0 \) whenever \( z^{T} (W^{ - 1} A^{T} (AW^{ - 1} A^{T} )^{ - 1} A - I) \ne 0 \).

Proof:

Since \( W^{ - 1} \in {\mathbb{R}}^{n \times n} \) is positive definite, it can be factorized as \( W^{ - 1} = QQ^{T} \) with \( Q \) positive definite. Define \( G = AQ \in {\mathbb{R}}^{m \times n} \); then \( G \) also has full row rank (since \( A \) has full row rank and \( Q \) is positive definite), and therefore \( G \) can be decomposed as \( G = U[\Lambda \;\,0]V \) via singular value decomposition, where \( U \in {\mathbb{R}}^{m \times m} \) and \( V \in {\mathbb{R}}^{n \times n} \) are unitary matrices, \( [\Lambda \;\,0] \in {\mathbb{R}}^{m \times n} \), and \( \Lambda \in {\mathbb{R}}^{m \times m} \) is a diagonal matrix with all positive diagonal elements. Substituting

$$ W^{ - 1} = QQ^{T} ,\quad G = AQ,\quad G = U[\Lambda \;\,0]V\quad {\text{into}}\quad P = - W^{ - 1} A^{T} (AW^{ - 1} A^{T} )^{ - 1} AW^{ - 1} + W^{ - 1} , $$

we obtain (30) below, which shows that \( P \) is positive semi-definite. The strict inequality \( z^{T} Pz > 0 \) for \( z^{T} (W^{ - 1} A^{T} (AW^{ - 1} A^{T} )^{ - 1} A - I) \ne 0 \) then follows, since \( PW = - (W^{ - 1} A^{T} (AW^{ - 1} A^{T} )^{ - 1} A - I) \) and, for a symmetric positive semi-definite \( P \), \( z^{T} Pz = 0 \) only if \( Pz = 0 \). The stability of the neural network (29) and its global convergence to the optimal solution of (1) are guaranteed by the following theorem.

Theorem 4.2:

The neural network (29) exponentially converges to its equilibrium point from any initial point \( \mu \in {\mathbb{R}} \), and its output at this equilibrium, given by (27), is the optimal solution of problem (1), provided that \( {\text{rank}}(A) = m \) in (1) and \( E_{i}^{T} (W^{ - 1} A^{T} (AW^{ - 1} A^{T} )^{ - 1} A - I) \ne 0 \) for \( i = 1,2, \ldots ,q \), where \( E_{i}^{T} \) denotes the \( i{\text{th}} \) row of the matrix \( E \).

Proof:

This theorem can be proven in two steps, similar to the proof of Theorem 4.1. The first step is to solve for the equilibrium and show that the output at this equilibrium is the optimal solution of problem (1). The second step is to prove global contraction to the equilibrium. Note that the condition \( E_{i}^{T} (W^{ - 1} A^{T} (AW^{ - 1} A^{T} )^{ - 1} A - I) \ne 0 \) for \( i = 1,2, \ldots ,q \), together with Lemma 4.2, guarantees \( E_{\sigma }^{T} PE_{\sigma } > 0 \) for all possible switching signals \( \sigma \). Based on this result, global contraction of the neural network (29) can be proven. The detailed proof of the two steps is omitted.

$$ \begin{aligned} P & = - QQ^{T} A^{T} (AQQ^{T} A^{T} )^{ - 1} AQQ^{T} + QQ^{T} \\ & = - QG^{T} (GG^{T} )^{ - 1} GQ^{T} + QQ^{T} \\ & = - QV^{T} \left[ {\begin{array}{*{20}c} \varLambda \\ 0 \\ \end{array} } \right]U^{T} (U\varLambda^{ - 2} U^{T} )U\left[ {\begin{array}{*{20}c} \varLambda & 0 \\ \end{array} } \right]VQ^{T} + QQ^{T} \\ & = - QV^{T} \left[ {\begin{array}{*{20}c} {I_{m \times m} } & 0 \\ 0 & 0 \\ \end{array} } \right]VQ + QV^{T} VQ^{T} \\ & = (VQ^{T} )^{T} \left[ {\begin{array}{*{20}c} 0 & 0 \\ 0 & {I_{(n - m) \times (n - m)} } \\ \end{array} } \right]VQ^{T} \ge 0 \\ \end{aligned} $$
(30)

5 Simulations

5.1 Continuous-Time Neural Network Model

To show the convergence of the continuous-time neural network modeled by (10) and (11), the following problem is solved:

$$ \begin{aligned} \hbox{min} (3x_{1}^{2} + 3x_{2}^{2} + 4x_{3}^{2} + 5x_{4}^{2} + 3x_{1} x_{2} + 5x_{1} x_{3} + x_{2} x_{4} - 11x_{1} - 5x_{4} ) \hfill \\ {\text{s}} . {\text{t}} .\begin{array}{*{20}c} {\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, - 3x_{1} + 3x_{2} + 2x_{3} - x_{4} \le 0} \\ {\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, - 4x_{1} - x_{2} + x_{3} + 2x_{4} \le 0} \\ {\, - x_{1} + x_{2} \le - 1} \\ {\,\,\,\,\,\,\,\,\, - 2 \le 3x_{1} + x_{3} \le 4} \\ \end{array} \,\,\, \hfill \\ \end{aligned} $$
(31)

For this problem, the model parameters are:

$$ \begin{array}{*{20}l} {W = \left[ {\begin{array}{*{20}c} 6 & 3 & 5 & 0 \\ 3 & 6 & 0 & 1 \\ 5 & 0 & 8 & 0 \\ 0 & 1 & 0 & {10} \\ \end{array} } \right], \, c = \left[ {\begin{array}{*{20}c} { - 11} \\ 0 \\ 0 \\ { - 5} \\ \end{array} } \right]} \hfill \\ {B = \left[ {\begin{array}{*{20}l} { - 1} \hfill & 1 \hfill & 0 \hfill & 0 \hfill \\ 3 \hfill & 0 \hfill & 1 \hfill & 0 \hfill \\ { - 3} \hfill & 0 \hfill & { - 1} \hfill & 0 \hfill \\ { - 3} \hfill & 3 \hfill & 2 \hfill & { - 4} \hfill \\ { - 4} \hfill & { - 1} \hfill & 1 \hfill & 2 \hfill \\ \end{array} } \right],\;d = \left[ \begin{aligned} - 1\hfill \\ 4\hfill \\ 2\hfill \\ 0\hfill \\ 0\hfill \\ \end{aligned} \right]} \hfill \\ \end{array} $$
(32)

We run the neural network model in simulation with a random initialization of the state variable \( \mu \) and \( \varepsilon = 10^{ - 8} \). The simulation is run for \( 5 \times 10^{ - 8} \) s, and the evolution of \( \mu \) and \( x \) is shown in Figs. 2 and 3. The optimal value of (31) is about −19.1794310722. The network output is \( x = [1.8774617068, -1.0393873085, -1.6323851203, 0.6039387308] \), and the differences from the exact optimum are less than \( 1.0 \times 10^{ - 10} \) for both \( x \) and the optimal value.
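For reproducibility, the data in (32) can be fed to the illustrative simulate_network sketch from Sect. 2 as follows (the helper and its default parameters are assumptions of the sketch, not part of the reported experiment):

```python
import numpy as np

# Parameters of Example V-A, taken from (32).
W = np.array([[6., 3., 5., 0.],
              [3., 6., 0., 1.],
              [5., 0., 8., 0.],
              [0., 1., 0., 10.]])
c = np.array([-11., 0., 0., -5.])
B = np.array([[-1.,  1.,  0.,  0.],
              [ 3.,  0.,  1.,  0.],
              [-3.,  0., -1.,  0.],
              [-3.,  3.,  2., -4.],
              [-4., -1.,  1.,  2.]])
d = np.array([-1., 4., 2., 0., 0.])

mu, x = simulate_network(W, c, B, d, eps=1e-8, dt=1e-11, steps=5000)
print(x, 0.5 * x @ W @ x + c @ x)   # should approach the reported optimum
```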

Fig. 2. Transient behavior of \( \mu \) in Example V-A.

Fig. 3. Transient behavior of \( x \) in Example V-A.

5.2 Discrete-Time Model

We again consider the problem of Example V-A and solve it with the discrete model based on (17) and (18). In the simulation, \( \gamma = 0.05 \) and \( \mu \) is randomly initialized. The evolution of \( \mu \) and \( x \) is shown in Figs. 4 and 5. The network output is \( x = [1.8774617068, - 1.0393873085, - 1.6323851203, 0.6039387308] \), and the optimal value is −19.179. The differences are less than \( 5.85 \times 10^{ - 6} \) for \( x \) and \( 4.2 \times 10^{ - 5} \) for the optimal value.
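Continuing the Example V-A snippet above (same W, c, B, d) and reusing the illustrative discrete_iteration and gamma_upper_bound helpers sketched in Sect. 4.1, this experiment corresponds roughly to the following call:

```python
assert 0.05 < gamma_upper_bound(W, B)            # step size satisfies condition (19)
mu, x = discrete_iteration(W, c, B, d, gamma=0.05, iters=5000)
print(x, 0.5 * x @ W @ x + c @ x)                # should approach the reported optimum
```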

Fig. 4. Transient behavior of \( \mu \) in Example V-B.

Fig. 5. Transient behavior of \( x \) in Example V-B.

5.3 Problem with Irredundant Equality Constraint

To show the convergence of the neural network (29), we solve the following quadratic programming problem with irredundant equality constraints:

$$ \begin{aligned} \hbox{min} (3x_{1}^{2} + 3x_{2}^{2} + 4x_{3}^{2} + 5x_{4}^{2} + 3x_{1} x_{2} + 5x_{1} x_{3} + x_{2} x_{4} - 11x_{1} - 5x_{4} ) \hfill \\ {\text{s}} . {\text{t}} .\,\,\,\,\,\,\,\,\,\begin{array}{*{20}c} {\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, - 3x_{1} + 3x_{2} + 2x_{3} - x_{4} = 0} \\ {\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, - 4x_{1} - x_{2} + x_{3} + 2x_{4} = 0} \\ {\, - x_{1} + x_{2} \le - 1} \\ {\,\,\,\,\,\,\,\,\,\, - 2 \le 3x_{1} + x_{3} \le 4\text{ }} \\ \end{array} \hfill \\ \end{aligned} $$
(33)

The parameters are given in (34). A simulation is performed with a random initialization of the state variable \( \mu \) and the scaling parameter \( \varepsilon = 10^{ - 8} \). The simulation results are shown in Figs. 6, 7 and 8, from which we can observe that the neural network converges very fast, after a transient period shorter than \( 10^{ -6} \) s. The error of the solution at the final time of the simulation is less than \( 0.9 \times 10^{ -10} \) in the \( l_{2} \)-norm sense, and the optimal solution is \( x = [0.5, -0.5, -1.5, 0] \).

$$ \begin{array}{*{20}l} {W\text{ = }\left[ {\begin{array}{*{20}c} 6 & 3 & 5 & 0 \\ 3 & 6 & 0 & 1 \\ 5 & 0 & 8 & 0 \\ 0 & 1 & 0 & {10} \\ \end{array} } \right], \, c = \left[ {\begin{array}{*{20}c} { - 11} \\ 0 \\ 0 \\ { - 5} \\ \end{array} } \right]} \hfill \\ {A\text{ = }\left[ {\begin{array}{*{20}c} 3 & { - 3} & { - 2} & 1 \\ 4 & 1 & { - 1} & { - 2} \\ \end{array} } \right],\text{ }b\text{ = }\left[ \begin{aligned} 0 \hfill \\ 0\hfill \\ \end{aligned} \right]\text{ }} \hfill \\ {E = \left[ {\begin{array}{*{20}c} { - 1} & 1 & 0 & 0 \\ 3 & 0 & 1 & 0 \\ { - 3} & 0 & { - 1} & 0 \\ \end{array} } \right],\text{ }e\text{ = }\left[ \begin{aligned} - 1 \hfill \\ 4\hfill \\ 2 \hfill \\ \end{aligned} \right]\text{ }} \hfill \\ \end{array} $$
(34)
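Analogously, with the illustrative simulate_equality_network sketch from Sect. 4.2, Example V-C corresponds roughly to the following setup (data taken from (34); the helper and its defaults are assumptions of the sketch):

```python
import numpy as np

# Parameters of Example V-C, taken from (34).
W = np.array([[6., 3., 5., 0.],
              [3., 6., 0., 1.],
              [5., 0., 8., 0.],
              [0., 1., 0., 10.]])
c = np.array([-11., 0., 0., -5.])
A = np.array([[3., -3., -2.,  1.],
              [4.,  1., -1., -2.]])
b = np.array([0., 0.])
E = np.array([[-1., 1.,  0., 0.],
              [ 3., 0.,  1., 0.],
              [-3., 0., -1., 0.]])
e = np.array([-1., 4., 2.])

mu, x, y = simulate_equality_network(W, c, A, b, E, e)
print(x)   # reported optimal solution: x = [0.5, -0.5, -1.5, 0]
```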
Fig. 6. Transient behavior of \( \mu \) in Example V-C.

Fig. 7. Transient behavior of \( x \) in Example V-C.

Fig. 8. Transient behavior of \( y \) in Example V-C.

6 Conclusions

A new recurrent neural network is proposed for solving general quadratic programming problems. The proposed model has fewer neurons and a simple structure. The neural network is shown to be globally convergent based on contraction analysis. A discrete-time counterpart of the proposed neural network is studied, and an alternative recurrent neural network for solving the problem under irredundant equality constraints is also presented. Finally, simulation results demonstrate that the models are efficient and accurate.