1 Introduction

Over the past decades, the rapid development of adaptive filtering (AF) algorithms has been of great significance for practical applications [20]. AF algorithms have been widely adopted in signal processing, including noise cancellation [11], channel equalization [8] and system identification [30]. The fundamental AF algorithms include the well-known least mean squares (LMS) [9], recursive least squares (RLS) [22] and conjugate gradient (CG) algorithms [29]. The LMS algorithm, with its fixed step size, typically converges slowly [19]. Although the conventional RLS algorithm converges faster, it suffers from higher computational complexity and numerical instability [25]. Compared with the well-known RLS and LMS algorithms, the CG algorithm provides a trade-off between computational complexity and convergence speed [31]. Because its step length varies with the real-time input and its derivation avoids matrix inversion, the CG algorithm achieves convergence performance comparable to the RLS algorithm with a smaller computational burden [7].

The mean square error (MSE) is one of the most popular criteria in AF algorithms and works well under Gaussian noise. However, in many physical applications, including wireless channel tracking [18], multipath estimation [12] and acoustic echo cancellation [15], signals are likely to be contaminated by non-Gaussian noise. Under these circumstances, an adaptive filter designed under the MSE criterion may perform poorly [3, 5]. Therefore, drawing on Information Theoretic Learning (ITL) [23], various AF algorithms have been proposed to cope with interference from non-Gaussian noise. The maximum correntropy criterion (MCC) [17, 24, 27] and the minimum error entropy (MEE) criterion [2, 4, 14], two typical ITL criteria, are insensitive to large outliers and improve performance under impulsive noise. Several algorithms based on the MCC have been proposed, such as the LMS-type algorithm [16, 21, 26], the RLS-type algorithm named the recursive maximum correntropy (RMC) [32] and the CG-type algorithm called the q-Gaussian MCC-CG [6]. Recent research has shown that the MEE criterion is more robust than the MCC [10], and it has been effectively utilized in the LMS-type algorithm [26], the RLS-type algorithm named the recursive minimum error entropy (RMEE) [23] and the Kalman filter called the MEE-KF [2]. To the best of our knowledge, the CG method has not yet been applied under the MEE criterion.

Under the MEE criterion, the LMS-type algorithm suffers from slow convergence or poor steady-state error [14], while the RLS-type algorithm, the RMEE, improves both the convergence speed and the steady-state error at the cost of high computational complexity [23]. In this paper, we derive a CG-type algorithm based on the MEE criterion, called MEE-CG. The MEE-CG algorithm can be expected to achieve a trade-off between computational complexity and convergence speed, compared with the LMS-type and RLS-type algorithms under the MEE criterion.

For comparison, the LMS-type, CG-type [6] and RLS-type [32] algorithms under the MCC criterion are called MCC-LMS, MCC-CG and MCC-RLS, respectively. The LMS-type, CG-type and RLS-type [23] algorithms under the MEE criterion are called MEE-LMS, MEE-CG (proposed in this paper) and MEE-RLS, respectively.

The paper is organized as follows: Sect. 2 states the problem; Sect. 3 derives the MEE-CG algorithm; Sect. 4 analyzes the convergence and computational complexity; Sect. 5 provides experimental simulations on system identification; and Sect. 6 concludes the paper.

2 Problem Statement

2.1 Adaptive Filtering Theory

The block diagram of the basic adaptive filter is shown in Fig. 1. In the adaptive filtering framework, the desired response \({{d}}\in {{\textbf{R}}^1}\) is generated from an input \({\textbf{u}}\in {{\textbf{R}}^M}\) at instant n

$$\begin{aligned} {d}\left( n \right) = {\textbf{w}}_o^T{\textbf{u}}\left( n \right) {\hspace{1.0pt}} {\hspace{1.0pt}} + {v}\left( n \right) , \end{aligned}$$
(1)

where \({\textbf {w}}_{o}\in {{\textbf{R}}^M}\) denotes the unknown coefficient vector, and v(n) represents the zero-mean observation noise with variance \(\sigma ^{2}_{v}\). The estimation error can be represented as

$$\begin{aligned} {e}\left( n \right)&= {d}\left( n \right) - {\textbf{w}}_{}^T\left( {n-1 } \right) {\textbf{u}}\left( n \right) \nonumber \\&={d}\left( n \right) - y(n), \end{aligned}$$
(2)

where \(y(n)={\textbf{w}}_{}^T\left( {n -1} \right) {\textbf{u}}\left( n \right) \), and \({\textbf {w}}(n-1)\) denotes the estimate of \({\textbf {w}}_{o}\) at instant \(n-1\). For simplification, we make the following assumptions: 1) The additive noise is white, i.e.

$$\begin{aligned} E\left\{ {v\left( m \right) v\left( n \right) {\hspace{1.0pt}} } \right\} {\hspace{1.0pt}} = 0,m \ne n. \end{aligned}$$
(3)

2) The input \({\textbf{u}}(n)\) is a zero-mean white sequence, so that

$$\begin{aligned} E\left\{ {{{\textbf{u}}^T}\left( m \right) {\textbf{u}}\left( n \right) } \right\} {\hspace{1.0pt}} = E\left\{ {{\textbf{u}}\left( m \right) {{\textbf{u}}^T}\left( n \right) } \right\} {\hspace{1.0pt}} = 0,m \ne n. \end{aligned}$$
(4)

3) The inputs are uncorrelated with the additive noise at instants \((m, n)\)

$$\begin{aligned} E\left\{ {{{\textbf{u}}^H}\left( m \right) {v}\left( n \right) } \right\} {\hspace{1.0pt}} {\hspace{1.0pt}} = 0,m \ne n. \end{aligned}$$
(5)
Fig. 1 Block diagram of the basic adaptive filter
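For illustration, a minimal sketch of the data model in (1)–(2) is given below; the function name, the noise level sigma_v and the white Gaussian input are our own illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_step(w_o, w_prev, sigma_v=0.1):
    """One step of the data model (1)-(2): an illustrative sketch only."""
    u = rng.standard_normal(w_o.size)     # zero-mean white input u(n)
    v = sigma_v * rng.standard_normal()   # zero-mean observation noise v(n)
    d = w_o @ u + v                       # desired response d(n), Eq. (1)
    e = d - w_prev @ u                    # a-priori estimation error e(n), Eq. (2)
    return u, d, e
```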

2.2 Minimum Error Entropy (MEE) Criterion

From Information Theoretic Learning (ITL), the empirical version of the quadratic information potential [14, 26] for a filter with a sliding window of length L is

$$\begin{aligned} V(e) = \frac{1}{{{L^2}}}{\hspace{1.0pt}} \sum \limits _{i = 1}^{L} {\sum \limits _{j = 1}^{L} {{\lambda ^{i + j}}{G_\sigma }\left( {{e(i)} - {e(j)}} \right) } }, \end{aligned}$$
(6)

where \(\lambda \left( {0 < \lambda \le 1} \right) \) denotes the forgetting factor and \({G_\sigma }\) stands for the Gaussian kernel with bandwidth \(\sigma \), given by

$$\begin{aligned} {G_\sigma }(x) = \frac{1}{{\sqrt{2\pi } \sigma }}\exp \left( { - \frac{{{x^2}}}{{2{\sigma ^2}}}} \right) {\hspace{1.0pt}} . \end{aligned}$$
(7)

Assume that the error is a random variable with a probability density function. An estimator of Renyi’s quadratic entropy [14] for the error can then be written as

$$\begin{aligned} R_q(e)=\log \frac{1}{V(e)}. \end{aligned}$$
(8)

According to ITL, minimizing the error entropy in (8) is equivalent to maximizing the information potential in (6). Therefore, under the MEE criterion [2, 4, 14], the cost function to be maximized is

$$\begin{aligned} {J_{MEE}}({\textbf{w}}) = \frac{1}{{{L^2}}}{\hspace{1.0pt}} \sum \limits _{i = 1}^{L} {\sum \limits _{j = 1}^{L} {{\lambda ^{i + j}}{G_\sigma }\left( {{e(i)} - {e(j)}} \right) } } . \end{aligned}$$
(9)
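As a concrete illustration of (6)–(9), the following sketch evaluates the weighted information potential for a window of L errors; the function and variable names are ours, and the window is assumed to be supplied as a plain array of errors.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    """Gaussian kernel G_sigma(x), Eq. (7)."""
    return np.exp(-x**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def mee_cost(e, lam, sigma):
    """Weighted information potential J_MEE of Eq. (9) for window errors e(1),...,e(L)."""
    e = np.asarray(e, dtype=float)
    L = e.size
    idx = np.arange(1, L + 1)
    weights = lam ** np.add.outer(idx, idx)                     # lambda^(i+j)
    kernels = gaussian_kernel(e[:, None] - e[None, :], sigma)   # G_sigma(e(i) - e(j))
    return np.sum(weights * kernels) / L**2
```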

We get the gradient of (9) as

$$\begin{aligned}&\nabla {J_{MEE}}\left( {\textbf{w}} \right) \nonumber \\&= \frac{1}{{{L^2}{\sigma ^2}}}{\hspace{1.0pt}} \sum \limits _{i =1}^L {\sum \limits _{j = 1}^L {{\lambda ^{i + j}}\left( {{{\textbf{u}}(i)} - {{\textbf{u}}(j)}} \right) {G_\sigma }\left( {{e(i)} - {e(j)}} \right) \left( {{e(i)} - {e(j)}} \right) } } . \end{aligned}$$
(10)

For each instant n, we adopt the shorthand notation

$$\begin{aligned} \left\{ \begin{array}{l} {{\textbf{u}}_i} = {\textbf{u}}\left( {n - i} \right) ,\\ {d_i} = d\left( {n - i} \right) ,\\ {e_i} = e\left( {n - i} \right) ,i = \mathrm{{0}},\mathrm{{1}}, \cdots ,L - \mathrm{{1}}\mathrm{{.}} \end{array} \right. \end{aligned}$$
(11)

Thus, the windowed quantities can be written in matrix form as

$$\begin{aligned} \left\{ \begin{array}{l} {{\textbf{U}}_L} = \left[ {{{\textbf{u}}_{\mathrm{{0}}}},{{\textbf{u}}_\mathrm{{1}}}, \ldots ,{{\textbf{u}}_{L - \mathrm{{1}}}}} \right] = \left[ {{{\textbf{u}}_\mathrm{{0}}},{{\textbf{U}}_{L - \mathrm{{1}}}}} \right] ,\\ {{\textbf{D}}_L} = {\left[ {{d_0},{d_1}, \ldots ,{d_{L - 1}}} \right] ^T} = {\left[ {{d_0},{{\textbf{D}}_{L - 1}}} \right] ^T},\\ {\varepsilon _L} = {\left[ {{e_0},{e_1}, \ldots ,{e_{L - 1}}} \right] ^T} = {\left[ {{e_0},{\varepsilon _{L - 1}}} \right] ^T}.\\ \end{array} \right. \end{aligned}$$
(12)

So (10) can be rewritten as

$$\begin{aligned}&\nabla {J_{MEE}}\left( {\textbf{w}} \right) \nonumber \\&\quad = \frac{1}{{{L^2}{\sigma ^2}}}{\hspace{1.0pt}} \sum \limits _{i = 0}^{L - 1} {\sum \limits _{j = 0}^{L - 1} {{\lambda ^{i + j}}\left( {{{\textbf{u}}_i} - {{\textbf{u}}_j}} \right) {G_\sigma }\left( {{e_i} - {e_j}} \right) \left( {{e_i} - {e_j}} \right) } } \nonumber \\&\quad = \frac{2}{{{L^2}{\sigma ^2}}}{{\textbf{U}}_L}({{\textbf{P}}_L} - {{\textbf{Q}}_L}){{\mathbf{\varepsilon }}_L}\nonumber \\&\quad = \frac{2}{{{L^2}{\sigma ^2}}}{{\textbf{U}}_L}{\Phi _L}{{\mathbf{\varepsilon }}_L}. \end{aligned}$$
(13)

where the quantities involved are defined as

$$\begin{aligned} \left\{ \begin{array}{l} {\left[ {\textbf{Q}} \right] _{(i + 1)(j + 1)}} = {\lambda ^{i + j}}{G_\sigma }\left( {{e_i} - {e_j}} \right) ,i,j = \mathrm{{0}},\mathrm{{1}}, \ldots ,L-1\\ {\left[ {\textbf{P}} \right] _{(i + 1)(j + 1)}} = \left\{ { \begin{array}{l} \sum \limits _{k = 0}^{L - 1} {{\lambda ^{i + k}}{G_\sigma }\left( {{e_i} - {e_k}} \right) } ,i = j\\ 0,\mathrm{{ }}i \ne j \end{array}} \right. \\ {\left[ \varphi \right] _i} = {\lambda ^i}{G_\sigma }\left( {{e_i} - {e_0}} \right) ,i = \mathrm{{1}}, \ldots ,L - \mathrm{{1}}\\ {\phi _\mathrm{{0}}} = \sum \limits _{k = \mathrm{{1}}}^{L - 1} {{\lambda ^k}{G_\sigma }\left( {{e_0} - {e_k}} \right) } . \end{array} \right. \end{aligned}$$
(14)

Through expressions (12)–(14), we obtain the matrix \({\Phi _L}\) of the objective function under the MEE criterion and its recursive computation (see formula (11) in [26] for details).
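A possible vectorized construction of \({\Phi _L}\) and the gradient (13) from the definitions in (14) is sketched below; the Gaussian kernel of (7) is repeated for self-containment, and all names are illustrative.

```python
import numpy as np

def gaussian_kernel(x, sigma):                                  # Eq. (7)
    return np.exp(-x**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def build_phi(e_win, lam, sigma):
    """Phi_L = P_L - Q_L built from the windowed errors e_0,...,e_{L-1}, Eq. (14)."""
    e_win = np.asarray(e_win, dtype=float)
    idx = np.arange(e_win.size)
    Q = lam ** np.add.outer(idx, idx) * gaussian_kernel(e_win[:, None] - e_win[None, :], sigma)
    P = np.diag(Q.sum(axis=1))          # diagonal entries of P in Eq. (14)
    return P - Q

def mee_gradient(U, e_win, lam, sigma):
    """Gradient of the MEE cost, Eq. (13); U is M x L with columns u_0,...,u_{L-1}."""
    e_win = np.asarray(e_win, dtype=float)
    L = e_win.size
    return (2.0 / (L**2 * sigma**2)) * U @ build_phi(e_win, lam, sigma) @ e_win
```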

3 Proposed Minimum Error Entropy Conjugate Gradient

Through the derivations in Sect. 2, we have

$$\begin{aligned} \left\{ \begin{array}{l} {{\textbf{R}}_L} = {{\textbf{U}}_L}{\Phi _L}{\textbf{U}}_L^T,\\ {{\textbf{r}}_L} = {{\textbf{U}}_L}{\Phi _L}{{\textbf{D}}_L}. \end{array} \right. \end{aligned}$$
(15)

From (12), (14) and (15), we have

$$\begin{aligned} \left\{ \begin{array}{l} {{\textbf{R}}_L} = {\lambda ^2}{{\textbf{U}}_{L - 1}}{\Phi _{L - 1}}{\textbf{U}}_{L - 1}^T \\ \qquad \quad + \left[ {{{\textbf{u}}_0}{\phi _0}{\textbf{u}}_0^T + {{\textbf{U}}_{L - 1}}\varphi {\textbf{u}}_0^T + {{\textbf{u}}_0}{\varphi ^T}{\textbf{U}}_{L - 1}^T} \right] ,\\ {{\textbf{r}}_L} = {\lambda ^2}{{\textbf{U}}_{L - 1}}{\Phi _{L - 1}}{{\textbf{D}}_{L - 1}} \\ \qquad \quad + \left[ {{{\textbf{u}}_0}{\phi _0}{d_0} + {{\textbf{U}}_{L - 1}}\varphi {d_0} + {{\textbf{u}}_0}{\varphi ^T}{{\textbf{D}}_{L - 1}}} \right] . \end{array} \right. \end{aligned}$$
(16)

After applying the attenuation (exponentially decaying) window to the correlation quantities of the data matrix in the CG, we obtain recursions of the same form as in the RLS-type algorithm [1, 23]

$$\begin{aligned} \left\{ \begin{array}{c} {{\textbf{R}}_L} = \lambda _{}^{\mathrm{{2}}}{{\textbf{R}}_{L - 1}} + {{\textbf{u}}_0}{\phi _0}{\textbf{u}}_0^T,\\ {{\textbf{r}}_L} = \lambda _{}^{\mathrm{{2}}}{{\textbf{r}}_{L - 1}} + {{\textbf{u}}_0}{\phi _0}d_0^{}. \end{array} \right. \end{aligned}$$
(17)
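Under these recursions, \({\textbf{R}}_L\) and \({\textbf{r}}_L\) can be updated at each instant from the newest sample only. A minimal sketch, assuming the window convention of (11) and with the kernel of (7) repeated for self-containment, is:

```python
import numpy as np

def gaussian_kernel(x, sigma):                                  # Eq. (7)
    return np.exp(-x**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def update_R_r(R_prev, r_prev, u0, d0, e_win, lam, sigma):
    """Recursive update of R_L and r_L per Eq. (17), with phi_0 from Eq. (14).
    e_win = [e_0, e_1, ..., e_{L-1}] holds the errors in the current window."""
    e_win = np.asarray(e_win, dtype=float)
    u0 = np.asarray(u0, dtype=float)
    k = np.arange(1, e_win.size)
    phi0 = np.sum(lam**k * gaussian_kernel(e_win[0] - e_win[1:], sigma))
    R = lam**2 * R_prev + phi0 * np.outer(u0, u0)   # R_L = lambda^2 R_{L-1} + u_0 phi_0 u_0^T
    r = lam**2 * r_prev + phi0 * d0 * u0            # r_L = lambda^2 r_{L-1} + u_0 phi_0 d_0
    return R, r
```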

The online CG method aims to minimize the following cost function [7]

$$\begin{aligned} \min F\left( {\textbf{w}}_L^{}\right) = \min \left( \frac{1}{2}{\textbf{w}}_L^T{\textbf{R}}_L^{}{\textbf{w}}_L^{} - {\textbf{r}}_L^T{\textbf{w}}_L^{}\right) , \end{aligned}$$
(18)

and the solution of (18) is

$$\begin{aligned} {\textbf{R}}_L^{}{\textbf{w}}_L^{} = {\textbf{r}}_L^{}. \end{aligned}$$
(19)

The weight vector \({\textbf{w}}_L^{}\) and the direction vector \(\textbf{p}_L\) can be updated as

$$\begin{aligned} {\textbf{w}}_L^{} = {\textbf{w}}_{L - 1}^{} + \alpha _L\textbf{p}_L^{}, \end{aligned}$$
(20)
$$\begin{aligned} \textbf{p}_{L+ 1} = \textbf{g}_L + {\beta _L}{} \textbf{p}_L, \end{aligned}$$
(21)

where the step factor \(\alpha _L^{}\) and the direction factor \({\beta _L}\) are updated at each iteration, and \(\textbf{g}_L\) denotes the residual vector.

Substituting (17) and (20) into \(\textbf{g}_L = {\textbf{r}}_L - {\textbf{R}}_L{\textbf{w}}_L\), the residual vector \(\textbf{g}_L\) for the online CG method becomes

$$\begin{aligned} \textbf{g}_L =&\;{\textbf{r}}_L^{} - {\textbf{R}}_L^{}{\textbf{w}}_L^{}\nonumber \\ =&\; \lambda ^{\mathrm{{2}}} \textbf{g}_{L - 1} - \alpha _L{\textbf{R}}_L\textbf{p}_L + {\textbf{u}}_0{\phi _0}(d_0 - {\textbf{u}}_0^H{\textbf{w}}_{L - 1}). \end{aligned}$$
(22)

According to the conjugacy of the direction vectors \(\textbf{p}_L\) in the CG method, we have

$$\begin{aligned} \textbf{p}_{L + 1}^T{{\textbf{R}}_L}{} \textbf{p}_L =&\; (\textbf{g}_L + {\beta _L}{} \textbf{p}_L)_{}^T{{\textbf{R}}_L}{} \textbf{p}_L\nonumber \\ =&\; \textbf{g}_L^T{{\textbf{R}}_L}{} \textbf{p}_L + {\beta _L}{} \textbf{p}_L^T{{\textbf{R}}_L}{} \textbf{p}_L\nonumber \\ =&\; 0, \end{aligned}$$
(23)

and then the expression of \({\beta _L}\) can be derived as [1]

$$\begin{aligned} {\beta _L} = - \frac{{\textbf{g}_L^T{{\textbf{R}}_L}{} \textbf{p}_L}}{{\textbf{p}_L^T{{\textbf{R}}_L}{} \textbf{p}_L}}. \end{aligned}$$
(24)

To determine the step factor \(\alpha _L^{}\) defined in (20), we substitute (20) into the objective function \(F({\textbf{w}}_L^{})\) in (18), which yields

$$\begin{aligned} F({\textbf{w}}_L^{}) =&\; F({\textbf{w}}_{L - 1}^{} + \alpha _L \textbf{p}_L). \end{aligned}$$
(25)

Setting the derivative of (25) with respect to \(\alpha _L^{}\) to zero gives

$$\begin{aligned} \alpha _L = \frac{{{\textbf{r}}_L^T\textbf{p}_L - {\textbf{w}}_{L - 1}^T{\textbf{R}}_L \textbf{p}_L}}{{\textbf{p}_L^T{\textbf{R}}_L \textbf{p}_L}}. \end{aligned}$$
(26)

According to the derivations above, the detailed description of the proposed MEE-CG algorithm is given in Algorithm 1.

Algorithm 1 Proposed MEE-CG algorithm
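To make the structure of Algorithm 1 concrete, the following sketch assembles one time step of the proposed MEE-CG from (14), (17), (20)–(22), (24) and (26). It is a simplified illustration under our own naming and initialization assumptions (e.g. R initialized as a small multiple of the identity, p initialized to the first residual, and the error window initialized to zeros); safeguards such as protecting the divisions by \(\textbf{p}^T{\textbf{R}}\textbf{p}\) are omitted.

```python
import numpy as np

def gaussian_kernel(x, sigma):                                  # Eq. (7)
    return np.exp(-x**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def mee_cg_step(R, r, w, p, g, e_win, u0, d0, lam, sigma):
    """One iteration of the MEE-CG sketch: returns the updated state."""
    # newest error and sliding error window, Eq. (11)
    e0 = d0 - w @ u0
    e_win = np.concatenate(([e0], e_win[:-1]))

    # phi_0 from Eq. (14) and the recursions of Eq. (17)
    k = np.arange(1, e_win.size)
    phi0 = np.sum(lam**k * gaussian_kernel(e_win[0] - e_win[1:], sigma))
    R = lam**2 * R + phi0 * np.outer(u0, u0)
    r = lam**2 * r + phi0 * d0 * u0

    # step factor alpha_L, Eq. (26) (uses the previous weights w_{L-1})
    Rp = R @ p
    alpha = (r @ p - w @ Rp) / (p @ Rp)

    # residual g_L, Eq. (22), then weight update, Eq. (20)
    g = lam**2 * g - alpha * Rp + phi0 * (d0 - u0 @ w) * u0
    w = w + alpha * p

    # direction factor beta_L, Eq. (24), and new direction, Eq. (21)
    beta = -(g @ Rp) / (p @ Rp)
    p = g + beta * p
    return R, r, w, p, g, e_win
```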

4 Performance Analyses

4.1 Mean Value Behavior

Assuming that the input signal is ergodic and wide-sense stationary and that \(\alpha \left( n \right) \), \(\beta \left( n \right) \), \(p\left( n \right) \), \({\textbf{w}}\left( n \right) \), \(\textbf{g}\left( n \right) \), \({\textbf{r}}\left( n \right) \) and \({\textbf{R}}\left( n \right) \) are mutually independent, we write \(E\left[ {\alpha \left( n \right) } \right] = {\bar{\alpha }} \), \(E\left[ {\beta \left( n \right) } \right] = {\bar{\beta }} \), \(E\left[ {{\textbf{r}}\left( n \right) } \right] = r\) and \(E\left[ {{\textbf{R}}\left( n \right) } \right] = R\) for simplicity. After applying the Z-transform to the proposed MEE-CG, the stability of the described system requires the same ranges of \({\bar{\alpha }}\) and \({\bar{\beta }}\) as in [1]

$$\begin{aligned} \left\{ \begin{array}{l} - 1 \le {\bar{\beta }} + {\bar{\alpha }} {\lambda _i} - {\bar{\beta }} - 1,\\ - 1 \le {\bar{\beta }} - {\bar{\alpha }} {\lambda _i} + {\bar{\beta }} + 1,\\ - 1 \le {\bar{\beta }} \le 1. \end{array} \right. \Rightarrow 0 \le {\bar{\alpha }} \le \frac{{2{\bar{\beta }} + 2}}{{{\lambda _{\max }}}}. \end{aligned}$$
(27)

where \(\lambda _i\) denotes the ith eigenvalue of R. When \({\bar{\beta }} \rightarrow 0\), we have \(0 \le {\bar{\alpha }} \le \mathrm{{2}}\lambda _{\max }^{\mathrm{{ - 1}}}\), which coincides with the step-size bound of the steepest descent algorithm in [9].

Besides, after M iterations we have \(\tilde{\textbf{R}}(n)\tilde{\textbf{w}}(n) \approx \tilde{\textbf{r}}(n)\), where \( {\tilde{\textbf{x}}}\) denotes the estimate of \(\textbf{x}\). Hence, the norm of \( \textbf{g}(M)\) satisfies

$$\begin{aligned} \parallel \textbf{g}(M) \parallel =\parallel {\tilde{\textbf{r}}(n) - \tilde{\textbf{R}}(n)\tilde{\textbf{w}}({M})} \parallel < \varepsilon , \end{aligned}$$
(28)

where \(\varepsilon \) can be an arbitrarily small value. According to [1], the norm \(\parallel \textbf{w}_o -\textbf{w}(n) \parallel \) is bounded as follows

$$\begin{aligned} \parallel \textbf{w}_o -\textbf{w}(n) \parallel _{\textbf{R}_1 }\le 2 \parallel \textbf{w}_o -\textbf{w}(0) \parallel _{\textbf{R}_1 }\left( \frac{\sqrt{\kappa }-1}{\sqrt{\kappa }+1}\right) ^{n} \end{aligned}$$
(29)

where \(\textbf{R}_1 = \tilde{\textbf{R}}(n)\) is positive definite, \(\Vert \textbf{w} \Vert _{{\textbf{R}}_1} = \sqrt{{{\textbf{w}}^T}{{\textbf{R}}_1}{\textbf{w}}},\) and the condition number is defined as \(\kappa = {\left\| {\textbf{R}} \right\| _2}{\left\| {{{\textbf{R}}^{ - 1}}} \right\| _2}.\) Taking the expectation on both sides of (29), we conclude that the MEE-CG is convergent in the mean.
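For instance, the right-hand side of the bound (29) can be evaluated numerically once an estimate of \(\textbf{R}_1\) is available; the helper below is a small sketch of ours (the 2-norm condition number is computed directly from the matrix).

```python
import numpy as np

def cg_bound(R1, w_o, w0, n):
    """Right-hand side of the bound (29) for a positive definite R1."""
    kappa = np.linalg.cond(R1)                        # kappa = ||R||_2 ||R^{-1}||_2
    rho = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    init_dev = np.sqrt((w_o - w0) @ R1 @ (w_o - w0))  # ||w_o - w(0)||_{R_1}
    return 2.0 * init_dev * rho**n
```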

4.2 Mean Square Behavior

After reaching the steady state, we have \({{\textbf{w}}(n)} \approx {{\textbf{w}}(n - 1)}\), so from (20) we have \(\alpha (n) \textbf{p}(n) \approx \textbf{0}\). Multiplying (26) by \(\alpha (n)\) then gives \(\alpha (n)^{} \approx 0\), and thus \(\textbf{g}(n) \approx \textbf{g}(n-1)\). According to (24), \({\beta (n)} \approx 0\), so from (21) we obtain \(\textbf{p}(n+1) \approx \textbf{g}(n)\). Substituting this into (26), one obtains \(\textbf{g}(n) \approx \textbf{0}\). Since \(\textbf{g}(n) = {\textbf{r}}(n) - {\textbf{R}}(n)^{}{\textbf{w}}(n) \approx {\textbf{r}}(n)^{} - {\textbf{R}}(n){\textbf{w}}(n-1)\), we get \({\textbf{w}}(n) \approx {\textbf{R}}(n)^{ - 1}{\textbf{r}}(n)\), which coincides with the MEE-RLS solution. Therefore, the steady-state behaviors of the CG and RLS algorithms are equivalent [7]. Then, according to [23], the total weighted error of the proposed MEE-CG in the steady state is the same as that of the RMEE algorithm

$$\begin{aligned} \mathop {\lim }\limits _{n \rightarrow \infty } E\left\| {{\varvec{{\tilde{w}}}}(n)} \right\| _2^2 \approx \frac{{\mathrm{{1}} - {\lambda ^2}}}{{\mathrm{{1 + }}{\lambda ^2}}}M\sigma _u^{ - \mathrm{{2}}}E\left\{ {{{\textbf{v}}^2}(n)\phi (n)^2} \right\} {E^{ - \mathrm{{2}}}}\left\{ {{\phi (n)}} \right\} . \end{aligned}$$
(30)

The following numerical simulations also verify the correctness of the theoretical values.
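The theoretical value in (30) can be evaluated as below, assuming the two expectations \(E\{v^2(n)\phi^2(n)\}\) and \(E\{\phi(n)\}\) have already been estimated separately (e.g. by sampling the noise distribution). The helper is ours and only sketches how the curves labelled TH-MEE-CG in Sect. 5 could be produced.

```python
import numpy as np

def theoretical_msd_db(lam, M, sigma_u2, E_v2phi2, E_phi):
    """Steady-state weighted error of Eq. (30), returned in dB."""
    msd = (1.0 - lam**2) / (1.0 + lam**2) * M / sigma_u2 * E_v2phi2 / E_phi**2
    return 10.0 * np.log10(msd)
```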

4.3 Computational Complexity

Table 1 Computational complexities of the MSE-RLS, MSE-CG, MCC-RLS, MCC-CG, MEE-RLS and MEE-CG per iteration

Table 1 lists the computational complexities of the MSE-RLS, MSE-CG, MCC-RLS, MCC-CG, MEE-RLS and MEE-CG per iteration. It shows that the computational complexity of the MSE-CG, MCC-CG and MEE-CG is lower than that of the MSE-RLS, MCC-RLS and MEE-RLS, respectively.

Sections 4.2 and 4.3 show that the proposed MEE-CG algorithm is capable of achieving the same steady-state error as the MEE-RLS with a lower computational burden.

5 Experimental Results

In this section, simulations are carried out to verify the theoretical analysis and the superiority of the proposed MEE-CG algorithm. All results are averaged over 100 independent Monte Carlo trials, and the performance of the algorithms is evaluated by the steady-state mean-square deviation (MSD)

$$\begin{aligned} \text {MSD} = \mathop {\lim }\limits _{n \rightarrow \infty } E\left\{ {\left\| {{{\textbf{w}}_o} - {\textbf{w}}(n)} \right\| _2^2} \right\} . \end{aligned}$$
(31)
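In practice, the transient MSD curves reported below are obtained by averaging the squared deviation over trials at every iteration; a sketch of this bookkeeping (the array shape is our assumption) is:

```python
import numpy as np

def msd_curve_db(w_o, w_estimates):
    """MSD of Eq. (31) in dB; w_estimates has shape (trials, iterations, M)."""
    dev = w_estimates - w_o[None, None, :]            # w_o - w(n), per trial and iteration
    msd = np.mean(np.sum(dev**2, axis=-1), axis=0)    # average over the Monte Carlo trials
    return 10.0 * np.log10(msd)
```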
Fig. 2 Transient MSDs (dB) of the MCC-CG for different values of \(\lambda ^2\) when \(\sigma \) = 1

Fig. 3 Transient MSDs (dB) of the MEE-CG algorithm for different values of \(\lambda ^2\) when \(\sigma \) = 1

In the simulations, the unknown parameter \({\textbf {w}}_{o}\) is chosen to be a vector of size \(12\times 1\), and the length of the sliding window is set to L=15. The inputs \({\textbf {u}}\) are white Gaussian random sequences with covariance matrix \(E\{{\textbf {uu}}^{T}\}\)=\({\textbf {I}}_{5}\) and \(E\{{\textbf {u}}^{T}{} {\textbf {u}}\}\)=5. The additive impulsive noise is \(v(n) \sim 0.94N(0,0.01) + 0.06N(0,225).\) Figures 2 and 3 compare the MCC-CG and the proposed MEE-CG under \(\sigma \)=1. The \(\lambda ^{2}\) values for MEE-CG1, MEE-CG2, MEE-CG3, MEE-CG4 and MEE-CG5 are 0.942, 0.965, 0.980, 0.990 and 0.995, respectively. The corresponding theoretical values from (30) are denoted TH-MEE-CG1, TH-MEE-CG2, TH-MEE-CG3, TH-MEE-CG4 and TH-MEE-CG5, respectively.
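The impulsive measurement noise above is a two-component Gaussian mixture; it can be generated, for instance, as follows (a sketch with our own naming):

```python
import numpy as np

rng = np.random.default_rng(0)

def impulsive_noise(n_samples):
    """Samples from v(n) ~ 0.94 N(0, 0.01) + 0.06 N(0, 225)."""
    outlier = rng.random(n_samples) < 0.06                   # impulsive component, probability 0.06
    std = np.where(outlier, np.sqrt(225.0), np.sqrt(0.01))   # per-sample standard deviation
    return std * rng.standard_normal(n_samples)
```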

Fig. 4 Transient MSDs (dB) of MEE-CG and other MEE-based algorithms

Figure 2 shows that, for these five sets of parameters, the convergence rate of the MCC-CG algorithm is proportional to the value of \(\lambda \), while the steady-state errors are inversely proportional to the value of \(\lambda \). The theoretical steady-state error increases as the forgetting factor \(\lambda \) decreases or the bandwidth \(\sigma \) increases.

Figure 3 shows that, for these five sets of parameters, the convergence rate of the MEE-CG algorithm is proportional to the value of \(\lambda \), while the steady-state errors are inversely proportional to the value of \(\lambda \). The theoretical steady-state error increases when the forgetting factor \(\lambda \) decreases or the bandwidth \(\sigma \) increases. The convergence rates of the MCC-CG and MEE-CG can be compared from Figs. 2 and 3: the MEE-CG1 reaches the same MSD about 800 iterations faster than the MCC-CG1, and the MEE-CG4 is about 400 iterations faster than the MCC-CG4. Thus, the proposed MEE-CG algorithm achieves a faster convergence speed than the MCC-CG. Experimental results and related research show that, when the kernel width is set within an appropriate range, MEE-type algorithms achieve good filtering performance; in practice the kernel width can be chosen by cross-validation or by trial and error [2, 4, 14, 26]. In this paper, \(\sigma = 1\), chosen by trial and error, is adequate to ensure the performance of the proposed algorithm.

Then, the comparison among MEE-LMS1 \(\left( {\mu = 0.2} \right) \), MEE-LMS2 \(\left( {\mu = 0.4} \right) \), MEE-RLS1 [23] \(\left( {{\lambda ^2} = 0.990} \right) \), MEE-RLS2 [23] \(\left( {{\lambda ^2} = 0.965} \right) \), MEE-CG1 \(\left( {{\lambda ^2} = 0.990} \right) \) and MEE-CG2 \(\left( {{\lambda ^2} = 0.965} \right) \) is given in Fig. 4. The proposed MEE-CG has a smaller MSD and a faster convergence rate than the MEE-LMS algorithm. Under the same \({\lambda ^2}\) values, it reaches the same MSD as the MEE-RLS, while its convergence rate is slightly slower than that of the MEE-RLS.

6 Conclusion

We have proposed a new MEE-CG algorithm based on the MEE criterion, which has the same theoretical steady-state error as the MEE-RLS. Numerical simulations reveal that, when coping with non-Gaussian noise, the MEE-CG converges faster than the MEE-LMS algorithm and achieves a smaller steady-state error than the MCC-CG algorithm. In addition, the MEE-CG has performance comparable to the MEE-RLS, while its computational complexity is lower than that of the MEE-RLS.