Introduction

With the development of power generation and energy storage technologies, electric vehicles (EVs) have become a new direction for the automotive industry. Lithium batteries are ideal power sources for EVs due to their excellent performance [1, 2]. An essential link between the batteries and the EV is the battery management system (BMS) [3, 4]. However, the most important parameter in the BMS, the State of Charge (SOC), cannot be obtained by direct measurement [5]. Therefore, accurately estimating the SOC of lithium batteries has become a very important issue.

The SOC estimation methods

A number of SOC estimation methods have been proposed by researchers, which can be divided into the following categories.

  1) The traditional methods

The traditional methods include the Coulomb counting (CC) method [6] and the open-circuit voltage (OCV) method [7]. The CC method obtains the SOC by integrating the battery current over time during charging or discharging. However, its estimation performance depends on an exact initial SOC and a highly accurate current sensor; without them, cumulative errors occur and degrade the final estimate [7]. The OCV method obtains the SOC by looking up the SOC-OCV curve, but the battery must rest in an open-circuit state for a long time before its OCV can be measured, so the method cannot meet the requirements of real-time estimation [8]. Furthermore, if the SOC-OCV curve obtained in advance is not accurate enough, the estimation performance of the OCV method degrades accordingly.
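For illustration, the CC update can be written in a few lines of code. The sketch below is only schematic, assuming a NumPy environment, a fixed sampling period, and a sign convention in which the discharge current is positive; none of these details come from the cited works.

```python
import numpy as np

def coulomb_counting(soc0, current, dt, capacity_ah):
    """Illustrative Coulomb counting: integrate the current over time.

    soc0        : initial SOC (0-1); an error here propagates to every estimate
    current     : 1-D array of measured current in A (discharge assumed positive)
    dt          : sampling period in seconds
    capacity_ah : nominal cell capacity in Ah
    """
    # Ampere-hours removed at each step, accumulated over time
    ah_removed = np.cumsum(current) * dt / 3600.0
    soc = soc0 - ah_removed / capacity_ah
    return np.clip(soc, 0.0, 1.0)

# Example: a constant 1.45 A discharge of a 2.9 Ah cell starting from full charge
soc = coulomb_counting(1.0, np.full(3600, 1.45), dt=1.0, capacity_ah=2.9)
print(soc[-1])  # about 0.5 after one hour
```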

  2) The Kalman filter-based methods

The Kalman filter-based methods estimate the battery SOC by combining an equivalent circuit model (ECM) [9, 10] with an appropriate Kalman filter (KF), e.g., the extended Kalman filter (EKF) [11,12,13], the unscented Kalman filter (UKF) [14], the dual Kalman filter (DKF) [15], and the cubature Kalman filter (CKF) [16, 17], as well as other filters [18,19,20]. These strategies typically require accurate battery models to perform SOC estimation under varying conditions, yet in practice the battery model contains a large number of parameters that are difficult to identify. In addition, the Kalman filter-based methods yield accurate estimates only when the noise is zero-mean Gaussian [21]; in a wide variety of applications, however, the noise is non-Gaussian.
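To make the model-plus-filter idea concrete, the following sketch pairs a first-order RC equivalent circuit with a basic EKF for SOC. It is not the method of any cited reference; the OCV curve, the parameter values (R0, R1, C1, capacity), and the noise covariances are placeholders chosen only for illustration.

```python
import numpy as np

# Illustrative parameters (placeholders, not identified from a real cell)
R0, R1, C1 = 0.05, 0.02, 2000.0    # ohm, ohm, farad
CAP_AS = 2.9 * 3600.0              # nominal capacity in ampere-seconds
DT = 1.0                           # sampling period in seconds

def ocv(soc):
    # Placeholder OCV curve; a real SOC-OCV table would be used in practice
    return 3.0 + 1.2 * soc

def d_ocv(soc):
    # Derivative of the placeholder OCV curve with respect to SOC
    return 1.2

def ekf_soc(current, voltage, x0, P0, Q, R):
    """Minimal EKF over the state x = [SOC, V_rc]; discharge current positive."""
    a = np.exp(-DT / (R1 * C1))
    F = np.array([[1.0, 0.0], [0.0, a]])             # state transition Jacobian
    x, P = np.asarray(x0, dtype=float), np.asarray(P0, dtype=float)
    soc_est = []
    for i_k, v_k in zip(current, voltage):
        # Predict: Coulomb counting for SOC, first-order RC relaxation for V_rc
        x = np.array([x[0] - DT * i_k / CAP_AS,
                      a * x[1] + R1 * (1.0 - a) * i_k])
        P = F @ P @ F.T + Q
        # Update with the terminal voltage V_t = OCV(SOC) - V_rc - R0 * I
        H = np.array([[d_ocv(x[0]), -1.0]])
        y = v_k - (ocv(x[0]) - x[1] - R0 * i_k)       # innovation
        S = H @ P @ H.T + R                           # innovation variance (1 x 1)
        K = P @ H.T / S                               # Kalman gain (2 x 1)
        x = x + (K * y).ravel()
        P = (np.eye(2) - K @ H) @ P
        soc_est.append(x[0])
    return np.array(soc_est)

# Example call: ekf_soc(i, v, x0=[1.0, 0.0], P0=np.eye(2) * 1e-3,
#                       Q=np.eye(2) * 1e-7, R=1e-3)
```

In practice, the model parameters would be identified from test data and the SOC-OCV relationship would come from a lookup table rather than a linear placeholder.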

  3) The neural network-based methods

The neural network is an efficient recognition algorithm widely used in fault location [22, 23], classification [24], pattern recognition [25], and other fields. When applied to SOC estimation, neural network-based methods infer the relationship between the battery current, voltage, other variables, and the SOC from a large amount of training data [26, 27]. These methods optimize the weights and biases through iterative training, most commonly with the gradient descent (GD) algorithm [28, 29]. However, during GD-based training, when the weights of some layers change significantly, the weights of other layers often remain nearly unchanged and are only adjusted in later iterations; in other words, the GD algorithm is inherently unstable. As a result, it takes a long time to bring all the weights and biases of the network to an ideal state.

The extreme learning machine

The extreme learning machine (ELM) was proposed by Huang et al. in 2004 [30]. It is a special single-hidden layer feedforward neural network (SLFN). In an ELM, the input weights and hidden layer biases are set randomly and remain unchanged during the training process; the output weights are calculated with the batch data least square algorithm [31]. Compared with a GD-based neural network, the training steps of the ELM are simpler, which makes its training speed faster. However, a potential disadvantage of the ELM is that its learning accuracy can usually only be improved by increasing the number of neurons in the hidden layer [32]. When dealing with complex problems, the structure of the ELM therefore often becomes too large; over-fitting may then occur, which reduces the generalization ability of the algorithm [33].

Huang et al. introduced a regularization term into the objective function of the standard ELM and thus obtained the regularized extreme learning machine (RELM) [34]. Correspondingly, when the RELM is trained with the least square algorithm, the regularization coefficient enters the Moore-Penrose generalized inverse of the hidden-layer output matrix. In this way, both the empirical risk and the structural risk of the model are reduced, so the generalization ability of the model improves and over-fitting is effectively prevented.
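For reference, the regularized objective takes the standard ridge-regularized least-squares form shown below, written in the notation introduced in Section 2 (hidden-layer output matrix H, output weights W2, targets Y, regularization coefficient λ); this form is consistent with the solution given later in Eq. (5).

$$ \underset{{\boldsymbol{W}}^2}{\min}\;{\left\Vert {\boldsymbol{HW}}^2-\boldsymbol{Y}\right\Vert}^2+\lambda {\left\Vert {\boldsymbol{W}}^2\right\Vert}^2 $$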

Owing to the excellent learning performance of the RELM, many RELM-based methods have been proposed and widely applied in various fields. For example, an online sequential regularized extreme learning machine (OS-RELM) was proposed by Cosmo et al. for single-image super-resolution and achieved results comparable to or better than some state-of-the-art methods [35]. Gumaei et al. used a hybrid feature extraction method with a RELM for brain tumor classification and obtained superior classification results on brain images [36]. Weng et al. applied a BIC criterion and genetic algorithm-based RELM to iron ore price forecasting [37].

During the training of the standard ELM and its variants, an inverse matrix with high computational complexity has to be calculated [38]. Furthermore, a large number of intermediate matrices need to be stored in the process of calculating this inverse, which occupies a large amount of computer memory.

Contributions

In this paper, a conjugate gradient optimized regularized extreme learning machine (CG-RELM) is proposed to estimate battery SOC. The main contributions of this study are reflected in the following aspects:

  1) For a lithium battery, a CG-RELM model is established for SOC estimation, with the measured voltage and current as the inputs and the estimated SOC as the output of the model.

  2) In the CG-RELM, the conjugate gradient (CG) algorithm is used to calculate the output weights, which not only avoids calculating the inverse matrix but also guarantees a high convergence speed of the algorithm.

  3) In simulation, the dynamic stress test (DST) data set is used to train the CG-RELM, and the simulation results verify that the proposed algorithm can effectively estimate the battery SOC.

This paper is organized as follows. Section 2 illustrates the principle of the RELM. Section 3 introduces the principle of the CG algorithm. Section 4 investigates the CG algorithm optimized RELM for battery SOC estimation. Section 5 constructs the experimental platform, performs the simulation, and analyzes the results of the CG-RELM model. The conclusion is given in Section 6.

The regularized extreme learning machine

The regularized extreme learning machine (RELM) is a kind of feedforward neural network with a single hidden layer.

Notations:

  • M, L, and N indicate the number of neurons in the input layer, the hidden layer, and the output layer, respectively.

  • X and Y indicate the input matrix and the target output matrix of the network, respectively.

  • O indicates the output matrix of the network.

  • σ(·) indicates the Sigmoid activation function of the hidden layer.

  • W1, b, and W2 indicate the input weight matrix, the bias vector, and the output weight matrix of the network.

Based on the above notation explanations, X and Y can be expressed as follows:

$$ \boldsymbol{X}=\left[\begin{array}{cccc}{x}_1^{(1)}& {x}_2^{(1)}& \cdots & {x}_M^{(1)}\\ {}{x}_1^{(2)}& {x}_2^{(2)}& \cdots & {x}_M^{(2)}\\ {}& & \kern1em \vdots & \\ {}{x}_1^{(S)}& {x}_2^{(S)}& \cdots & {x}_M^{(S)}\end{array}\right]={\left[\begin{array}{c}{X}^{(1)}\\ {}{X}^{(2)}\\ {}\vdots \\ {}{X}^{(S)}\end{array}\right]}_{S\times M} $$
$$ \boldsymbol{Y}=\left[\begin{array}{cccc}{y}_1^{(1)}& {y}_2^{(1)}& \cdots & {y}_N^{(1)}\\ {}{y}_1^{(2)}& {y}_2^{(2)}& \cdots & {y}_N^{(2)}\\ {}& & \kern1.50em \vdots & \\ {}{y}_1^{(S)}& {y}_2^{(S)}& \cdots & {y}_N^{(S)}\end{array}\right]={\left[\begin{array}{c}{Y}^{(1)}\\ {}{Y}^{(2)}\\ {}\vdots \\ {}{Y}^{(S)}\end{array}\right]}_{S\times N} $$

where S indicates the sample size.

W1, b, and W2 can be expressed as follows:

$$ {\boldsymbol{W}}^1=\left[\begin{array}{cccc}{w}_{11}^1& {w}_{12}^1& \cdots & {w}_{1L}^1\\ {w}_{21}^1& {w}_{22}^1& \cdots & {w}_{2L}^1\\ \vdots & \vdots & \ddots & \vdots \\ {w}_{M1}^1& {w}_{M2}^1& \cdots & {w}_{ML}^1\end{array}\right]={\left[\begin{array}{cccc}{W}_1^1& {W}_2^1& \cdots & {W}_L^1\end{array}\right]}_{M\times L} $$
$$ \boldsymbol{b}={\left[\begin{array}{cccc}{b}_1& {b}_2& \cdots & {b}_L\end{array}\right]}_{L\times 1}^{\mathrm{T}} $$
$$ {\boldsymbol{W}}^2=\left[\begin{array}{cccc}{w}_{11}^2& {w}_{12}^2& \cdots & {w}_{1N}^2\\ {w}_{21}^2& {w}_{22}^2& \cdots & {w}_{2N}^2\\ \vdots & \vdots & \ddots & \vdots \\ {w}_{L1}^2& {w}_{L2}^2& \cdots & {w}_{LN}^2\end{array}\right]={\left[\begin{array}{cccc}{W}_1^2& {W}_2^2& \cdots & {W}_L^2\end{array}\right]}_{L\times N} $$

Then the output matrix of the hidden layer can be calculated as:

$$ \boldsymbol{H}={\left[\begin{array}{ccc}\sigma \left({X}^{(1)}\cdot {W}_1^1+{b}_1\right)& \cdots & \sigma \left({X}^{(1)}\cdot {W}_L^1+{b}_L\right)\\ {}\vdots & \ddots & \vdots \\ {}\sigma \left({X}^{(S)}\cdot {W}_1^1+{b}_1\right)& \cdots & \sigma \left({X}^{(S)}\cdot {W}_L^1+{b}_L\right)\end{array}\right]}_{S\times L} $$
(1)

Finally, the output matrix of the network can be obtained:

$$ \boldsymbol{O}={\boldsymbol{HW}}^2. $$
(2)
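A minimal sketch of the forward pass in (1) and (2), assuming a NumPy implementation; the matrix shapes follow the notation above (X is S × M, W1 is M × L, b is L × 1, and W2 is L × N), and the example values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_output(X, W1, b):
    # Eq. (1): H is S x L, one row per sample, one column per hidden neuron
    return sigmoid(X @ W1 + b.T)

def network_output(H, W2):
    # Eq. (2): O = H W2, with shape S x N
    return H @ W2

# Shapes for illustration: S=4 samples, M=2 inputs, L=3 hidden neurons, N=1 output
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 2))
W1, b = rng.standard_normal((2, 3)), rng.standard_normal((3, 1))
H = hidden_output(X, W1, b)                          # 4 x 3
O = network_output(H, rng.standard_normal((3, 1)))   # 4 x 1
```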

According to the RELM algorithm, the input weight matrix (W1) and bias vector (b) are set randomly and stay unchanged during the training process. Therefore, the task of the RELM is to find an estimate W2, ∗ that can minimize the error between the output (O) and the target output (Y) of the network, which can be expressed as:

$$ {\boldsymbol{W}}^{2,\ast }=\underset{{\boldsymbol{W}}^2}{\min}\left\Vert \boldsymbol{O}-\boldsymbol{Y}\right\Vert =\underset{{\boldsymbol{W}}^2}{\min}\left\Vert {\boldsymbol{HW}}^2-\boldsymbol{Y}\right\Vert . $$
(3)

The traditional ELM algorithm calculates W2, ∗ by using the batch data least square algorithm, as shown below:

$$ {\boldsymbol{W}}^{2,\ast }={\boldsymbol{H}}^{+}\boldsymbol{Y}={\left({\boldsymbol{H}}^{\mathrm{T}}\boldsymbol{H} \right)}^{-1}{\boldsymbol{H}}^{\mathrm{T}}\boldsymbol{Y}. $$
(4)

where H+ = (HTH)−1HT indicates the Moore-Penrose generalized inverse matrix of H.

However, when dealing with complex problems, too many hidden layer neurons are usually required to meet the accuracy requirement. This may lead to over-fitting, thereby reducing the generalization ability of the network. Moreover, when H is not a column full-rank matrix, HTH is singular (its determinant is 0), and an error occurs when calculating W2, ∗ according to (4).

To solve the above problems, the RELM adds a positive constant λ (called the regularization coefficient) to each element on the diagonal of HTH. Then, W2, ∗ can be calculated as follows:

$$ {\boldsymbol{W}}^{2,\ast }={\left({\boldsymbol{H}}^{\mathrm{T}}\boldsymbol{H} +\lambda \boldsymbol{I} \right)}^{-1}{\boldsymbol{H}}^{\mathrm{T}}\boldsymbol{Y}. $$
(5)

where I is an identity matrix.
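Equation (5) is the closed-form minimizer of the ridge-regularized objective ||HW2 − Y||2 + λ||W2||2. A minimal sketch of the computation, assuming λ > 0 and the shapes defined above; solving the linear system is used here instead of forming an explicit inverse, which is numerically preferable but mathematically equivalent.

```python
import numpy as np

def relm_output_weights(H, Y, lam):
    """Eq. (5): W2 = (H^T H + lam * I)^(-1) H^T Y, computed via a linear solve."""
    L = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(L), H.T @ Y)

# With lam = 0 this reduces to the ordinary least-squares ELM solution (4),
# which fails when H^T H is singular; lam > 0 removes that failure mode.
```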

The conjugate gradient algorithm

The related work

Definition 3.1

An n × n matrix A is positive definite if zTAz > 0 for every n-dimensional column vector z≠0, where zT is the transpose of vector z.

Definition 3.2

An n × n matrix A is positive semi-definite if zTAz ≥ 0 for every n-dimensional column vector z≠0, where zT is the transpose of vector z.

Definition 3.3

Two vectors, r1 and r2, are conjugate with respect to A (called A-conjugate) if \( {\boldsymbol{r}}_1^{\mathrm{T}}{\boldsymbol{Ar}}_2=0 \), where A is an n × n symmetric positive definite matrix and r1, r2  ∈ ℝn.

Also, r1, r2, ⋯, rk ∈ ℝn are a set of conjugate directions of A if \( {\boldsymbol{r}}_i^{\mathrm{T}}{\boldsymbol{Ar}}_j=0 \) for i ≠ j, i, j = 1, 2, …, k.

Remark 3.1

The notion of conjugacy is in some ways a generalization of orthogonality. When A is the identity matrix I, A-conjugacy reduces to ordinary orthogonality.

Theorem 3.1

Suppose there is a linear system Ax = B, where A is the coefficient matrix, x is the solution vector, and B is the constant vector. If A is a symmetric positive definite matrix, then, the solution of Ax = B is also the minimal point (denoted as x*) of the function f(x) shown below:

$$ f\left(\boldsymbol{x}\right)=\frac{1}{2}{\boldsymbol{x}}^{\mathrm{T}}\boldsymbol{Ax}-{\boldsymbol{B}}^{\mathrm{T}}\boldsymbol{x}. $$

Theorem 3.2

Suppose r1, r2, ⋯, rn ∈ ℝn are a set of A-conjugate vectors, where A is an n × n symmetric positive definite matrix. Taking any x1 ∈ ℝn as the initial point, a series of new points can be obtained by performing one-dimensional searches along the directions r1, r2, ⋯, rn. The minimal point of the function f(x) in Theorem 3.1 can be reached by iterating at most n times; this procedure is called the conjugate direction algorithm.

According to Theorems 3.1 and 3.2, if there is a set of A-conjugate direction vectors r1, r2, ⋯, rn ∈ ℝn, the solution of Ax = B can be obtained by the conjugate direction algorithm.

The principle of the conjugate gradient algorithm

The conjugate gradient (CG) algorithm [39], a kind of conjugate direction algorithm, generates A-conjugate direction vectors by using the gradient vectors of f(x). Then, the minimal point of f(x) can be calculated, which is also the solution of the linear equation Ax = B.

Remark 3.2

The gradient of the function f(x) can be obtained by calculating the partial derivative of f(x) with respect to x.

According to Theorem 3.1 and Remark 3.2, we have:

$$ {\boldsymbol{g}}_k={\left.\frac{\partial f}{\partial \boldsymbol{x}}\right|}_{\boldsymbol{x}={\boldsymbol{x}}_k}={\boldsymbol{Ax}}_k-\boldsymbol{B}, $$
(6)
$$ {\boldsymbol{Ax}}^{\ast }-\boldsymbol{B}=0. $$
(7)

According to the CG algorithm, the direction vector at the kth iteration (rk) is obtained from that at the previous iteration (rk-1) and the gradient vector at the current iteration (gk), as shown below:

$$ {\boldsymbol{r}}_k=-{\boldsymbol{g}}_k+{\beta}_k{\boldsymbol{r}}_{k-1}. $$
(8)

where βk is the correction coefficient.

According to Definition 3.3 and (8), βk in the CG algorithm must satisfy:

$$ {\boldsymbol{r}}_k^{\mathrm{T}}{\boldsymbol{Ar}}_{k-1}=-{\boldsymbol{g}}_k^{\mathrm{T}}{\boldsymbol{Ar}}_{k-1}+{\beta}_k{\boldsymbol{r}}_{k-1}^{\mathrm{T}}{\boldsymbol{Ar}}_{k-1}=0. $$
(9)

Then, we have the calculation equation of βk:

$$ {\beta}_k=\frac{{\boldsymbol{g}}_k^{\mathrm{T}}{\boldsymbol{Ar}}_{k-1}}{{\boldsymbol{r}}_{k-1}^{\mathrm{T}}{\boldsymbol{Ar}}_{k-1}}. $$
(10)

Definition 3.4

Define the learning rate at the kth iteration as αk. Then, the parameter updating equation of the CG algorithm is:

$$ {\boldsymbol{x}}_{k+1}={\boldsymbol{x}}_k+{\alpha}_k{\boldsymbol{r}}_k. $$
(11)

Definition 3.5

Define ek = x∗ − xk as the error vector at the kth iteration, where x∗ is the minimal point, and let rk−1 be the direction vector at the previous iteration. The accurate one-dimensional search requires that ek and rk−1 be conjugate with respect to A; that is, \( {\boldsymbol{r}}_{k-1}^{\mathrm{T}}{\boldsymbol{Ae}}_k=0 \).

Then, we can get:

$$ {\displaystyle \begin{array}{c}{r}_k^{\mathrm{T}}{Ae}_{k+1}={r}_k^{\mathrm{T}}A\left({x}^{\ast }-{x}_{k+1}\right)\\ {}={r}_k^{\mathrm{T}}A\left({x}^{\ast }-{x}_k+{x}_k-{x}_{k+1}\right)\\ {}={r}_k^{\mathrm{T}}A\left({e}_k-{\alpha}_k{r}_k\right)\\ {}={r}_k^{\mathrm{T}}{Ae}_k-{\alpha}_k{r}_k^{\mathrm{T}}{Ar}_k\\ {}=0.\end{array}} $$
(12)

Further, the calculation of αk can be obtained:

$$ {\alpha}_k=\frac{{\boldsymbol{r}}_k^{\mathrm{T}}{\boldsymbol{Ae}}_k}{{\boldsymbol{r}}_k^{\mathrm{T}}{\boldsymbol{Ar}}_k}. $$
(13)

Substituting (6) and (7) into (13), the simplified expression of αk can be obtained as follows:

$$ {\alpha}_k=\frac{{\boldsymbol{r}}_k^{\mathrm{T}}\boldsymbol{A}\left({\boldsymbol{x}}^{\ast }-{\boldsymbol{x}}_k\right)}{{\boldsymbol{r}}_k^{\mathrm{T}}{\boldsymbol{Ar}}_k}=\frac{{\boldsymbol{r}}_k^{\mathrm{T}}\left(\boldsymbol{B}-{\boldsymbol{Ax}}_k\right)}{{\boldsymbol{r}}_k^{\mathrm{T}}{\boldsymbol{Ar}}_k}=-\frac{{\boldsymbol{r}}_k^{\mathrm{T}}{\boldsymbol{g}}_k}{{\boldsymbol{r}}_k^{\mathrm{T}}{\boldsymbol{Ar}}_k}. $$
(14)

The expression of βk in (10) can also be simplified, and there are many equations for simplification, such as the Polak-Ribière-Polyak (PRP) equation [40], the Fletcher-Reeves (FR) equation [41], the conjugate descent (CD) equation [42], the Liu-Storey (LS) equation [43], and the Dai-Yuan (DY) equation [44].

In this paper, the FR equation is selected. The advantage of the FR equation is that only the gradient vectors are used when calculating βk, and the coefficient matrix and the direction vectors are not needed.

The derivation of the FR equation is as follows:

According to Definition 3.5 and (11), when i < j, the error vector at the jth iteration can be expressed as follows:

$$ {\displaystyle \begin{array}{l}{e}_j=\left({x}_{i+1}+{e}_{i+1}\right)-{x}_j\\ {}\kern1.25em ={e}_{i+1}-\left({x}_j-{x}_{i+1}\right)\\ {}\kern1.25em ={e}_{i+1}-\sum \limits_{k=i+1}^{j-1}{\alpha}_k{r}_k\cdotp \end{array}} $$
(15)

Then, according to (6), (7), and (15), we have:

$$ {\displaystyle \begin{array}{l}{r}_i^{\mathrm{T}}{g}_j={r}_i^{\mathrm{T}}\left({Ax}_j-B\right)\\ {}\kern2.5em ={r}_i^{\mathrm{T}}\left({Ax}_j-{Ax}^{\ast}\right)\\ {}\begin{array}{l}\kern3.75em =-{r}_i^{\mathrm{T}}{Ae}_j\\ {}=-{r}_i^{\mathrm{T}}A\left({e}_{i+1}-\sum \limits_{k=i+1}^{j-1}{\alpha}_k{r}_k\right)\cdotp \end{array}\end{array}} $$
(16)

According to Definition 3.3 and Definition 3.5, we have: \( {\boldsymbol{r}}_i^{\mathrm{T}}{\boldsymbol{Ae}}_{i+1}=0 \) and \( {\boldsymbol{r}}_i^{\mathrm{T}}{\boldsymbol{Ar}}_k=0,k=i+1,\cdots, j-1 \). Then, we get:

$$ {\boldsymbol{r}}_i^{\mathrm{T}}{\boldsymbol{g}}_j=0,i<j. $$
(17)

Substituting (11) into (6) gives:

$$ {\displaystyle \begin{array}{l}{g}_k=A\left({x}_{k-1}+{\alpha}_{k-1}{r}_{k-1}\right)-B\\ \kern1.5em =A{x}_{k-1}-B+{\alpha}_{k-1}A{r}_{k-1}\\ \kern1.5em ={g}_{k-1}+{\alpha}_{k-1}{Ar}_{k-1}.\end{array}} $$
(18)

Then, we have:

$$ {\boldsymbol{Ar}}_{k-1}=\frac{{\boldsymbol{g}}_k-{\boldsymbol{g}}_{k-1}}{\alpha_{k-1}}. $$
(19)

Substituting (19) into (10) gives:

$$ {\beta}_k=\frac{{\boldsymbol{g}}_k^{\mathrm{T}}{\boldsymbol{g}}_k-{\boldsymbol{g}}_k^{\mathrm{T}}{\boldsymbol{g}}_{k-1}}{{\boldsymbol{r}}_{k-1}^{\mathrm{T}}{\boldsymbol{g}}_k-{\boldsymbol{r}}_{k-1}^{\mathrm{T}}{\boldsymbol{g}}_{k-1}}. $$
(20)

According to (8) and (17), we have:

$$ {\displaystyle \begin{array}{c}{r}_{k-1}^{\mathrm{T}}{g}_k=\left(-{g}_{k-1}^{\mathrm{T}}+{\beta}_{k-1}{r}_{k-2}^{\mathrm{T}}\right){g}_k\\ {}=-{g}_{k-1}^{\mathrm{T}}{g}_k+{\beta}_{k-1}{r}_{k-2}^{\mathrm{T}}{g}_k\\ {}\begin{array}{c}=-{g}_{k-1}^{\mathrm{T}}{g}_k\\ {}=0.\end{array}\end{array}} $$
(21)

Then, we get:

$$ {\boldsymbol{g}}_{k-1}^{\mathrm{T}}{\boldsymbol{g}}_k=0. $$
(22)

Therefore, we have:

$$ {\displaystyle \begin{array}{c}{r}_{k-1}^{\mathrm{T}}{g}_{k-1}=\left(-{g}_{k-1}^{\mathrm{T}}+{\beta}_{k-1}{r}_{k-2}^{\mathrm{T}}\right){g}_{k-1}\\ {}=-{g}_{k-1}^{\mathrm{T}}{g}_{k-1}+{\beta}_{k-1}{r}_{k-2}^{\mathrm{T}}{g}_{k-1}\\ {}=-{g}_{k-1}^{\mathrm{T}}{g}_{k-1}\cdotp \end{array}} $$
(23)

Substituting (21)–(23) into (20), we obtain the FR equation:

$$ {\beta}_k=\frac{{\boldsymbol{g}}_k^{\mathrm{T}}{\boldsymbol{g}}_k}{{\boldsymbol{g}}_{k-1}^{\mathrm{T}}{\boldsymbol{g}}_{k-1}}=\frac{{\left\Vert {\boldsymbol{g}}_k\right\Vert}^2}{{\left\Vert {\boldsymbol{g}}_{k-1}\right\Vert}^2}. $$
(24)

Choosing the negative gradient direction as the initial direction, the CG algorithm generates a set of A-conjugate direction vectors r1, r2, ⋯, rn ∈ ℝn according to the following equation:

$$ {\boldsymbol{r}}_k=\left\{\begin{array}{cc}-{\boldsymbol{g}}_1& k=1,\\ {}-{\boldsymbol{g}}_k+\frac{{\parallel {\boldsymbol{g}}_k\parallel}^2}{{\parallel {\boldsymbol{g}}_{k-1}\parallel}^2}{\boldsymbol{r}}_{k-1}& k\ge 2.\end{array}\right. $$
(25)
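Putting (6), (8), (11), (14), and (25) together, a minimal Fletcher-Reeves CG solver for Ax = B might look as follows. This is a sketch under the assumption that A is symmetric positive definite and that B is a vector; the tolerance and the iteration cap are arbitrary choices.

```python
import numpy as np

def cg_fr(A, B, x0=None, tol=1e-10, max_iter=None):
    """Solve Ax = B for symmetric positive definite A with the FR-CG iteration."""
    n = B.shape[0]
    x = np.zeros_like(B, dtype=float) if x0 is None else np.asarray(x0, dtype=float)
    max_iter = n if max_iter is None else max_iter
    g = A @ x - B                          # gradient, Eq. (6)
    r = -g                                 # initial direction: negative gradient, Eq. (25)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = -(r @ g) / (r @ (A @ r))   # step size, Eq. (14)
        x = x + alpha * r                  # update, Eq. (11)
        g_new = A @ x - B
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves coefficient, Eq. (24)
        r = -g_new + beta * r              # new conjugate direction, Eq. (8)
        g = g_new
    return x
```

In the next section, A corresponds to Ψ = HTH + λI and B to T = HTY.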

The CG-RELM algorithm for SOC estimation

The conjugate gradient optimized regularized extreme learning machine (CG-RELM) is used for battery SOC estimation by taking the measured voltage Vi and current Ii (i=1,2,...,S) as inputs and the estimated SOCi as the output. Therefore, the number of neurons in the input layer and the output layer is 2 and 1, respectively, and we have:

$$ {\displaystyle \begin{array}{l}\boldsymbol{X}={\left[\begin{array}{cccc}{V}_1& {V}_2& \cdots & {V}_S\\ {I}_1& {I}_2& \cdots & {I}_S\end{array}\right]}^{\mathrm{T}},\\ \boldsymbol{Y}={\left[\begin{array}{cccc}{SOC}_1& {SOC}_2& \cdots & {SOC}_S\end{array}\right]}^{\mathrm{T}}.\end{array}} $$

Let Ψ = HTH + λI and T = HTY; then, Eq. (5) can be transformed into:

$$ {\boldsymbol{\varPsi} \boldsymbol{W}}^{2,\ast }=\boldsymbol{T} . $$
(26)

According to Definitions 3.1 and 3.2, Ψ is a symmetric positive definite matrix (HTH is positive semi-definite and λ > 0), and the linear equation ΨW2, ∗ = T has the same form as Ax = B. Thus, according to Theorems 3.1 and 3.2, its solution can be obtained with the CG algorithm in the same way as the solution of Ax = B:

$$ {\boldsymbol{W}}^{2,\ast }={\boldsymbol{W}}_{k+1}^2={\boldsymbol{W}}_k^2+{\alpha}_k{\boldsymbol{r}}_k. $$
(27)

The training steps of the CG-RELM algorithm are shown in Fig. 1.

Fig. 1 The training steps of the CG-RELM
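A compact sketch of the training procedure in Fig. 1, reusing the cg_fr routine from the previous sketch (assumed available). Only the overall flow (fix W1 and b at random, build H, solve ΨW2 = T by CG) follows the text; the uniform initialization range, the default hidden-layer size, and the regularization value are illustrative placeholders.

```python
import numpy as np

def train_cg_relm(X, Y, L=300, lam=1e-3, seed=0):
    """Train a CG-RELM: random W1 and b; output weights from Psi W2 = T via CG."""
    rng = np.random.default_rng(seed)
    M = X.shape[1]
    W1 = rng.uniform(-1.0, 1.0, size=(M, L))   # fixed random input weights
    b = rng.uniform(-1.0, 1.0, size=(1, L))    # fixed random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b)))    # Eq. (1)
    Psi = H.T @ H + lam * np.eye(L)            # Eq. (26), symmetric positive definite
    T = H.T @ Y
    W2 = cg_fr(Psi, T)                         # Eq. (27): CG instead of matrix inversion
    return W1, b, W2

def predict_cg_relm(X, W1, b, W2):
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b)))
    return H @ W2                              # Eq. (2)

# X: S x 2 matrix of [voltage, current]; Y: length-S vector of SOC targets
```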

Experiment and simulation

Data sampling and preprocessing

As shown in Fig. 2, the battery test platform consists of a battery tester (NEWARE CT-4008-5V12A-TB), a battery holder, NCR18650PF (2900 mAh) lithium batteries, and a computer. The battery indexes are listed in Table 1. Discharge tests are performed on this platform to sample data and obtain the dynamic stress test (DST) data set, which is shown in Fig. 3.

Fig. 2 The battery test platform

Table 1 Indexes of the NCR18650PF lithium battery
Fig. 3 The voltage and current of the DST data set

First, we adopt the exponential moving average method to process the original voltage and current data sampled during the discharge process. The calculation formula is as follows:

$$ \left\{\begin{array}{cc}{y}_1^{\prime }={y}_1& k=1,\\ {}{y}_k^{\prime }=\eta \cdot {y}_{k-1}^{\prime }+\left(1-\eta \right)\cdot {y}_k& k\ge 2.\end{array}\right. $$
(28)

where \( {y}_k^{\prime } \) represents the processed data, yk represents the original data, and η is a constant coefficient.

Then, we normalize the battery data to (−1, 1) to reduce the network calculation load and improve the training speed.
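A sketch of this preprocessing step, assuming NumPy arrays; the smoothing coefficient and the feature-wise min-max scaling used to map each signal to [−1, 1] are assumptions made for illustration.

```python
import numpy as np

def ema_filter(y, eta):
    """Eq. (28): exponential moving average of a 1-D signal."""
    out = np.empty_like(y, dtype=float)
    out[0] = y[0]
    for k in range(1, len(y)):
        out[k] = eta * out[k - 1] + (1.0 - eta) * y[k]
    return out

def normalize(y):
    """Min-max scaling of a 1-D signal to the interval [-1, 1]."""
    y_min, y_max = y.min(), y.max()
    return 2.0 * (y - y_min) / (y_max - y_min) - 1.0

# e.g. voltage = normalize(ema_filter(raw_voltage, eta=0.9))
```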

Simulation

Take the mean squared error (MSE) as the cost function, and adopt the root mean squared error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R2) to evaluate the estimation performance of the CG-RELM.

Randomly select 75% of all processed data as the training set and the rest as the test set.
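For completeness, the four metrics can be computed with their standard definitions, as sketched below; the sketch is not tied to any particular toolbox.

```python
import numpy as np

def metrics(y_true, y_pred):
    err = y_true - y_pred
    mse = np.mean(err ** 2)                     # cost function during training
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}
```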

  (1) Set the number of hidden layer neurons (L) to 300, and then apply the CG-RELM and the BP neural network (BP-NN), respectively, to estimate the SOC. The simulation results are shown in Fig. 4 and Table 2.

    Fig. 4 The MSE of the CG-RELM and BP-NN during training

    Table 2 The performance of different models

It can be seen that the CG-RELM has a faster convergence speed and better estimation performance than the BP-NN.

  (2) Apply the CG-RELM with different numbers of hidden layer neurons (L = 200 and L = 600), respectively, to estimate the SOC. The simulation results are shown in Fig. 5 and Table 3.

Fig. 5 The estimated SOCs and errors of the CG-RELM with different neuron numbers. a The estimated SOCs and errors when L = 200. b The estimated SOCs and errors when L = 600

Table 3 The performance of the CG-RELM with different neuron numbers

It can be seen from the results that the CG-RELM has better estimation performance when L is larger.

  (3) In order to test the robustness of the CG-RELM, an appropriate amount of noise is added to the original data: the noise variance for the voltage data is 0.01 and that for the current data is 0.03 (a small sketch of this step is given below). Apply the CG-RELM (L = 500), respectively, to the data sets with and without noise to estimate the SOC. The simulation results are shown in Fig. 6 and Table 4.
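A sketch of the noise-addition step; the file paths and the random seed are placeholders, and note that the quoted values are variances, so their square roots are passed to the random generator as standard deviations.

```python
import numpy as np

rng = np.random.default_rng(42)           # arbitrary seed for illustration
voltage = np.load("dst_voltage.npy")      # placeholder path, not from the paper
current = np.load("dst_current.npy")      # placeholder path, not from the paper
# Variances 0.01 and 0.03 correspond to standard deviations sqrt(0.01) and sqrt(0.03)
noisy_voltage = voltage + rng.normal(0.0, np.sqrt(0.01), size=voltage.shape)
noisy_current = current + rng.normal(0.0, np.sqrt(0.03), size=current.shape)
```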

Fig. 6 The estimated SOCs and errors of the CG-RELM with different data sets. a The estimated SOCs and errors without noise. b The estimated SOCs and errors with noise

Table 4 The performance of the CG-RELM with different data sets.

It can be seen from Fig. 6 and Table 4 that although the estimation error under the noise-added data has increased, the estimated SOCs are still within a satisfactory range.

Figures 4, 5, and 6 and Tables 2, 3, and 4 indicate that the CG-RELM has high accuracy and strong robustness.

Conclusion

A conjugate gradient optimized regularized extreme learning machine (CG-RELM) is investigated to estimate the battery SOC. In this hybrid algorithm, the conjugate gradient (CG) algorithm is used to calculate the output weights of the regularized extreme learning machine (RELM), a special single-hidden layer feedforward neural network (SLFN) whose input weights and hidden layer biases are fixed. The weight adjustment directions calculated by the CG algorithm at different points are conjugate to each other, which avoids the computational load of calculating the inverse matrix and guarantees the high convergence speed of the CG-RELM.

The simulation results verify that the CG-RELM can effectively estimate the battery SOC. (i) The convergence speed of the CG-RELM is faster than that of the BP neural network. (ii) Increasing the number of hidden layer neurons can improve the estimation precision. (iii) The algorithm shows high robustness when applied to the noise-added data set. The investigated method can also be applied to block-oriented systems with NN non-linear parts [45,46,47].