1 Introduction

Lithium-ion batteries offer low cost, high energy density, and long cycle life. As a result, recent years have witnessed their wide application in aerospace, electric vehicles (EV), photovoltaic power grids, and other fields [1]. A battery management system (BMS) provides health assessment, risk warning, and replacement recommendations for batteries to ensure their safe and stable operation. The state of health (SOH), defined as the ratio of the currently available capacity to the factory rated capacity, is a key quantity in the BMS [2]. An accurate SOH estimate provides information for safety management and for the optimal control of charging and discharging. Due to battery aging, the SOH decreases over the life of a battery. Generally, the end of life (EoL) of a battery is reached when the SOH has decreased to 80%, and the battery needs to be replaced before a failure brings down the whole system [3].

Unfortunately, the SOH of a battery cannot be measured directly by a sensor. It can only be estimated by analyzing external measurements with a mathematical method. SOH estimation methods for lithium-ion batteries can be divided into model-based methods and data-driven methods. Model-based methods mainly include equivalent circuit models [4], electrochemical models [5, 6], and empirical degradation models (EDM) [7,8,9,10,11]. Equivalent circuit models, which have a simple structure and a low computational cost, simulate the external working state of the battery and are combined with optimization and filtering algorithms for parameter identification and SOH estimation. However, their adaptability is relatively poor, and they are easily affected by the working conditions and by the convergence of the estimation algorithm. Electrochemical models use a set of partial differential equations to describe the physical and chemical mechanisms of battery degradation, as represented by the pseudo two-dimensional (P2D) model based on porous electrode theory [5] and its simplified scheme [6]. However, both the parameter identification and the solution of the equations are complex, which makes these models unsuitable for online estimation in a BMS. Empirical degradation models [7,8,9,10,11] describe the overall trend of the capacity degradation of a battery over all of its cycles. They are practical and robust, which makes them suitable for online applications. However, as a simplification of historical data, empirical degradation models adapt poorly to local differences in battery degradation, which are mainly caused by individual differences among batteries and by local, non-smooth fluctuations of the capacity.

Data-driven SOH estimation methods do not need to analyze the internal mechanisms of a battery. Instead, they extract external health features (HF) that are closely related to battery degradation and establish a nonlinear mapping between the HF and the SOH through a machine learning algorithm. This avoids parameter identification and physical modeling, which makes these methods flexible. Their accuracy depends strongly on the suitability of the selected health features and on the generalization capability of the training algorithm. The health features include primary features of the charge and discharge curves of a battery [12], secondary features obtained after processing [13], etc. The training algorithms mainly include neural networks [12,13,14], support vector machines [15, 16], relevance vector machines [17], etc. Generally, data-driven methods are easy to establish and implement: once the battery aging regression model has been established offline, online estimation only requires extracting the external features. They have strong nonlinear mapping capability and can capture the details of capacity degradation, so they describe the actual degradation of a battery well. However, adequate training samples are required to learn the capacity degradation behavior. When the training samples are insufficient, the estimation results can diverge, which indicates poor robustness.

Therefore, an SOH estimation method that combines an empirical degradation model with a data-driven method deserves consideration, since it is expected to maintain accuracy while enhancing robustness. The least squares support vector machine (LSSVM) [18, 19] is an efficient machine learning algorithm with strong nonlinear mapping capability, and it is widely applied in nonlinear regression. Thus, this paper proposes a lithium-ion battery SOH estimation method based on a least squares support vector machine error compensation model (LSSVM-ECM), which integrates an empirical degradation model with a data-driven method. Experimental results on the battery datasets provided by NASA [20] and Oxford [21] show that the proposed method has a high SOH estimation accuracy under different conditions. Meanwhile, it is more robust than purely data-driven methods and has better transplant ability than empirical degradation models.

2 Empirical degradation model

The relationship between battery capacity and SOH is shown in Eq. (1):

$$\text{SOH}(N) = \frac{Q(N)}{{Q_{N} }},$$
(1)

where \(Q_{N}\) is the initial capacity and \(Q(N)\) is the actual capacity at cycle N. For simplicity, the degradation rate is treated as a function of Q and N:

$$\frac{dQ}{{dN}} = f(Q,N),$$
(2)

where f(·) is the nonlinear function with two variables. The first-order Taylor expansion of Eq. (2) is carried out to obtain Eq. (3):

$$\frac{dQ}{{dN}} = a_{0} + a_{1} Q + a_{2} N + o(\sqrt {Q^{2} + N^{2} } ),$$
(3)

Here a0, a1, and a2 are constants, and o(·) denotes the higher-order remainder, which is neglected since capacity fluctuations are not considered here. This gives Eq. (4):

$$\frac{dQ}{{dN}} = a_{0} + a_{1} Q + a_{2} N.$$
(4)

Discretize (4) into (5) as follows:

$$Q(N + 1) - Q(N) = a_{0} + a_{1} Q(N) + a_{2} N.$$
(5)

Rearranging (5) gives (6):

$$Q(N + 1) = (1 + a_{1} )Q(N) + a_{2} N + a_{0} .$$
(6)

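Equation (6) is a first-order linear difference equation. Its general solution is the sum of a homogeneous term proportional to \((1 + a_{1})^{N}\) and a particular solution that is linear in N. Taking the coefficient of the homogeneous term as one gives: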

$$Q(N) = (1 + a_{1} )^{N} - \frac{{a_{2} }}{{a_{1} }}N - \frac{{a_{2} }}{{a_{1}^{2} }} - \frac{{a_{0} }}{{a_{1} }}$$
(7)

Let \(1 + a_{1} = a\), \(-a_{2}/a_{1} = b\), and \(-a_{2}/a_{1}^{2} - a_{0}/a_{1} = c\). Then, (7) can be transformed into Eq. (8):

$$Q(N) = a^{N} + bN + c.$$
(8)
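As a quick numerical sanity check, the closed form in (7) can be substituted back into the recurrence (6). The sketch below does this for a set of coefficients that are purely hypothetical and chosen only for illustration; the function names are not from the paper.

```python
def closed_form(n, a0, a1, a2):
    """Eq. (7): Q(N) with the coefficient of the homogeneous term taken as 1."""
    return (1 + a1) ** n - (a2 / a1) * n - a2 / a1**2 - a0 / a1


def recurrence_step(q, n, a0, a1, a2):
    """Right-hand side of Eq. (6): Q(N+1) computed from Q(N) and N."""
    return (1 + a1) * q + a2 * n + a0


# Hypothetical coefficients, for illustration only.
a0, a1, a2 = 0.02, -0.01, -1e-4

# Largest mismatch between Eq. (7) at N+1 and one step of Eq. (6) from Eq. (7) at N.
max_residual = max(
    abs(closed_form(n + 1, a0, a1, a2)
        - recurrence_step(closed_form(n, a0, a1, a2), n, a0, a1, a2))
    for n in range(200)
)
print(max_residual)
```

The residual stays at the level of floating-point round-off, which is what one expects if (7) satisfies (6) exactly.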

Dividing (8) by \(Q_{N}\) according to Eq. (1) and using SOH(0) = 1, i.e., \((1 + c)/Q_{N} = 1\) and hence \(c/Q_{N} = 1 - 1/Q_{N}\), (8) can be transformed into (9):

$$\text{SOH}(N) = k_{1} k_{2}^{N} + k_{3} N - k_{1} + 1.$$
(9)

Here \(k_{1} = 1/Q_{N}\), \(k_{2} = a\), and \(k_{3} = b/Q_{N}\) are the parameters to be identified from historical battery degradation data by the least squares method. Fitting the historical capacity degradation data from the Oxford and NASA datasets with (9), for partial cycles and for all cycles, yields Figs. 1 and 2.

Fig. 1 Fitting of partial cycles of an EDM

Fig. 2 Fitting of all the cycles of an EDM

Here, extrapolation is applied over the cycle number to investigate the capacity degradation. Notably, extrapolation can also be applied to the voltage, temperature, or other physical quantities to achieve state observation. This approach is suitable for many power electronic components such as MOSFETs [22,23,24], where extrapolation models of these variables or their reciprocals are established for lifetime estimation. Voltage is an important variable for the state estimation of a battery, as introduced in Sect. 3. Temperature is also a critical factor, since it significantly impacts the performance of batteries [25].

Table 1 shows parameter identification results for the whole cycle.

Table 1 Parameter identification results
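The paper does not state which least squares solver is used to identify k1, k2, and k3. One possible approach, sketched here as an assumption, exploits the fact that for a fixed k2 the model (9) is linear in k1 and k3, so k2 can be scanned on a grid while the other two parameters follow from linear least squares. The grid range and the function names `edm` and `fit_edm` are hypothetical.

```python
import numpy as np


def edm(n, k1, k2, k3):
    """Eq. (9): SOH(N) = k1*k2**N + k3*N - k1 + 1."""
    return k1 * k2**n + k3 * n - k1 + 1.0


def fit_edm(cycles, soh, k2_grid=None):
    """Scan k2 on a grid; for each k2 solve for (k1, k3) by linear least squares."""
    cycles = np.asarray(cycles, dtype=float)
    soh = np.asarray(soh, dtype=float)
    if k2_grid is None:
        k2_grid = np.linspace(0.90, 1.10, 401)  # assumed search range for k2
    best = None
    for k2 in k2_grid:
        # SOH - 1 = k1*(k2**N - 1) + k3*N is linear in (k1, k3).
        design = np.column_stack([k2**cycles - 1.0, cycles])
        coef, *_ = np.linalg.lstsq(design, soh - 1.0, rcond=None)
        sse = float(np.sum((design @ coef - (soh - 1.0)) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], k2, coef[1])
    _, k1, k2, k3 = best
    return k1, k2, k3


# Synthetic degradation curve generated from hypothetical parameters.
cycles = np.arange(0, 81, dtype=float)
soh = edm(cycles, -0.05, 1.02, -0.001)
params = fit_edm(cycles, soh)
fit_error = float(np.max(np.abs(edm(cycles, *params) - soh)))
```

A gradient-based nonlinear least squares routine would serve equally well; the grid scan is used here only because it needs nothing beyond NumPy.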

It can be seen from Figs. 1 and 2 that an EDM can describe the overall trend of capacity degradation. However, the main problems are as follows.

  1. Using an EDM established from partial cycles to predict the remaining cycles tends to produce large errors, as shown in Fig. 1. This is because formula (9) ignores the high-order terms and only contains historical information. It contains no information about the actual degradation in the remaining cycles, which makes it difficult to track the actual degradation trend of the capacity.

  2. Using an EDM established from all of the cycles of a basis battery to predict the capacity degradation of other batteries tends to produce large errors, even when they are the same type of battery, as shown in Fig. 2. This is because, for different batteries of the same type, the actual capacity degradation can be quite different due to battery inconsistencies caused by manufacturing. Therefore, the transplant ability of an EDM between different batteries is poor.

An EDM is a simplification of actual capacity degradation that includes historical information of battery aging, and can describe the overall trend of capacity degradation. However, the above shortcomings make it necessary to combine an EDM with a data-driven method that contains more realistic information on capacity degradation to enhance accuracy.

3 Data-driven method

The basic idea of a data-driven method for SOH estimation is to describe and map the actual capacity degradation through the HF. Thus, the selection of the health features and the choice of the training algorithm are the two key factors that affect the performance and practicability of this method. The data-driven method based on DV_DT and the LSSVM is introduced in this section.

3.1 HF extraction

In this paper, B0005–B0007 and B0029–B0031 from the NASA randomized battery usage data set [20] and Cell1–Cell8 from the Oxford battery degradation data set [21] are used for the experiment. The actual capacity of each battery can be acquired by ampere-hour integration, i.e., by integrating the discharging current over the discharging time for every charge-discharge cycle. The result is then divided by the rated capacity to obtain the actual SOH.
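A minimal sketch of the ampere-hour integration described above, assuming that each discharge is available as sampled time (in seconds) and current (in amperes). The trapezoidal rule, the sign handling, and the function names are assumptions made for illustration.

```python
import numpy as np


def discharge_capacity_ah(time_s, current_a):
    """Trapezoidal integration of the discharge current over time, in ampere-hours."""
    t = np.asarray(time_s, dtype=float)
    i = np.abs(np.asarray(current_a, dtype=float))  # sign convention differs between data sets
    coulombs = np.sum(0.5 * (i[1:] + i[:-1]) * np.diff(t))
    return float(coulombs / 3600.0)


def soh_from_discharge(time_s, current_a, rated_capacity_ah):
    """SOH of one cycle: measured discharge capacity divided by the rated capacity."""
    return discharge_capacity_ah(time_s, current_a) / rated_capacity_ah


# Hypothetical cycle: 2 A constant-current discharge for 3240 s of a 2 Ah cell.
t = np.arange(0.0, 3241.0, 1.0)
i = np.full_like(t, -2.0)
soh = soh_from_discharge(t, i, rated_capacity_ah=2.0)
```

For this hypothetical cycle the discharged capacity is 2 A × 3240 s / 3600 = 1.8 Ah, so the SOH is 0.9.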

The constant current (CC)–constant voltage (CV) mode [17] is the most common charging mode. The voltage curves of the CC stage at different SOH values of a battery are shown in Fig. 4. In this figure, the curve color changes from bright to dark as the battery ages, which shows that the CC voltage curve is strongly related to aging. In practice, the complete charging voltage curve is often difficult to obtain, so a voltage segment can be selected instead. Since the charging time of the battery is difficult to determine, whereas the charging voltage is easy to measure, the time interval ΔT = T2 − T1 over which the charging voltage rises from U1 to U2 is selected as the HF in this paper. This is denoted by DV_DT.
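The extraction of DV_DT from one charging curve can be sketched as follows. The sketch assumes that the CC-stage voltage is sampled together with time and rises monotonically, so that the crossing times of U1 and U2 can be found by linear interpolation; the function name `dv_dt` and the example curve are hypothetical.

```python
import numpy as np


def dv_dt(time_s, voltage_v, u1, u2):
    """Time (s) needed for the CC charging voltage to rise from u1 to u2."""
    t = np.asarray(time_s, dtype=float)
    v = np.asarray(voltage_v, dtype=float)
    # The segment must lie inside the recorded voltage range (assumed monotonic).
    if not (v[0] <= u1 < u2 <= v[-1]):
        return float("nan")
    t1, t2 = np.interp([u1, u2], v, t)  # interpolate time as a function of voltage
    return float(t2 - t1)


# Hypothetical linear CC voltage ramp from 3.5 V to 4.2 V over 3500 s.
t = np.linspace(0.0, 3500.0, 351)
v = 3.5 + 0.0002 * t
feature = dv_dt(t, v, 3.8, 4.1)
```

On this ramp the voltage reaches 3.8 V at 1500 s and 4.1 V at 3000 s, so the feature equals 1500 s. Real curves are noisy, and some smoothing before interpolation may be needed.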

Figure 3 shows the charging voltages at the 1st, 38th, and 76th cycles. The charging voltages at the other cycles follow the same trend and are therefore not shown here.

Fig. 3 Charging voltage curves

The first battery of every pack is taken as the reference battery to determine the optimal voltage segment [U1,U2]. For any given segment [U1,U2], it is possible to obtain the corresponding time interval sequence {DV_DT(1),…, DV_DT(38),…,DV_DT(EOL)}. In addition, the SOH degradation sequence {SOH(1),…,SOH(38),…,SOH(EOL)} of the reference battery can be acquired by the ampere-hour integral method. The traversing method is used to search for the optimal voltage segment [U1,U2], whose corresponding DV_DT sequence has the highest Pearson coefficient value with respect to the SOH degradation sequence. This indicates that the DV_DT sequence has the highest relevance with respect to the SOH degradation sequence and is suitable as a health feature.
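The traversal search described above can be sketched as an exhaustive scan over candidate pairs (U1, U2) on a voltage grid, keeping the pair whose DV_DT sequence has the highest Pearson coefficient with the SOH sequence. The grid, the synthetic reference curves, and the function names are assumptions for illustration only.

```python
import itertools

import numpy as np


def segment_time(t, v, u1, u2):
    """DV_DT of one cycle: time for the voltage to rise from u1 to u2."""
    t1, t2 = np.interp([u1, u2], v, t)
    return t2 - t1


def best_segment(curves, soh, grid):
    """Return ((u1, u2), r) with the highest Pearson coefficient r over the grid."""
    best_pair, best_r = None, -np.inf
    for u1, u2 in itertools.combinations(grid, 2):  # grid is sorted, so u1 < u2
        feat = np.array([segment_time(t, v, u1, u2) for t, v in curves])
        if np.std(feat) == 0.0:
            continue  # a constant feature has no defined correlation
        r = np.corrcoef(feat, soh)[0, 1]
        if r > best_r:
            best_pair, best_r = (float(u1), float(u2)), float(r)
    return best_pair, best_r


# Synthetic reference battery: the CC charging time shrinks in proportion to the SOH.
soh = np.linspace(1.0, 0.8, 20)
curves = []
for s in soh:
    t = np.linspace(0.0, 3600.0 * s, 200)
    curves.append((t, 3.5 + 0.7 * t / t[-1]))

grid = np.linspace(3.6, 4.1, 11)  # hypothetical candidate voltages in volts
pair, r = best_segment(curves, soh, grid)
```

With n grid points the scan evaluates n(n − 1)/2 segments, which remains cheap for the grid sizes typically needed here.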

The Pearson coefficient is given by Eq. (10). Figure 4 shows the Pearson coefficient between ΔT and the SOH for each of the batteries under different segments [U1,U2]. It can be seen that the choice of the voltage rising segment has a great impact on the degree of correlation. The optimal voltage segment is [3.8,4.15] V for the Oxford batteries, [3.95,4.15] V for B0005–B0007, and [3.7,4] V for B0029–B0031, as marked by the cursors in Fig. 4:

$${\text{Pearson}} = \frac{{\sum\nolimits_{i} {({\text{DV}}\_{\text{DT}}_{i} - {\text{DVT}})({\text{SOH}}_{i} - {\text{SOH}}_{e} )} }}{{\sqrt {\sum\nolimits_{i} {({\text{DV}}\_{\text{DT}}_{i} - {\text{DVT}})^{2} } } \sqrt {\sum\nolimits_{i} {({\text{SOH}}_{i} - {\text{SOH}}_{e} )^{2} } } }},$$
(10)

$${\text{GRC}} = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {\frac{{\mathop {\min }\limits_{\forall i} \left| {{\text{SOH}}_{i} - {\text{DV}}\_{\text{DT}}_{i} } \right| + \rho \mathop {\max }\limits_{\forall i} \left| {{\text{SOH}}_{i} - {\text{DV}}\_{\text{DT}}_{i} } \right|}}{{\left| {{\text{SOH}}_{i} - {\text{DV}}\_{\text{DT}}_{i} } \right| + \rho \mathop {\max }\limits_{\forall i} \left| {{\text{SOH}}_{i} - {\text{DV}}\_{\text{DT}}_{i} } \right|}}} .$$
(11)

Fig. 4 Charging voltage and Pearson relevance between DV_DT and SOH for different voltage segments: a Cell1; b B0005; c B0029

Formula (11) defines the gray relevance coefficient (GRC). DV_DTi is the time interval of the equal charging voltage rising in the ith cycle, SOHi is the state of health in the ith cycle, n is the total number of cycles, ρ is the distinguishing coefficient, and DVT and SOHe are the mean values of {DV_DTi} and {SOHi}, respectively. The Pearson coefficient and the GRC between DV_DT and the SOH of the batteries are calculated for the voltage segments mentioned above, as shown in Table 2.

Table 2 Relevance between SOH and DV_DT of batteries

It can be seen from Table 2 that the Pearson coefficient of each battery is above 0.95, and that the GRC is above 0.8. This indicates that the health feature DV_DT has a strong correlation with battery aging, which is also easy to extract for online estimation.
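Both relevance measures in (10) and (11) are straightforward to compute. In the sketch below, the min-max normalization of both sequences before evaluating (11) and the value ρ = 0.5 are assumptions, since the text does not specify them; the synthetic data and the function names are hypothetical as well.

```python
import numpy as np


def pearson(x, y):
    """Pearson coefficient of Eq. (10)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))


def gray_relevance(soh, feat, rho=0.5):
    """Gray relevance coefficient of Eq. (11) on min-max normalized sequences."""
    def minmax(z):
        z = np.asarray(z, dtype=float)
        return (z - z.min()) / (z.max() - z.min())

    delta = np.abs(minmax(soh) - minmax(feat))
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0.0:
        return 1.0  # identical normalized sequences
    return float(np.mean((d_min + rho * d_max) / (delta + rho * d_max)))


# Synthetic sequences: DV_DT roughly proportional to the SOH, with a small ripple.
soh = np.linspace(1.0, 0.8, 50)
feat = 3000.0 * soh + 5.0 * np.sin(np.arange(50))
r = pearson(feat, soh)
g = gray_relevance(soh, feat)
```

Without some normalization the difference SOHi − DV_DTi in (11) would mix a dimensionless ratio with a time in seconds, which is why a scaling step is assumed here.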

3.2 Least squares support vector machine

Based on the VC dimension theory of statistical learning and the principle of structural risk minimization, the support vector machine (SVM) is suitable for nonlinear small-sample problems. The LSSVM is an improvement of the SVM [26]. It inherits the advantages of the SVM, while replacing the ε-insensitive loss function of the SVM with the squared 2-norm of the error and replacing the inequality constraints with equality constraints, which reduces the computational complexity.

Let the training sample set be \(\{ (x_{1} ,y_{1} ), \ldots ,(x_{n} ,y_{n} )\}\), which can be fitted with the following linear function in a high-dimensional feature space:

$$f(x) = w^{T} \varphi (x) + b,$$
(12)

where w is the weight vector, b is the bias value, and φ is the nonlinear mapping function. According to the principle of structural risk minimization, the LSSVM regression problem can be transformed into the constrained optimization problem of (13):

$$\left\{ \begin{gathered} \min \frac{1}{2}w^{T} w + \frac{\gamma }{2}\sum\nolimits_{i = 1}^{n} {e_{i}^{2} } \hfill \\ {\text{s.t.}}\;y_{i} = w^{T} \varphi (x_{i} ) + b + e_{i} ,\;i = 1,2, \ldots ,n \hfill \\ \end{gathered} \right.,$$
(13)

where ei is the training error and γ is the regularization factor. The Lagrange multiplier method is used to transform (13) into the unconstrained Lagrangian (14):

$$L = \frac{1}{2}w^{T} w + \frac{\gamma }{2}\sum\limits_{i = 1}^{n} {e_{i}^{2} } - \sum\limits_{i = 1}^{n} {\lambda_{i} (w^{T} \varphi (x_{i} ) + b + e_{i} - y_{i} ),}$$
(14)

where λi is the ith Lagrange multiplier. Setting the partial derivatives of (14) to zero according to the KKT conditions gives (15):

$$\left\{ \begin{gathered} \partial L/\partial w = 0 \hfill \\ \partial L/\partial b = 0 \hfill \\ \partial L/\partial e_{i} = 0 \hfill \\ \partial L/\partial \lambda_{i} = 0 \hfill \\ \end{gathered} \right. \Rightarrow \left\{ \begin{gathered} w = \sum\nolimits_{i = 1}^{n} {\lambda_{i} \varphi (x_{i} )} \hfill \\ \sum\nolimits_{i = 1}^{n} {\lambda_{i} = 0} \hfill \\ \lambda_{i} = \gamma e_{i} \hfill \\ y_{i} = w^{T} \varphi (x_{i} ) + b + e_{i} \hfill \\ \end{gathered} \right..$$
(15)
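As a sketch of how the conditions in (15) reduce to a linear system, w and ei can be eliminated by substituting the first and third conditions of (15) into the fourth, with \(K(x_{i} ,x_{j} ) = \varphi (x_{i} )^{T} \varphi (x_{j} )\):

$$y_{i} = \sum\nolimits_{j = 1}^{n} {\lambda_{j} \varphi (x_{j} )^{T} \varphi (x_{i} )} + b + \frac{{\lambda_{i} }}{\gamma } = \sum\nolimits_{j = 1}^{n} {\lambda_{j} K(x_{i} ,x_{j} )} + b + \frac{{\lambda_{i} }}{\gamma },\quad i = 1,2, \ldots ,n.$$

Stacking these n equations gives \(y = (K + {\mathbf{I}}/\gamma )\lambda + b{\mathbf{1}}\), and the second condition of (15) contributes \({\mathbf{1}}^{T} \lambda = 0\). Only the kernel values are needed, so the mapping φ never has to be evaluated explicitly.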

The model parameters λ and b of the LSSVM can be determined by solving the linear system (16), which is obtained from (15) by eliminating w and ei:

$$\left\{ \begin{gathered} \left[ {\begin{array}{*{20}c} 0 & {{\mathbf{1}}^{T} } \\ {\mathbf{1}} & {K + {\mathbf{I}}/\gamma } \\ \end{array} } \right]\left[ \begin{gathered} b \hfill \\ \lambda \hfill \\ \end{gathered} \right] = \left[ \begin{gathered} 0 \hfill \\ y \hfill \\ \end{gathered} \right] \hfill \\ \left\{ \begin{gathered} {\mathbf{1}} = (1,1...1)^{T} \hfill \\ \lambda = (\lambda_{1} ,\lambda_{2} ...\lambda_{n} )^{T} \hfill \\ y = (y_{1} ,y_{2} ...y_{n} )^{T} \hfill \\ K(x_{i} ,x_{j} ) = \varphi (x_{i} )^{T} \varphi (x_{j} ) \hfill \\ \end{gathered} \right. \hfill \\ \end{gathered} \right.,$$
(16)

where K is the kernel matrix. The radial basis function (RBF) kernel (17) with width δ is selected in this paper:

$$K(x_{i} ,x_{j} ) = \exp \left( - \frac{{||x_{i} - x_{j} ||^{2} }}{{2\delta^{2} }}\right).$$
(17)

Then the LSSVM model is as follows:

$$f(x) = \sum\limits_{i = 1}^{n} {(\lambda_{i} \cdot K(x,x_{i} )) + b} .$$
(18)
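Training an LSSVM therefore amounts to one linear solve. The following sketch builds the bordered system (16) with the RBF kernel (17) and evaluates the model (18); the function names, the test function, and the hyperparameter values are hypothetical and serve only to illustrate the mechanics.

```python
import numpy as np


def rbf_kernel(xa, xb, delta):
    """RBF kernel matrix of Eq. (17) between the rows of xa and xb."""
    d2 = np.sum((xa[:, None, :] - xb[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * delta**2))


def lssvm_fit(x, y, gamma, delta):
    """Solve the linear system (16) for the bias b and the multipliers lambda."""
    n = len(y)
    a = np.zeros((n + 1, n + 1))
    a[0, 1:] = 1.0
    a[1:, 0] = 1.0
    a[1:, 1:] = rbf_kernel(x, x, delta) + np.eye(n) / gamma
    sol = np.linalg.solve(a, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]


def lssvm_predict(x_new, x_train, b, lam, delta):
    """Eq. (18): f(x) = sum_i lambda_i * K(x, x_i) + b."""
    return rbf_kernel(x_new, x_train, delta) @ lam + b


# Hypothetical one-dimensional regression problem.
x = np.linspace(0.0, 1.0, 30)[:, None]
y = np.sin(2.0 * np.pi * x).ravel()
b, lam = lssvm_fit(x, y, gamma=1e3, delta=0.2)
train_error = float(np.max(np.abs(lssvm_predict(x, x, b, lam, delta=0.2) - y)))
```

Because every training sample receives a multiplier, the solution is not sparse, and the cost of the solve grows cubically with the number of samples. For the few dozen to few hundred cycles considered here this is not a concern.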

The hyperparameters of (18) are the regularization factor γ and the RBF width δ, which are usually determined by cross validation or by an optimization algorithm. In this paper, particle swarm optimization (PSO) [26] is used to optimize these parameters. Its fitness function is the sum of squared errors between the actual values and the outputs of the LSSVM (19), which has the same minimizer as the root mean square error:

$$\min \sum\limits_{i = 1}^{m} {\left( {y_{i} - f\left( {x_{i} ,\gamma ,\delta } \right)} \right)^{2} } ,$$
(19)

where m is the number of training samples, and xi and yi are the input and output of the training set, respectively.
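A compact sketch of a PSO search over (γ, δ) is given below. It is not the configuration of [26]: the inertia and acceleration coefficients, the swarm size, the search in log10 space, the bounds, and the evaluation of (19) on a held-out part of the samples are all assumptions made for illustration.

```python
import numpy as np


def kernel(xa, xb, delta):
    """RBF kernel of Eq. (17) for a scalar feature."""
    return np.exp(-((xa[:, None] - xb[None, :]) ** 2) / (2.0 * delta**2))


def lssvm_predict(x_tr, y_tr, x_te, gamma, delta):
    """Solve the linear system (16), then evaluate Eq. (18) at x_te."""
    n = len(y_tr)
    a = np.ones((n + 1, n + 1))
    a[0, 0] = 0.0
    a[1:, 1:] = kernel(x_tr, x_tr, delta) + np.eye(n) / gamma
    sol = np.linalg.solve(a, np.concatenate(([0.0], y_tr)))
    return kernel(x_te, x_tr, delta) @ sol[1:] + sol[0]


def fitness(p, x_tr, y_tr, x_va, y_va):
    """Sum of squared errors as in Eq. (19); p holds log10(gamma), log10(delta)."""
    gamma, delta = 10.0**p
    return float(np.sum((y_va - lssvm_predict(x_tr, y_tr, x_va, gamma, delta)) ** 2))


def pso(obj, lo, hi, n_particles=12, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO inside the box [lo, hi]."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), np.array([obj(p) for p in pos])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        f = np.array([obj(p) for p in pos])
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, float(pbest_f.min())


# Synthetic stand-in for a feature and target sequence, split into two halves.
x = np.linspace(0.0, 1.0, 40)
y = 0.02 * np.sin(6.0 * x)
x_tr, y_tr, x_va, y_va = x[::2], y[::2], x[1::2], y[1::2]
lo, hi = np.array([0.0, -2.0]), np.array([4.0, 1.0])
objective = lambda p: fitness(p, x_tr, y_tr, x_va, y_va)
best, best_f = pso(objective, lo, hi)
gamma_opt, delta_opt = 10.0**best
```

Evaluating the fitness on samples that were not used for the linear solve avoids the trivial optimum in which a very large γ simply interpolates the training data.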

The SOH estimation method based on a pure data-driven method with the LSSVM is implemented as follows.

The LSSVM is established offline with the HF sequence {DV_DTi} as the input and the SOH sequence {SOHi} as the output. Online, the DV_DT of a new cycle is extracted and input into the LSSVM, which outputs the SOH estimate for this cycle. Generally, this method needs adequate training samples, i.e., adequate actual capacity values, which can only be obtained by regular calibration that is time-consuming and laborious. If the capacity degradation details are not fully learned for lack of samples, the prediction results may diverge, which indicates poor robustness. It can be seen from Figs. 1 and 2 that the EDM describes the overall trend of capacity degradation well and has good robustness. Therefore, the data-driven method can be combined with the EDM to enhance its robustness. In this way, the accuracy can be ensured while the number of training cycles is reduced. In addition, the poor transplant ability of the EDM between different batteries due to battery inconsistencies can be improved, since the data-driven method provides more information about the actual degradation.

4 LSSVM-ECM method

From the analysis in the previous sections, the capacity degradation of the battery can be divided into the overall trend and local differences. The EDM is used to describe the overall trend of battery degradation, which represents the historical information of battery aging. Meanwhile, the local differences of capacity degradation, quantified by the fitting error of the EDM, can be reflected by the external health features, which contain actual information on battery aging. Based on this idea, the EDM can be combined with a data-driven method. Figure 5 shows a flow chart of the proposed LSSVM-ECM.

Fig. 5 Flow chart of the LSSVM-ECM

The SOH estimation test can be divided into a single battery test and a different battery test. The former aims to estimate the SOH of the battery to be tested after the SPth cycle. When capacity cycle data before the SPth cycle is known, the SOH after the SPth cycle can be estimated through data before the SPth cycle. The latter, where the capacity degradation of the battery to be tested is unknown for the whole cycle, aims to estimate the SOH of the battery to be tested through the capacity degradation data of other batteries of the same type.

For the single battery test, as shown in Fig. 6a, the capacity data of cycles 1 to SP are used as the training set and the remaining cycles to be evaluated are used as the testing set, as shown by the red arrow in Fig. 5. The traversal method is used to determine the optimal voltage segment and to extract the health feature DV_DT from the training set. Refer to Sect. 3 for the specific steps. Meanwhile, the capacity data before the SPth cycle are fitted with (9) to establish the empirical degradation model (EDM), which contains the parameters k1, k2, and k3. The fitting error between the actual SOH and the fitted value of the EDM before the SPth cycle is then calculated. Equation (20) shows the training and testing sets for the single battery test:

$$\left\{ \begin{gathered} {\text{EDM}}_{i} = k_{1} k_{2}^{i} + k_{3} i - k_{1} + 1 \hfill \\ P_{{{\text{train}}}} = \{ {\text{DV}}\_{\text{DT}}_{i} \}_{{i = 1...\text{SP}}} \hfill \\ T_{{{\text{train}}}} = \{ {\text{SOH}}_{i} - {\text{EDM}}_{i} \}_{{i = 1...\text{SP}}} \hfill \\ P_{{{\text{test}}}} = \{ {\text{DV}}\_{\text{DT}}_{i} \}_{{i = \text{SP} + 1...\text{EOL}}} \hfill \\ T_{{{\text{test}}}} = \{ {\text{SOH}}_{i}^{e} - {\text{EDM}}_{i} \}_{{i = \text{SP} + 1...\text{EOL}}} \hfill \\ \end{gathered} \right..$$
(20)

Fig. 6 Diagrams of: a single battery test; b different battery test

In (20), Ptrain and Ttrain serve as the input and output for training the LSSVM model, whose parameters γ and δ can be determined by PSO, as shown in [26]. Ptest and Ttest are the input and output of the testing set. SOHe refers to the estimated SOH, and the subscript i refers to the cycle number. When the battery is in use at the ith cycle (i > SP), the health feature DV_DTi can be easily extracted and fed into the established LSSVM model, which outputs the estimated fitting error between the actual SOH and the EDM output. EDMi (i > SP) is easy to calculate and is added to the estimated fitting error to obtain the estimated SOHi. This achieves error compensation for the SOH estimation after the SPth cycle, as can be seen in (21):

$$T_{{{\text{test}}}} + \{ {\text{EDM}}_{i} \}_{{i = {\text{SP}} + 1,...{\text{EOL}}}} = \{ {\text{SOH}}_{i}^{e} \}_{{i = {\text{SP}} + 1,...{\text{EOL}}}} .$$
(21)
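The single battery procedure of (20) and (21) can be put together in a few lines. The sketch below is an illustration under several assumptions: the EDM is fitted with a grid scan over k2 (the paper only says least squares), the feature is standardized before entering the kernel, γ and δ are fixed instead of being tuned by PSO, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np


def edm(n, k1, k2, k3):
    """Eq. (9): SOH(N) = k1*k2**N + k3*N - k1 + 1."""
    return k1 * k2**n + k3 * n - k1 + 1.0


def fit_edm(n, soh, k2_grid=np.linspace(0.9, 1.1, 401)):
    """Grid scan over k2 with linear least squares for (k1, k3); an assumed solver."""
    best = None
    for k2 in k2_grid:
        design = np.column_stack([k2**n - 1.0, n])
        c, *_ = np.linalg.lstsq(design, soh - 1.0, rcond=None)
        sse = np.sum((design @ c - (soh - 1.0)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, c[0], k2, c[1])
    return best[1:]


def lssvm_fit_predict(x_tr, y_tr, x_te, gamma, delta):
    """Solve the system (16) and evaluate Eq. (18) for a scalar feature."""
    kern = lambda a, b: np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * delta**2))
    n = len(y_tr)
    a = np.ones((n + 1, n + 1))
    a[0, 0] = 0.0
    a[1:, 1:] = kern(x_tr, x_tr) + np.eye(n) / gamma
    sol = np.linalg.solve(a, np.concatenate(([0.0], y_tr)))
    return kern(x_te, x_tr) @ sol[1:] + sol[0]


def ecm_single(cycles, soh, feat, sp, gamma=100.0, delta=1.0):
    """SOH estimate after cycle sp: EDM output plus LSSVM-predicted fitting error."""
    tr, te = cycles <= sp, cycles > sp
    k = fit_edm(cycles[tr], soh[tr])                  # EDM from cycles 1..SP
    z = (feat - feat[tr].mean()) / feat[tr].std()     # standardized HF (assumption)
    t_train = soh[tr] - edm(cycles[tr], *k)           # T_train of Eq. (20)
    t_test = lssvm_fit_predict(z[tr], t_train, z[te], gamma, delta)
    return edm(cycles[te], *k) + t_test               # Eq. (21)


# Synthetic battery: linear fade with a ripple, DV_DT proportional to the SOH.
cycles = np.arange(1, 81, dtype=float)
soh = 1.0 - 0.002 * cycles + 0.005 * np.sin(cycles / 5.0)
feat = 3000.0 * soh + 20.0
est = ecm_single(cycles, soh, feat, sp=50)
```

Only SOH values up to cycle SP enter the training step; after that the estimate relies on the extrapolated EDM and on the measured feature alone.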

For the different battery test, as shown in Fig. 6b, the data of the battery to be tested are used as the testing set, and batteries of the same type as the battery to be tested are used as the training set, as shown by the blue arrows in Fig. 5. First, one battery of the same type as the battery to be tested is selected as the basis battery. Then, its capacity data are fitted with (9) to identify the parameters k1, k2, and k3. Next, the established EDM is applied to the other training batteries to calculate the fitting error sequence Ttrain. Equation (22) shows the training and testing sets for the different battery test:

$$\left\{ \begin{gathered} {\text{EDM}}_{i} = k_{1} k_{2}^{i} + k_{3} i - k_{1} + 1 \hfill \\ P_{{{\text{train}}}} = \{ {\text{DV}}\_{\text{DT}}_{i} \}_{{i = 1,2...{\text{EOL}}}}^{j} ,j = 1,2...N \hfill \\ T_{{{\text{train}}}} = \{ {\text{SOH}}_{i} - {\text{EDM}}_{i} \}_{{i = 1,2...{\text{EOL}}}}^{j} ,j = 1,2...N \hfill \\ P_{{{\text{test}}}} = \{ {\text{DV}}\_{\text{DT}}_{i} \}_{{i = 1,2...{\text{EOL}}}} \hfill \\ T_{{{\text{test}}}} = \{ {\text{SOH}}_{i}^{e} - {\text{EDM}}_{i} \}_{{i = 1,2...{\text{EOL}}}} \hfill \\ \end{gathered} \right.,$$
(22)

where N is the number of training batteries. Ptest contains the health feature sequence of the battery to be tested. Ptrain and Ttrain serve as the input and output for training the LSSVM model, whose parameters γ and δ can be determined by the PSO in [26]. In application, Ptest is fed into the established LSSVM model, which outputs the estimated fitting error sequence between the actual SOH and the EDM output for the battery to be tested. The estimated fitting error sequence is then added to the EDM output to obtain the SOH estimate over all cycles of the battery to be tested. This achieves error compensation for the SOH estimation between different batteries. This process is expressed in (23):

$$T_{{{\text{test}}}} + \{ {\text{EDM}}_{i} \}_{{i = 1,2...{\text{EOL}}}} = \{ {\text{SOH}}_{i}^{e} \}_{{i = 1,2...{\text{EOL}}}} .$$
(23)
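For the different battery test, the main difference from the single battery case lies in how the sets in (22) are assembled. The sketch below pools the feature and fitting-error sequences of the training batteries using the EDM parameters of the basis battery; the dictionary layout, the parameter values, and the function names are hypothetical.

```python
import numpy as np


def edm(n, k1, k2, k3):
    """Eq. (9) evaluated with the parameters identified on the basis battery."""
    return k1 * k2**n + k3 * n - k1 + 1.0


def build_sets(train_batteries, test_battery, k):
    """Assemble P_train, T_train, and P_test of Eq. (22), plus the EDM output for (23)."""
    p_train = np.concatenate([b["dv_dt"] for b in train_batteries])
    t_train = np.concatenate(
        [b["soh"] - edm(b["cycles"], *k) for b in train_batteries]
    )
    p_test = test_battery["dv_dt"]
    edm_test = edm(test_battery["cycles"], *k)
    return p_train, t_train, p_test, edm_test


# Hypothetical basis-battery parameters and two short training batteries.
k = (-0.05, 1.02, -0.001)
n1 = np.arange(1, 6, dtype=float)
n2 = np.arange(1, 7, dtype=float)
b1 = {"cycles": n1, "soh": edm(n1, *k), "dv_dt": np.linspace(3000.0, 2900.0, 5)}
b2 = {"cycles": n2, "soh": edm(n2, *k) - 0.01, "dv_dt": np.linspace(2950.0, 2850.0, 6)}
target = {"cycles": n2, "dv_dt": np.linspace(2980.0, 2880.0, 6)}
p_train, t_train, p_test, edm_test = build_sets([b1, b2], target, k)
```

The SOH estimate of (23) is then `edm_test` plus the LSSVM prediction for `p_test`. No capacity measurement of the battery under test is needed at any point.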

5 Experimental results and analysis

5.1 Single battery test

In the single battery test, the LSSVM-ECM is established based on the capacity data of the first SP cycles to predict the SOH of the battery after the SPth cycle. To verify the accuracy and robustness of the LSSVM-ECM for SOH estimation, a larger SP and a smaller SP are set for each battery, and comparative experiments with the LSSVM method are carried out. Cell1, Cell2, Cell3, Cell7, and Cell8 in the Oxford dataset each have about 80 standardized cycles, and for them the smaller SP is 30 and the larger SP is 50. For Cell4, Cell5, and Cell6, whose total standardized cycles are about 50, the smaller SP is 15 and the larger SP is 30. For B0005–B0007, which have 168 cycles, the smaller SP is 80 and the larger SP is 120, and for B0029–B0032, which have 40 cycles, the smaller SP is 15 and the larger SP is 25. The SOH estimation results and the relative error percentages for each of the batteries are shown in Figs. 7 and 8. The black curves in Figs. 7 and 8 represent the real values of the SOH, the green dotted lines mark the prediction starting points (SP), the green curves are the EDM established with the capacity data of the first SP cycles, and the red and blue curves represent the SOH estimates of the LSSVM-ECM and LSSVM methods, respectively.

Fig. 7 Single battery test for a larger SP: a Cell1; b Cell2; c Cell3; d Cell4; e Cell5; f Cell6; g Cell7; h Cell8; i B0005; j B0006; k B0007; l B0029; m B0030; n B0031; o B0032

Fig. 8 Single battery test for a smaller SP: a Cell1; b Cell2; c Cell3; d Cell4; e Cell5; f Cell6; g Cell7; h Cell8; i B0005; j B0006; k B0007; l B0029; m B0030; n B0031; o B0032

Figure 7 shows estimation results for the larger SP. It can be seen that the proposed method has a high SOH estimation accuracy. The relative error of the batteries (a)–(h) in the Oxford battery dataset is less than 1%, and for the NASA batteries (i)–(o), except for a few points, most of the points are less than 2%, which meets the requirements of BMS for estimation error. The values of the MAE and RMSE are given in Table 3. They are less than 1%, which indicates that the LSSVM-ECM proposed in this paper is suitable for various types of batteries. In addition, it can be seen from Fig. 7 and Table 3 that the LSSVM method can also achieve accurate SOH estimation when the larger SP is selected. Using this method, the relative errors of the Oxford batteries are less than 1%, and the NASA batteries, except for B0006 with 4%, are less than 2%.

Table 3 Estimation errors of a single battery test
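The error measures are not written out in the text. Assuming the usual definitions of the relative error percentage, the mean absolute error (MAE), and the root mean square error (RMSE), they can be computed as in the following sketch; the function name and the example values are hypothetical.

```python
import numpy as np


def error_metrics(soh_true, soh_est):
    """Relative error per cycle (in percent), MAE, and RMSE of an SOH estimate."""
    true = np.asarray(soh_true, dtype=float)
    est = np.asarray(soh_est, dtype=float)
    err = est - true
    return {
        "rel_err_pct": 100.0 * np.abs(err) / true,
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err**2))),
    }


# Hypothetical three-cycle example.
metrics = error_metrics([1.0, 0.9, 0.8], [0.99, 0.91, 0.80])
```

For these three points the absolute errors are 0.01, 0.01, and 0, so the MAE is 0.02/3 ≈ 0.0067 and the RMSE is about 0.0082.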

Figure 8 shows estimation results for the smaller SP. It can be seen that the SOH estimation results of the LSSVM method diverge to different degrees. For instance, the estimation errors of Cell1, Cell7, and B0005–B0007 exceed 10%. In contrast, the LSSVM-ECM method still maintains a high accuracy. The estimation errors of the Oxford batteries are less than 1%. Except for B0006 and B0007 with 5%, the estimation errors of the batteries in the NASA dataset are less than 2%. It can be seen from Table 3 that the MAE and RMSE of the LSSVM-ECM change only slightly between the larger and the smaller SP and stay within the same order of magnitude, whereas those of the LSSVM increase dramatically for the smaller SP. This is because the LSSVM, as a pure data-driven method, needs adequate training samples to fully learn and map the details of capacity degradation, and insufficient samples tend to result in poor performance. The proposed method, on the other hand, maintains a high accuracy for both the larger SP and the smaller SP without significant changes. This is because the EDM captures the overall trend of capacity degradation and enhances the robustness of the method, while the capacity fluctuations, which represent the actual differences in battery degradation, are reflected by the feedback and compensation of the error. In practical applications, this method can ensure the estimation accuracy with fewer training samples. In other words, it reduces the number of regular capacity calibrations, which saves time and labor.

5.2 Different battery test

When the capacity degradation for all of the cycles of the battery to be tested is unknown, the SOH can be estimated by the HF and degradation information of other batteries of the same type. The degradation trends of the batteries are different due to battery inconsistencies, which results in the poor transplant ability of the EDM. Therefore, the LSSVM-ECM method is used to dynamically compensate the fitting error of EDM to realize the SOH estimation of different batteries.

According to the total number of cycles, the Oxford battery dataset is divided into two groups: Cell1, Cell2, Cell3, Cell7, and Cell8, whose total standardized cycles are around 80, and Cell4–Cell6, whose total cycles are about 50. For the NASA datasets, B0005–B0007 form one group and B0029–B0032 form another. The EDM is established by taking one battery as the basis battery. The LSSVM-ECM is established from the offline HF data and the EDM fitting errors of the M-1 training batteries to predict the capacity degradation of the testing battery. The setup of the training and test sets is shown in Table 4. The blue curves in Fig. 9 show the SOH estimation results for different batteries obtained by the EDM established with the basis battery. They show significant errors, which indicates poor transplant ability. The red curves show the SOH estimation results after error compensation by the proposed method. It can be seen that, except for individual points, the relative error of each battery is less than 2% for the Oxford dataset (a)–(f) and less than 5% for the NASA dataset (g)–(l), which is a great improvement over the EDM estimation results. Meanwhile, local fluctuations of the capacity are reflected well, which matches the actual capacity degradation. It can be seen from Table 5 that the MAE and RMSE of the proposed method are less than 1% for the Oxford dataset, except for Cell7 with 1.5%, and less than 2% for the NASA dataset, except for B0006 with 2.5%. These values are far smaller than those of the EDM method. The test results show that the LSSVM-ECM method has strong nonlinear mapping performance and can establish the regression relationship between the HF and the fitting error of the EDM for different batteries. This allows it to effectively predict and compensate the fitting error and to improve the transplant ability of the EDM, which realizes accurate SOH estimation for different batteries.

Table 4 Different battery test setup

Fig. 9 Different battery test: a Cell2; b Cell3; c Cell7; d Cell8; e Cell5; f Cell6; g B0006; h B0007; i B0030; j B0031; k B0032

Table 5 Estimation errors of a different battery test

6 Conclusion

In this paper, an SOH estimation method based on the LSSVM-ECM is proposed, which realizes the fusion and complementation of an empirical model and a data-driven method. The EDM is used to describe the overall trend of capacity degradation. The time interval of equal charging voltage rising, DV_DT, is selected as a health feature to reflect the local differences in capacity degradation. The LSSVM-ECM is established with the DV_DT as the input and the fitting error of the EDM as the output, so that the prediction results of the EDM are compensated dynamically. Validations are conducted on Oxford and NASA battery data obtained under different operating conditions. The test results demonstrate that the proposed SOH estimation method achieves high estimation accuracy, while needing fewer training samples and being more robust than the traditional data-driven method. In addition, it improves the transplant ability relative to the empirical model method.