Introduction

As living standards improve, people increasingly regard personal health as a top priority. Heart rate (HR), an important health indicator, has drawn widespread interest [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]. HR is an important physiological parameter that reflects the activity of the heart. As one of the most commonly used indices in the medical evaluation of patients, HR can be used to prevent and detect cardiovascular disease and other conditions [1]. Real-time monitoring of HR has been the main basis for quickly assessing the health status of an individual [2]. Various smart wearable devices that can monitor HR have been developed. Among these devices, those worn on the wrist are widely used because of the convenience they provide. These smart devices record a photoplethysmography (PPG) signal from which the HR can be calculated [3]. Traditional HR measurement methods typically use electrocardiography (ECG) sensors, which are attached to the chest. This approach often requires more complex equipment, entails higher costs, and may cause discomfort during HR measurement. Owing to the convenience and cost-efficiency of acquiring signals from a device worn on the wrist, many have adopted PPG signals for HR monitoring. However, PPG signals are easily corrupted by motion artifacts (MAs), which impedes the application of this technique [4,5,6]. An accurate estimate of the HR is difficult to derive from PPG signals that are severely affected by MAs. Therefore, the realization of a suitable, robust HR monitoring method has drawn wide attention.

Various approaches have been proposed to reduce the influence of MAs. Independent component analysis (ICA) [5] has been broadly applied in various fields. ICA uses the independent components of two signals to separate noise from a signal and mainly assumes the independence or uncorrelatedness of the signals, an assumption that PPG signals fail to meet. Consequently, this method leads to unsatisfactory results. Adaptive filtering is another widely used technique [6,7,8,9,10,11]. It is highly efficient and can achieve satisfactory outcomes if suitable reference signals exist but not otherwise, given that reference signals determine the quality of filtered signals [7]. A study recently verified signal decomposition [12] as an acceptable method. In another study, singular spectrum analysis was used to decompose a raw PPG signal into a number of components; the information in the acceleration signal was used to remove MAs, and the signal was synthesized into a clean PPG signal. Spectral subtraction [13] can also effectively remove MAs. However, both signal decomposition and spectral subtraction exhibit high computational complexity. Other studies [14, 15] recommended particle filtering for MA removal. The particle filter, a robust tracking method for time-series data, is suitable for HR estimation because HR varies within a small range in a short period [14]. However, particle filtering can only track HR slowly, causing delays in some situations, such as sudden changes in HR due to changes in exercise. In [16, 17], a normalized least-mean-squares algorithm was first used to attenuate MAs, and an adaptive band-pass filter was used to track the common instantaneous frequency component (i.e., HR) of the reconstructed PPG waveforms [16]. The adaptive band-pass filter was derived from the discrete oscillator-based adaptive notch filter (OSC-ANF) proposed in [18].
In [19], a noise-robust OSC-ANF (NR-OSC-ANF) algorithm was recommended to improve the effect of frequency tracking by using the average of past estimated frequencies. Short-time Fourier transform [20] efficiently tracked HR but required a higher sampling rate for the signal, compared with other approaches. The accuracy of HR measurement at different anatomical locations was verified in [21], with the strongest HR signals found in the forehead and fingers.

A new technique was developed in the current study. Described as a superior method for MA removal with respect to speed [20], adaptive filtering was used. We adapted the method by combining two different adaptive filtering approaches. To select the appropriate peak value from the frequency domain as the HR, peak selection considering three factors was proposed. The proposed approach was compared with several state-of-the-art techniques and was proven quick and efficient.

The remainder of this paper is organized as follows. Section II describes the datasets and explains the proposed method. Section III presents the parameter settings, the performance measures, and the experimental results. Section IV discusses the results. Section V concludes the paper.

Methodology

Datasets

The dataset, which comprises recordings of 12 male subjects running on a treadmill at changing speeds, was provided by Zhang et al. [12] for the Signal Processing Cup 2015. Two-channel PPG signals, three-axis acceleration signals, and one-channel ECG signals were simultaneously recorded from subjects with yellow skin, aged 18 to 35 years. The PPG signals were recorded using two pulse oximeters with green LEDs (wavelength: 515 nm). Acceleration signals were recorded with a three-axis accelerometer. These sensors were embedded in a wristband. The ground-truth HR was calculated from the ECG signal, which was recorded simultaneously from the chest by using wet ECG sensors. All signals were sampled at 125 Hz and transmitted to a nearby computer via Bluetooth.

For each subject, the data recorded within the 5 min interval consisted of data acquired within a 1 min rest period and 4 min operating period. Two patterns of operation were identified:

rest (30 s) → 8 km/h (1 min) → 15 km/h (1 min) → 8 km/h (1 min) → 15 km/h (1 min) → rest (30 s).

rest (30 s) → 6 km/h (1 min) → 12 km/h (1 min) → 6 km/h (1 min) → 12 km/h (1 min) → rest (30 s).

One person performed exercises in the first mode, while the others exercised in the second mode.

Proposed method

In this study, the proposed algorithm consists of two main steps: multichannel parallel adaptive filtering and peak selection.

Prior to multichannel parallel adaptive filtering and peak selection, the signal should be divided into individual segments for processing. An eight-second sliding window was used, with an overlap of 2 s between windows. Each sliding window was used to estimate HR.
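The segmentation step can be sketched in Python as follows. This is a minimal illustration that takes the stated 8 s window and 2 s overlap at face value; the function name `sliding_windows` and its parameters are hypothetical, and `overlap_s` is exposed so that a different hop between windows can be substituted.

```python
def sliding_windows(n_samples, fs=125, win_s=8, overlap_s=2):
    """Return (start, end) sample indices of each analysis window.

    fs is the 125 Hz sampling rate of the dataset; win_s and overlap_s
    follow the window description in the text (assumed interpretation).
    """
    win = int(win_s * fs)
    step = int((win_s - overlap_s) * fs)  # hop between window starts
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]


# Example: a 5 min recording sampled at 125 Hz
windows = sliding_windows(5 * 60 * 125)
print(len(windows), windows[:2])
```

Each returned pair delimits one segment on which a single HR value is estimated.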

The proposed HR estimation method is illustrated in Fig. 1. Three PPG channels were used: the signals from the two PPG devices and a third channel consisting of the average of the two PPG signals. Each channel was processed individually by adaptive filtering, and the power spectrum of each channel was calculated using a periodogram. The optimal peak point in the spectrum was then selected as the HR by the peak selection procedure described in a later section.

Fig. 1 Block diagram of the proposed heart rate estimation method

Multichannel parallel adaptive filtering

Raw PPG signals from three channels were filtered (Fig. 1). The accuracy of estimation was improved using three PPG signals because each signal was influenced by different degrees of interference, and the combination of these signals can reduce noise impact.

Figure 2 shows the frame of the adaptive filter for one channel of PPG signals. Two kinds of adaptive filters (the least mean squares [LMS] filter and the recursive least squares [RLS] filter) were used to remove noise from the PPG signals [25]. The structure of the filter could be either finite impulse response (FIR) or infinite impulse response (IIR). The FIR structure was used for the adaptive filter to avoid the stability problems observed in IIR filters. The FIR filter can exhibit strictly linear phase-frequency characteristics while allowing arbitrary amplitude–frequency characteristics, and its unit sampling response is finite, which guarantees the stability of the system. Acceleration signals were used as the reference signals for the adaptive filter because the noise in PPG signals mainly comes from MAs, and 3D acceleration data are highly correlated with MAs. Notably, the reference signal in this study is the magnitude of the 3D acceleration signal, calculated as follows:

Fig. 2 Frame of adaptive filtering in the proposed method

$$a= \sqrt{{{a}_{x}}^{2}+{{a}_{y}}^{2}+{{a}_{z}}^{2}}$$
(1)
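To make the filtering step concrete, the following Python sketch computes the reference signal of Eq. (1) and applies a basic FIR LMS noise canceller to one PPG channel. It is an illustrative sketch, not the implementation used in this study: the function names are hypothetical, the plain (non-normalized) LMS update is an assumption, and the RLS branch is omitted for brevity. The default order of 20 and step size of 0.002 are the values reported in the parameter settings.

```python
import math


def accel_magnitude(ax, ay, az):
    """Eq. (1): combine the three acceleration axes into one reference."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]


def lms_cancel(ppg, ref, order=20, mu=0.002):
    """FIR LMS noise canceller (assumed plain LMS update).

    The filter predicts the motion artifact from the reference signal;
    the prediction error is returned as the cleaned PPG signal.
    """
    w = [0.0] * order
    cleaned = []
    for n in range(len(ppg)):
        # Most recent `order` reference samples, zero-padded at the start
        x = [ref[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        artifact_est = sum(wi * xi for wi, xi in zip(w, x))
        e = ppg[n] - artifact_est  # error = cleaned PPG sample
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]
        cleaned.append(e)
    return cleaned
```

When the input consists only of the artifact (that is, `ppg` equals `ref`), the output of this sketch decays toward zero as the weights adapt, which is the expected behaviour of a noise canceller.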

The results of the two filters were then combined using the following method. The Pearson correlation coefficients between the acceleration data and the cleaned PPG output of each filter, denoted \({\rho }_{LMS}\) and \({\rho }_{RLS}\), were calculated. The combined output of the two filters is expressed as follows:

$$\mathrm{S}=\uplambda *{S}_{LMS}+\left(1-\uplambda \right)*{S}_{RLS}$$
$$\mathrm{s}.\mathrm{t}.\uplambda = \frac{{\rho }_{RLS}}{{\rho }_{LMS}+{\rho }_{RLS}}$$
(2)

where \(\uplambda\) is the combination parameter calculated from \({\rho }_{LMS}\) and \({\rho }_{RLS}\); its value usually ranges from 0.3 to 0.7. \({S}_{LMS}\) is the output of the LMS filter, and \({S}_{RLS}\) is the output of the RLS filter. S is the final output of the filter.

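A lower correlation with the acceleration data indicates that less motion-related content remains in the filter output. Therefore, if the correlation coefficient \({\rho }_{LMS}\) is lower than \({\rho }_{RLS}\), the output of the LMS filter is considered better than that of the RLS filter, \(\uplambda\) exceeds 0.5, and the output of the LMS filter makes up the larger proportion of the combined signal, and vice versa. In this way, the filtered signals of the two filters are combined according to their quality.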
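The weighting of Eq. (2) can be sketched in Python as follows. The helper names `pearson` and `combine_outputs` are hypothetical; taking the absolute value of each correlation and falling back to equal weights when both correlations are zero are assumptions made here so that the sketch is well defined, not details stated in this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def combine_outputs(s_lms, s_rls, accel):
    """Eq. (2): weight each filter output by the other's correlation."""
    rho_lms = abs(pearson(s_lms, accel))  # abs() is an assumption
    rho_rls = abs(pearson(s_rls, accel))
    total = rho_lms + rho_rls
    lam = rho_rls / total if total > 0 else 0.5  # fallback is an assumption
    combined = [lam * a + (1 - lam) * b for a, b in zip(s_lms, s_rls)]
    return lam, combined


accel = [1.0, -1.0, 1.0, -1.0]
s_rls = [2.0, -2.0, 2.0, -2.0]   # still fully correlated with the motion
s_lms = [1.0, 1.0, -1.0, -1.0]   # uncorrelated with the motion
lam, combined = combine_outputs(s_lms, s_rls, accel)
print(lam, combined)
```

In this toy case the RLS output is entirely motion-related, so the whole weight goes to the LMS output.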

The combined output of the two filters was used as the final output because the output of either filter alone could be unreliable, and the HR calculated from it might be wrong. Figures 3 and 4 illustrate this scenario. Blue circles represent the real HR, whereas red circles denote the estimated HR. As shown in Fig. 3, the wrong HR is derived from the output of the RLS filter, and in Fig. 4, the wrong HR is determined from the output of the LMS filter. The correct HR is obtained from the combined output of the LMS and RLS filters, as shown in Figs. 3d and 4d. Thus, the combination of the two filters can efficiently enhance the accuracy of HR monitoring.

Fig. 3 Effect of the two filters combined. a Raw photoplethysmography spectrum; b spectrum of the output obtained from the least-mean-squares (LMS) noise canceler; c spectrum of the output obtained from the RLS noise canceler; d spectrum of the output obtained from the combination of the LMS and RLS noise cancelers

Fig. 4 Effect of the combination of the two filters. a Raw PPG spectrum; b spectrum of the output from the LMS noise canceler; c spectrum of the output from the RLS noise canceler; d spectrum of the output from the combination of the LMS and RLS noise cancelers

Peak selection

By using the aforementioned approach, the filtering results of the three signals were obtained. The appropriate HR was then selected from the spectra of these three signals by using the following method. Figure 5 presents the flowchart of the peak selection process.

Fig. 5 Flowchart of the peak selection

First, we use an FFT with \({\mathrm{N}}_{FFT}\) points for frequency-domain analysis; \({\mathrm{N}}_{FFT}\) determines the frequency resolution. The HR is generally limited to a certain range. Consequently, only a small area of the spectrum of the filtered signal needs to be considered, and a suitable peak in this area can be selected as the HR point. The approximate range of HR is 0.4–3 Hz; thus, the corresponding area in the frequency domain is approximately \(0.4*\frac{{N}_{FFT}}{{F}_{s}}\sim 3*\frac{{N}_{FFT}}{{F}_{s}}\), where \({F}_{s}\) is the sampling frequency of the PPG signal. The actual area used in the proposed method was adaptive; that is, the center of the area was the HR value estimated in the previous time window. The size of the area, θ, was calculated from the rate of change in HR, ε, in the previous window. If the rate of change in HR is high, the person is considered to be in a state of “strenuous exercise,” and a larger area is set; otherwise, the person is regarded as being in a “quiet” state, and a smaller area is set. In this study, the thresholds \({\upvarepsilon }_{1},{\upvarepsilon }_{2},{\upvarepsilon }_{3}\) were set to divide the rate of change in HR into four ranges. The size of the area was calculated as follows:

$$\theta =\begin{cases}{\theta }_{1}&\varepsilon <{\varepsilon }_{1}\\{\theta }_{2}&{\varepsilon }_{1}<\varepsilon <{\varepsilon }_{2}\\{\theta }_{3}&{\varepsilon }_{2}<\varepsilon <{\varepsilon }_{3}\\{\theta }_{4}&\varepsilon >{\varepsilon }_{3}\end{cases}$$
(3)
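The choice of the search area can be sketched in Python as follows. The function names are hypothetical, the default values are those listed in the parameter settings, and two points are assumptions of this sketch: the largest area \({\theta }_{4}\) is used whenever ε is not below \({\varepsilon }_{3}\), and values of ε that fall exactly on a threshold are assigned to the larger area.

```python
def search_area_size(eps, thetas=(200, 250, 300, 350), limits=(0.2, 0.4, 0.6)):
    """Eq. (3): pick the area size (in FFT bins) from the HR change rate."""
    for theta, limit in zip(thetas, limits):
        if eps < limit:
            return theta
    return thetas[-1]  # fastest HR change: widest search area


def hr_band_bins(n_fft=30000, fs=125.0, f_lo=0.4, f_hi=3.0):
    """FFT bin indices corresponding to the 0.4-3 Hz HR band."""
    return round(f_lo * n_fft / fs), round(f_hi * n_fft / fs)


print(search_area_size(0.1), search_area_size(0.5), hr_band_bins())
```

With 30,000 FFT points and a 125 Hz sampling rate, the 0.4–3 Hz band spans bins 96 to 720, so the area sizes of 200 to 350 bins cover a substantial part of it.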

After the area was determined, peak selection was performed. The candidate points were chosen from the previously determined area. To avoid possible errors, we selected the four highest peak points for each signal rather than only the highest peak point. If fewer than four peak points existed, the missing candidates were set to 0.
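A minimal Python sketch of the candidate extraction is given below. The function name `top_peaks` is hypothetical, and defining a peak as a sample that is larger than its left neighbour and not smaller than its right neighbour is an assumption of this sketch.

```python
def top_peaks(spectrum, lo, hi, n_peaks=4):
    """Return the bins of the n_peaks highest local maxima in [lo, hi].

    Missing candidates are padded with 0, as described in the text.
    """
    first = max(lo, 1)
    last = min(hi, len(spectrum) - 2)
    peaks = [i for i in range(first, last + 1)
             if spectrum[i - 1] < spectrum[i] >= spectrum[i + 1]]
    peaks.sort(key=lambda i: spectrum[i], reverse=True)  # highest first
    peaks = peaks[:n_peaks]
    return peaks + [0] * (n_peaks - len(peaks))


spectrum = [0, 1, 0, 3, 0, 2, 0, 0]
print(top_peaks(spectrum, 0, 7))
```

Here only three peaks exist (at bins 3, 5, and 1 in descending order of height), so the fourth candidate is set to 0.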

Finally, 12 candidate points were acquired from the spectra of the three PPG signals. The point representing HR was selected from the 12 candidate points. In this study, a new method was proposed for choosing a suitable point. The quality of the candidate points was assessed based on three factors:

1) Spectral amplitude of the candidate points. The larger the amplitude, the more likely the candidate points represent HR.

2) Distance between the candidate points and the spectral peaks of the acceleration signal. The two highest peaks are selected from the acceleration spectrum within the previously determined area, and the shortest distance between these peaks and each candidate point is calculated. The greater the distance, the higher the probability that the candidate point represents the HR.

3) Distance between the candidate points and the point determined in the last time window. HR does not change abruptly; thus, the smaller the distance, the more likely the candidate point represents HR.

A score was assigned to each of the three factors. The scores were calculated as follows:

$${Sc}_{high}=h/{h}_{max}$$
(4)
$${Sc}_{dis1}=\begin{cases}0.25&dis1<q/2\\0.5&q/2\le dis1\le q\\0.5+0.5*\frac{dis1-q}{{dis1}_{max}-q}&dis1>q\end{cases}$$
(5)
$${Sc}_{dis2}=\begin{cases}1&dis2<p\\0.8-(dis2-p)/(2.5*p)&p\le dis2\le 3*p\\0&dis2>3*p\end{cases}$$
(6)

where p and q are empirical values determined experimentally; they depend on \({\mathrm{N}}_{FFT}\) and usually correspond to less than 0.5 Hz in frequency. \(dis1\) is the distance between the candidate point and the nearest spectral peak of the acceleration signal, and \(dis2\) is the distance between the candidate point and the HR point determined in the last time window. \({dis1}_{max}\) and \({dis2}_{max}\) are the maximum values of \(dis1\) and \(dis2\) among the four candidate points of each signal. h is the spectral amplitude of the candidate point, and \({h}_{max}\) is the maximum spectral amplitude. \({Sc}_{high}, {Sc}_{dis1},{Sc}_{dis2}\) are the scores of the three factors. Because each signal has four candidate points, it has four spectral amplitude scores. If only one of these scores exceeded 0.55, the spectral amplitude scores were reset as follows:

$$\begin{cases}{Sc}_{high}={Sc}_{high}+0.25,\quad {Sc}_{highmax}={Sc}_{highmax}-0.5&dis1<q\\{Sc}_{highmax}={Sc}_{highmax}+0.45&dis1>q\end{cases}$$
(7)

where \({Sc}_{highmax}\) is the maximum spectral amplitude score. This adjustment is important because a single spectral amplitude score greater than 0.55 indicates that the corresponding point stands out from the other candidates. If \(dis1<q\) for this point, its large amplitude could be caused by the acceleration signal; thus, we reduced its score while increasing those of the other three points. If \(dis1>q\), this point is very likely the HR point; we thus increased its score.

These scores were combined in a certain proportion, and the candidate points were measured using the combined scores, as follows:

$$\mathrm{Sc}= 5*{Sc}_{high}+2*{Sc}_{dis1}+4*{Sc}_{dis2}$$
(8)
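The scoring of Eqs. (4)–(6) and the weighted combination of Eq. (8) can be sketched in Python as follows. The function names are hypothetical, distances are expressed in FFT bins with the values of p and q from the parameter settings, and the adjustment of Eq. (7) is left out. The middle branch of \({Sc}_{dis2}\) is assumed here to decrease with distance, so that it reaches 0 at \(dis2=3p\), in line with the rationale that a smaller distance from the previous HR point is more plausible.

```python
def score_amplitude(h, h_max):
    """Eq. (4): spectral amplitude relative to the maximum amplitude."""
    return h / h_max


def score_dis1(dis1, dis1_max, q=52):
    """Eq. (5): reward candidates far from the acceleration peaks."""
    if dis1 < q / 2:
        return 0.25
    if dis1 <= q:
        return 0.5
    return 0.5 + 0.5 * (dis1 - q) / (dis1_max - q)


def score_dis2(dis2, p=40):
    """Eq. (6): reward candidates close to the previous HR point.

    The decreasing middle branch is an assumption of this sketch.
    """
    if dis2 < p:
        return 1.0
    if dis2 > 3 * p:
        return 0.0
    return 0.8 - (dis2 - p) / (2.5 * p)


def combined_score(sc_high, sc_dis1, sc_dis2, weights=(5, 2, 4)):
    """Eq. (8): weighted sum of the three factor scores."""
    return weights[0] * sc_high + weights[1] * sc_dis1 + weights[2] * sc_dis2


# A strong peak, far from the motion peaks, close to the previous HR bin
print(combined_score(score_amplitude(9.0, 10.0),
                     score_dis1(120, 200),
                     score_dis2(10)))
```

Passing `weights=(6, 4, 1)` to `combined_score` gives a weighting that relies less on the previous window.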

Ultimately, each of the 12 candidate points received a score. The point with the highest score, located at frequency bin N, was chosen to represent the HR point. The HR was thus calculated in terms of BPM:

$$\mathrm{BPM}= \frac{60\times {F}_{s}\times N}{{N}_{FFT}}$$
(9)

where \({F}_{s}\) is the sampling frequency in Hz, \({N}_{FFT}\) is the number of points used for computing the periodogram, and \(N\) is the location of the selected point in the frequency domain.
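As a worked example of Eq. (9), the following Python snippet converts a frequency bin into BPM. The function name `bin_to_bpm` is hypothetical, and the bin index is assumed to be zero-based (bin 0 corresponds to 0 Hz).

```python
def bin_to_bpm(n, fs=125.0, n_fft=30000):
    """Eq. (9): convert a zero-based FFT bin index to beats per minute."""
    return 60.0 * fs * n / n_fft


print(bin_to_bpm(480))  # bin 480 corresponds to 2 Hz
print(bin_to_bpm(1))    # spacing between adjacent bins
```

With a sampling frequency of 125 Hz and 30,000 FFT points, bin 480 corresponds to 120 BPM, and adjacent bins are 0.25 BPM apart.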

The HR of the previous window largely affected the HR selection in the current window. When the HR in the previous window was incorrect, that in the current window could also be incorrect, potentially affecting the subsequent HR estimation. To avoid this situation, we combined the scores by using a new ratio, as follows:

$${Sc}_{ca}= 6*{Sc}_{high}+4*{Sc}_{dis1}+1*{Sc}_{dis2}$$
(10)

The point with the highest score under this ratio (\({N}_{ca}\)) was also selected as a candidate HR point. If \({N}_{ca}\ne N\) in k consecutive time windows, the estimate of HR was considered wrong, and \({N}_{ca}\), instead of \(N\), was used.
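This safeguard can be sketched in Python as a small counter. The class name `FallbackSelector` is hypothetical, and resetting the counter after a switch to \({N}_{ca}\) is an assumption of this sketch rather than a detail stated in this study.

```python
class FallbackSelector:
    """Switch to N_ca after k consecutive windows in which N_ca != N."""

    def __init__(self, k=3):
        self.k = k
        self.mismatches = 0

    def select(self, n, n_ca):
        # Count consecutive disagreements between the two selections
        self.mismatches = self.mismatches + 1 if n != n_ca else 0
        if self.mismatches >= self.k:
            self.mismatches = 0  # assumed: restart counting after a switch
            return n_ca
        return n


sel = FallbackSelector(k=3)
picks = [sel.select(n, n_ca)
         for n, n_ca in [(100, 100), (100, 140), (101, 141), (102, 142)]]
print(picks)
```

In this example the two selections disagree in three consecutive windows, so the fourth output is taken from \({N}_{ca}\).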

Notably, in the determination of maximum scores, the maximum scores of the three signals could be equivalent. The indices of the three scores were thus calculated. If two indices were close to each other while the other index was far from the two indices, one of the two indices was selected as the final output; otherwise, the index of the first signal was used as the output. Specific settings were as follows:

$$N=\begin{cases}{N}_{1}&|{N}_{1}-{N}_{2}|<{\varphi }_{1},\ |{N}_{1}-{N}_{3}|>{\varphi }_{2}\ \text{and}\ |{N}_{2}-{N}_{3}|>{\varphi }_{2}\\{N}_{1}&|{N}_{1}-{N}_{3}|<{\varphi }_{1},\ |{N}_{1}-{N}_{2}|>{\varphi }_{2}\ \text{and}\ |{N}_{2}-{N}_{3}|>{\varphi }_{2}\\{N}_{2}&|{N}_{2}-{N}_{3}|<{\varphi }_{1},\ |{N}_{1}-{N}_{3}|>{\varphi }_{2}\ \text{and}\ |{N}_{2}-{N}_{1}|>{\varphi }_{2}\\{N}_{1}&\text{otherwise}\end{cases}$$
(11)

where \({N}_{1},{N}_{2},{N}_{3}\) represent the indices of the three scores, and \(N\) is the final index.
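The rule of Eq. (11) can be sketched in Python as follows. The function name `pick_index` is hypothetical, and the default thresholds are the values of \({\varphi }_{1}\) and \({\varphi }_{2}\) from the parameter settings. Because every case except the third returns \({N}_{1}\), only the third case needs an explicit test.

```python
def pick_index(n1, n2, n3, phi1=6, phi2=15):
    """Eq. (11): choose the final bin among the three channel results."""
    d12, d13, d23 = abs(n1 - n2), abs(n1 - n3), abs(n2 - n3)
    # Channels 2 and 3 agree with each other and both disagree with channel 1
    if d23 < phi1 and d12 > phi2 and d13 > phi2:
        return n2
    # All remaining cases of Eq. (11) select the first channel
    return n1


print(pick_index(480, 481, 520), pick_index(480, 520, 522))
```

In the second call, channels 2 and 3 agree within 6 bins while channel 1 lies more than 15 bins away from both, so the index of channel 2 is returned.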

The HR in each time window was determined in accordance with the given steps. In general, the change in HR did not exceed 10 BPM for every two consecutive time windows. The final HR should thus be modified. If \(\left|BPM-{BPM}_{prev}\right|>10\), the HR should be set as

$$\mathrm{BPM}= {BPM}_{prev}\pm 10$$
(12)

Finally, to obtain a smooth HR, the final estimate was calculated using a conventional three-point weighted moving average, assigning a weight of 90% to the initial estimate of the current window and 5% to each of the estimates in the two previous windows [25].
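The two post-processing steps can be sketched in Python as follows. The function name `limit_and_smooth` is hypothetical, and applying the limit of Eq. (12) relative to the most recent final estimate, before the smoothing, is an assumption of this sketch.

```python
def limit_and_smooth(bpm, history, max_step=10.0, weights=(0.9, 0.05, 0.05)):
    """Limit the change to +/- max_step BPM, then smooth over three windows.

    history holds the previous final estimates, most recent last.
    """
    if history:
        prev = history[-1]
        # Eq. (12): clamp to BPM_prev - 10 or BPM_prev + 10
        bpm = max(prev - max_step, min(prev + max_step, bpm))
    if len(history) >= 2:
        bpm = (weights[0] * bpm
               + weights[1] * history[-1]
               + weights[2] * history[-2])
    return bpm


print(limit_and_smooth(150.0, [100.0, 110.0]))
```

In this example the raw estimate of 150 BPM is first limited to 120 BPM and then averaged with the two previous estimates, which gives about 118.5 BPM.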

Results

Parameter settings

The order of the LMS filter was set to 20, and that of the RLS filter was set to 5. The step size μ was 0.002. In the peak selection, the number of FFT points \({N}_{FFT}\), which determines the frequency resolution, was fixed to 30,000. The parameters defining the area in the frequency domain were \({\theta }_{1}=200,{\theta }_{2}=250,{\theta }_{3}=300,{\theta }_{4}=350,{\varepsilon }_{1}=0.2,{\varepsilon }_{2}=0.4,{\varepsilon }_{3}=0.6\). The parameters \(k,p,q,{\varphi }_{1},{\varphi }_{2}\) were set to 3, 40, 52, 6, and 15, respectively.

Performance measurement

To evaluate the performance of the proposed method, four performance indices were considered: the average absolute error (AAE), average absolute error percentage (AAEP), Pearson correlation, and Bland–Altman plot [26].

AAE is expressed as

$$error1=\frac{1}{M} \sum_{i=1}^{M}|{BPM}_{est}(i)- {BPM}_{true}(i)|$$
(13)

and AAEP is given by

$$error2= \frac{1}{M}\sum_{i=1}^{M}\frac{|{BPM}_{est}(i)- {BPM}_{true}(i)|}{{BPM}_{true}(i)}$$
(14)

where \({BPM}_{est}\left(i\right), {BPM}_{true}(i)\) represent the estimated HR and ground truth in the ith time window, and M is the total number of time windows.
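Equations (13) and (14) can be computed with the following Python sketch. The function names `aae` and `aaep` are hypothetical; `aaep` returns a fraction, which is multiplied by 100 to obtain a percentage.

```python
def aae(est, true):
    """Eq. (13): average absolute error in BPM."""
    return sum(abs(e - t) for e, t in zip(est, true)) / len(true)


def aaep(est, true):
    """Eq. (14): average absolute error relative to the ground truth."""
    return sum(abs(e - t) / t for e, t in zip(est, true)) / len(true)


est = [100.0, 110.0]
true = [100.0, 100.0]
print(aae(est, true), 100 * aaep(est, true))
```

For these two windows the AAE is 5 BPM and the AAEP is 5%.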

The Pearson correlation coefficient is a measure of the degree of similarity between ground truth data and estimates. A high Pearson correlation coefficient indicates a good HR estimate.

The Bland–Altman plot is used to verify agreement between the ground-truth HR and the estimated HR values. The limit of agreement (LOA) is also calculated, expressed as \([\upmu - 1.96\upsigma ,\upmu + 1.96\upsigma ]\), where μ is the average difference between the estimates and the ground truth, and σ is the standard deviation of the differences.
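The LOA can be computed with the following Python sketch. The function name `limits_of_agreement` is hypothetical, and the use of the sample standard deviation (division by M − 1) is an assumption, because the estimator is not specified here.

```python
import math


def limits_of_agreement(est, true):
    """Return (mu - 1.96*sigma, mu + 1.96*sigma) of the differences."""
    diffs = [e - t for e, t in zip(est, true)]
    mu = sum(diffs) / len(diffs)
    # Sample standard deviation of the differences (assumed estimator)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in diffs) / (len(diffs) - 1))
    return mu - 1.96 * sigma, mu + 1.96 * sigma


loa = limits_of_agreement([101.0, 99.0, 102.0, 98.0], [100.0] * 4)
print(loa)
```

In this toy example the differences are 1, −1, 2, and −2 BPM, so the mean difference is 0 and the LOA is symmetric about 0.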

Result

The method proposed in this study was compared with various other methods, namely the TROIKA [12], JOSS [13], SPECTRAP [23], COMB [24], Particle Filter [15], OSC-ANFc [17], and NR-OSC-ANF [19] methods. Tables 1 and 2 list the AAE and AAEP of the 12 subjects. With the proposed method, the 12 subjects had an average AAE of 1.16 ± 1.56 BPM and an average AAEP of 0.87%. These results compare favorably with those obtained using other recently introduced methods.

Table 1 Comparison of average absolute errors in 12 subjects among different approaches
Table 2 Comparison of average absolute error percentages in 12 subjects (in %) among different approaches

Figure 6 presents the correlation between the ground truth and the estimates. The scatterplot in the figure shows the linear relation between the ground truth and the estimates. The Pearson correlation coefficient was 0.9947, indicating a satisfactory result.

Fig. 6 Pearson correlation between the ground truth and the estimates on the 12 datasets

Figure 7 presents the Bland–Altman plot of the 12 subjects. The LOA was [−5.17, 4.67] BPM, with 94% of the data within 1.96 σ of the mean difference. This result indicates that the estimated HR agrees closely with the ground truth regardless of the exercise state.

Fig. 7 Bland–Altman plot showing agreement between the ground truth and the estimated heart rates on the 12 datasets

The result pertaining to one of the subjects is shown in Fig. 8 for a visual representation of the performance. The figure shows that the method proposed in this study is efficient and that the ground truth and estimates are consistent.

Fig. 8 Heart rate tracking by the proposed algorithm for Subject 8

Finally, the speed of the proposed method was evaluated using MATLAB R2014a on a Core i5 6500 2.5 GHz processor with 8 GB RAM running Windows 10. Each time window had an average calculation time of 240 ms. Because this time is much shorter than the duration of a time window, the proposed method can satisfy the requirements of real-time HR monitoring.

Discussion

On the basis of the results, the proposed method matches or outperforms previously reported methods. With previously reported methods, the AAEs were 2.34 [12], 1.28 [13], 1.16 [14], 1.56 [15], 1.88 [16], 1.40 [17], 1.50 [23], 1.82 [24], and 1.16 BPM [19]; with the proposed method, the AAE was 1.16 BPM, and the standard deviation was 1.56 BPM. With previously reported methods, the AAEPs were 1.79% [12], 1.12% [23], 1.08% [17], and 0.94% [19]; with the proposed method, the AAEP was 0.87%, a satisfactory outcome. Meanwhile, the Pearson correlation between the ground truth and the estimates obtained using the proposed method was 0.9947, which is higher than the correlation coefficient (0.993) obtained in [13].

Owing to adaptive filtering, the proposed method computes faster than previous methods that adopted signal decomposition [12, 13, 23]. The performance of the proposed method is superior or equal to those of other methods [12,13,14,15,16,17, 23, 24]. PARHELIA [14] is also an efficient approach to measuring HR; however, because its particle filter tracks HR slowly, PARHELIA fails to perform efficiently when HR suddenly changes as a result of exercise. Compared with that of NR-OSC-ANF [19], the frequency tracking algorithm of the proposed method takes into account various factors affecting HR measurement and may therefore be more stable in complex scenarios.

For peak selection, we implemented an innovative approach with three factors considered. Numerous parameters that were set manually were included. We intend to adapt these parameters to varying situations in future research. To verify the efficiency of the proposed algorithm, testing of more cases will also be considered.

Conclusion

In this study, a robust method for HR monitoring during intensive physical exercise is proposed. Using wrist-type PPG signals, the proposed technique combines three signals for filtering, efficiently avoiding the interference of strong MAs. The peak selection process is a major innovation in this study. For different scenarios, different parameters can be adjusted, and the proportion of each factor can be altered to achieve improved results. The proposed method is fairly robust, capable of tracking the ground truth with high estimation accuracy regardless of abrupt changes in HR with respect to time.