1 Introduction

Real-time heart rate (HR) estimation from photoplethysmography (PPG) signals is a key step in developing wearable devices that can monitor HR non-invasively.1 The PPG signal is obtained from a pulse oximeter, and its periodicity corresponds to the cardiac rhythm.2 Oximeters are designed with a light source and a detector placed in either reflection mode or transmission mode (Fig. 1a). The skin is illuminated with light of wavelength between 500 and 900 nm. The light transmitted or reflected by the skin is proportional to the blood flow (Fig. 1b).3 Blood flow in the vessels is driven by the heart beat (Fig. 1c). Although the PPG signal carries HR information, reliable HR estimation is not straightforward because PPG signals are vulnerable to motion artifacts (MA), which strongly interfere with the HR component. Figure 1d shows a 3-s PPG signal acquired with the subject at rest, and Fig. 1e shows the same for a subject exercising on a treadmill. Figure 1f, g shows the respective tri-axial accelerometer data. In comparison, the electrocardiogram (ECG) signals shown in Fig. 1h, i are more robust to MA. MA in PPG is much stronger during exercise than at rest. Depending on the type of physical activity, the MA component can completely mask the HR information in the PPG signal, making PPG-based HR monitoring challenging.4,5 Effective MA removal techniques are therefore required to recover the HR accurately from PPG signals acquired during physical exercise. The steps involved in HR estimation using accelerometer information are shown in the block diagram (Fig. 1j).

Figure 1:

a Acquisition of PPG signal,3 b illustration of source and detector,3 c components of PPG signal3 (figures reproduced with permission), d sample PPG signal for 3 s, e PPG signal with MA, f MA from tri-axial accelerometer during rest, g MA from tri-axial accelerometer during exercise, h ECG signal during rest, i ECG signal during motion, j block diagram of the steps involved in HR estimation.

A typical approach to estimating HR in the presence of MA is to first remove the MA from the PPG signal. Several techniques for this have been reported in the literature.6 Some of them require motion information from an accelerometer, while others do not. Independent component analysis (ICA) is one technique that does not require motion data; however, it requires multiple PPG sensors.7 ICA has been proposed in both the time domain8 and the frequency domain.9 However, the assumption of statistical independence in ICA does not hold well for PPG signals contaminated by MA.10 On the other hand, when acceleration data are available, the MA component can be adaptively cancelled from the PPG signal using an adaptive filter based on least mean square (LMS),11 normalized LMS,12,13 or fast transversal recursive least square (RLS) algorithms,14 as well as the spectrum subtraction technique15 and Laguerre basis function-based signal representation.16 Other MA removal techniques include electronic processing methodology,17 time-frequency analysis,18 wavelet denoising,19,20 higher order statistics,9 empirical mode decomposition,21,22 and a random forest-based method,23 to name a few. However, during intense physical exercise, most of these techniques do not work well.10 Acceleration data have also been used in the observation model of a Kalman filter24 as well as a Kalman smoother25 to remove MA. The acceleration data reflect the hand movement in 3-D space, whereas MA in the PPG signal also originates from other sources, such as changes in the gap between the skin and the pulse oximeter's surface, which cannot be corrected using acceleration data.10

In contrast to adaptive filtering,26 Zhang et al.10 proposed the TROIKA (signal decomposiTion for denoising, sparse signal RecOnstructIon for high-resolution spectrum estimation, and spectral peaK trAcking) framework, in which a high-resolution spectrum of the PPG signal is calculated using sparse signal reconstruction following signal decomposition by singular spectrum analysis (SSA). Spectral peak tracking (SPT) is performed on the reconstructed PPG spectra to estimate the HR, and this has been shown to be more accurate than other HR estimators. However, SPT depends solely on the initialization of the HR in the first frame. When a wrongly detected spectral peak is used for initialization, the entire estimated HR trajectory can deviate from the actual HR trajectory.

Different MA cancellation algorithms, such as adaptive noise cancellation (ANC),27 Kalman filter (KF), Kalman smoother, and sparse reconstruction with SSA decomposition, remove MA from PPG signals to varying degrees, depending on the nature of the algorithm, the filter order, and other chosen parameters. The extent of MA cancellation in turn affects the SPT-based HR estimation performance.

The spectrum of the denoised signal is used to compute the HR.10 The HR in the first few spectra of the PPG signal is initialized by selecting the frequency with the highest intensity, since the periodicity of the PPG signal in the first few seconds corresponds to the HR. A window around the initialized HR is then used to estimate the HR in subsequent spectra, and the window location is updated after each prediction. The tracked HR in successive spectra is thus constrained to lie within a given range of beats per minute (BPM).
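
The initialization and windowed search described above can be sketched as follows. This is a minimal illustration rather than the tracker of any cited algorithm: the function name `track_hr`, the ±10 BPM search window, and the 40–200 BPM plausibility band are assumptions chosen for the example.

```python
import numpy as np

def track_hr(spectra, freqs_hz, half_width_bpm=10.0, lo_bpm=40.0, hi_bpm=200.0):
    """Track HR over successive magnitude spectra (one spectrum per row).

    half_width_bpm, lo_bpm and hi_bpm are illustrative assumptions.
    """
    bpm = np.asarray(freqs_hz, dtype=float) * 60.0   # Hz -> beats per minute
    in_band = (bpm >= lo_bpm) & (bpm <= hi_bpm)
    track = []
    for k, spec in enumerate(spectra):
        if k == 0:
            # Initialization: strongest peak anywhere in the plausible band.
            mask = in_band
        else:
            # Later frames: search only in a window around the last estimate.
            mask = in_band & (np.abs(bpm - track[-1]) <= half_width_bpm)
        idx = np.flatnonzero(mask)
        best = idx[np.argmax(np.asarray(spec)[idx])]
        track.append(bpm[best])   # the window re-centres on this estimate
    return np.array(track)
```

Because each frame searches only near the previous estimate, a strong artifact peak far from the current HR is ignored, but a wrong first-frame estimate is carried forward through all later frames.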

In this review, pre-processing techniques are discussed first. Pre-processing includes simple steps such as demeaning and bandpass filtering, which remove baseline drift and restrict the PPG and accelerometer signals to 20–210 BPM.28 MA is removed from the PPG signal using the accelerometer signal in either the time domain or the frequency domain. Time domain algorithms include filtering techniques, whereas spectral subtraction is done in the frequency domain. Different time domain and frequency domain algorithms developed for MA cancellation, and advancements in SPT techniques, are discussed. Datasets collected with subjects on a treadmill are detailed in the next section. The frequency and time domain algorithms developed on these datasets are detailed in Sect. 3. Discussion and conclusions are presented in Sects. 4 and 5, respectively.

2 Datasets

The extent of the MA component in a PPG signal is not directly measurable. Hence, comparing the efficiency of PPG denoising algorithms remains a challenge. To estimate the efficiency of MA cancellation algorithms, MA is added to the PPG signal, and the result is termed a synthetic dataset.29 Algorithms developed on synthetic data do not perform well on real-world data. Real-world datasets were acquired while subjects were on a treadmill.10 Algorithms developed for clinical datasets are beyond the scope of this article.30

2.1 Synthetic

The target PPG signal is mixed with MA in the frequency domain.29 The PPG signal is obtained from a subject’s fingertip in the resting state. Two MA signals are extracted from PPG sources using temporally constrained independent component analysis and adaptive filtering.31 The MA signals are added to the PPG signal in a chosen proportion. To quantify the quality of the synthesized signal, the signal-to-noise ratio (SNR) is computed.
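
As a rough illustration of mixing in a proportion and checking the resulting SNR, the sketch below scales an MA signal so that the mixture reaches a requested SNR. It is not the procedure of ref. 29: the helper names `snr_db` and `mix_at_snr` are hypothetical, and the addition is done in the time domain, which gives the same result as adding the spectra because the Fourier transform is linear.

```python
import numpy as np

def snr_db(signal, noise):
    """SNR in dB as the ratio of signal energy to noise energy."""
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))

def mix_at_snr(ppg, ma, target_snr_db):
    """Scale `ma` so that ppg + scale * ma has the requested SNR (hypothetical helper)."""
    ppg = np.asarray(ppg, dtype=float)
    ma = np.asarray(ma, dtype=float)
    # Required noise energy is the signal energy divided by 10^(SNR/10).
    scale = np.sqrt(np.sum(ppg ** 2) / (np.sum(ma ** 2) * 10.0 ** (target_snr_db / 10.0)))
    return ppg + scale * ma, scale
```

A negative target SNR gives a mixture in which the artifact carries more energy than the pulse component, which is the regime of interest for intense exercise.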

2.2 Real-World

Well-documented and open-source data were collected and used by Zhang et al.10 The collected dataset consists of PPG, tri-axial accelerometer, and ECG signals, all recorded simultaneously.10 Dataset 1 was collected from ten male subjects aged 18–33. A pulse oximeter of wavelength 609 nm was embedded in a wrist band that also incorporated an accelerometer. Single-channel ECG was acquired from the chest. All the signals were sampled at 125 Hz. Signals were acquired for 8 min on a treadmill. During the first minute the speed was 2 km/h, during the next minute it was 4 km/h, and it was then 10 km/h for a minute, after which the speed was increased to 15–17 km/h for 1–1.5 min (Fig. 2a). The whole cycle was then reversed over the next 3.5 min. Subjects were asked to perform intentional movements, such as wiping sweat and buttoning the shirt, in addition to freely swinging the arm.

Figure 2:

a Representation of dataset 1, b representation of dataset 2, c TROIKA framework,10 and block diagrams of d CARMA35 and e empirical mode decomposition.36

Dataset 2 consists of two-channel PPG signals, both at a wavelength of 515 nm, acquired with sensors embedded in a wrist band (which also carries an accelerometer) from 12 subjects on a treadmill. Each PPG recording lasted 5 min. Subjects were at rest for the first 30 s; the speed was then 6 km/h for 1 min and 12 km/h for the next minute, after which it was reduced to 6 km/h and then increased to 12 km/h for 1 min before the subject rested for 30 s. For one of the subjects, the speeds were 8 and 15 km/h (Fig. 2b). ECG signals were also collected during exercise and annotated for heart rate in beats per minute (BPM) over 8-s time windows, with successive windows overlapping by 6 s. This advanced dataset was made publicly available as part of the 2015 IEEE Signal Processing Cup (SPC) challenge. This review focuses on the HR estimation algorithms developed for the SPC data.10
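
The 8-s windows with 6-s overlap imply a new HR annotation every 2 s. The sketch below lists the window start indices for a recording; the function name `window_starts` is hypothetical, and the 125 Hz default is an assumption carried over from the sampling rate stated for dataset 1.

```python
def window_starts(n_samples, fs=125, win_s=8, step_s=2):
    """Start indices of sliding analysis windows.

    win_s = 8 and step_s = 8 - 6 = 2 follow the annotation scheme in the text;
    fs = 125 is an assumption taken from the dataset 1 description.
    """
    win = int(win_s * fs)     # samples per window
    step = int(step_s * fs)   # samples between consecutive window starts
    # Keep only windows that fit entirely inside the recording.
    return list(range(0, n_samples - win + 1, step))
```

Under these assumptions, a 5-min recording holds 37,500 samples and each window spans 1000 samples, so every algorithm output can be aligned with one ECG-derived reference value per 2-s step.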

A third set of data was recorded using the set-up of Zhang et al. from eight subjects.32 Seven subjects were healthy males, while the remaining subject, a female, had abnormal heart rhythm and blood pressure. The female subject and three other subjects performed type 1 exercise, which consisted of forearm and upper arm exercise. Six healthy subjects performed type 2 exercise, which was intense, such as boxing the air. These ten recordings formed the test set used to validate the algorithms of the challenge entrants. The HR annotated from ECG is also available. Another dataset was developed by Peng et al. with a different set of motions on a treadmill. To estimate the efficiency of MA removal techniques, PPG was acquired from the right index finger during four different motions, namely bending the finger at 90°, waving the hand, pressing the probe clip, and walking, each for 20 s.29 These data were collected from 18 males and 13 females. The activities were performed after 5 min of rest, using two pulse oximeters on each hand at a sampling frequency of 200 Hz. The left hand was stationary on the treadmill during acquisition at a speed of 5 km/h.

3 Methods

During rest, the periodicity of the PPG signal corresponds to the heart beat, but this periodicity is disturbed when motion artifacts interfere. HR estimation is a three-stage process. It begins with bandpass filtering to remove noise outside the heart-beat range. The bandpass-filtered signal is then denoised to remove the motion artifact; denoising techniques, which depend on accelerometer signals, range from simple filtering to complex sparse reconstruction. The final and crucial step is heart rate tracking, which is done in the frequency domain.

3.1 Pre-Processing of PPG Signal

The PPG signal is bandpass filtered from 0.5 to 8 Hz.29 For the SPC dataset, PPG and accelerometer signals are bandpass filtered from 0.4 to 7 Hz. Usually, HR varies from 40 to 120 BPM (about 0.67–2 Hz), which lies within the 0.6–3.5 Hz band (36–210 BPM).10 A Butterworth IIR filter was used.33 This bandpass filtering increases the sparseness of the signals, which is an advantage for sparsity-based algorithms, and reduces the wandering of HR tracks beyond the valid range. Since PPG is prone to baseline drift due to electronic interference, demeaning and normalization were also done as part of pre-processing.34
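
A minimal sketch of this pre-processing chain with SciPy is given below. Only the 0.4–7 Hz passband and the Butterworth IIR design come from the text; the filter order of 4, the zero-phase (offline) filtering via `sosfiltfilt`, the unit-variance normalization, the 125 Hz default, and the function name `preprocess` are assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess(x, fs=125.0, lo_hz=0.4, hi_hz=7.0, order=4):
    """Demean, bandpass filter and normalize one PPG or accelerometer channel.

    order=4, zero-phase filtering and unit-variance scaling are assumptions.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                   # demeaning
    # Butterworth bandpass as second-order sections for numerical stability.
    sos = butter(order, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
    y = sosfiltfilt(sos, x)                            # forward-backward, zero phase
    return y / np.std(y)                               # normalization
```

Forward-backward filtering needs the whole record, so a real-time implementation would instead use a causal filter (for example `scipy.signal.sosfilt`) and accept the resulting phase delay.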

3.2 Motion Artifact Removal and HR Estimation

3.2.1 Frequency Domain

The TROIKA framework is robust to strong motion artifacts.10 Signal decomposition, sparse signal reconstruction (SSR), and spectral peak tracking (SPT) form its basis (Fig. 2c). For signal decomposition, the PPG signal is decomposed into several components. Singular spectrum analysis (SSA), single-channel independent component analysis, and empirical mode decomposition are some of the techniques used in the literature; SSA was used in TROIKA. This time domain decomposition involves embedding, singular value decomposition, grouping, and reconstruction. The focal underdetermined system solver (FOCUSS) algorithm, which is used in source localization and direction-of-arrival estimation, was used for SSR. SPT involves initialization, peak selection, and trend verification. The verification depends on the characteristics of the recorded data and the nature of the heart rate. TROIKA has also been applied to signals down-sampled to 25 Hz.
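
The four SSA steps named above (embedding, singular value decomposition, grouping, reconstruction) can be sketched with NumPy as follows. This is a generic SSA decomposition, not the TROIKA implementation: the function name `ssa_components` is hypothetical, and the grouping step is left to the caller, who chooses which rows of the output to keep.

```python
import numpy as np

def ssa_components(x, L):
    """Decompose x into min(L, N - L + 1) additive SSA components.

    Returns an array with one reconstructed time series per row; the rows
    sum back to x. Which rows to discard as MA is left to the caller.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    K = N - L + 1
    # Embedding: L x K trajectory (Hankel) matrix with X[i, j] = x[i + j].
    X = np.column_stack([x[i:i + L] for i in range(K)])
    # Singular value decomposition of the trajectory matrix.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    comps = []
    for i in range(s.size):
        Xi = s[i] * np.outer(U[:, i], Vt[i])           # rank-one elementary matrix
        # Reconstruction: average each anti-diagonal (all entries with i + j fixed).
        comp = [Xi[::-1].diagonal(k).mean() for k in range(-(L - 1), K)]
        comps.append(comp)
    return np.array(comps)
```

One possible grouping rule, stated here only as an assumption, is to drop the components whose dominant frequency coincides with a strong peak in the accelerometer spectrum and sum the remaining rows to obtain the denoised PPG.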

The TROIKA framework, which was considered the gold standard, was modified for better accuracy.33 SSA was followed by real-time clustering (RTC), frequency point selection, and prediction, followed by multiple-way selection. In the spectrum of the denoised signal obtained after SSA, three peaks with two-thirds the intensity of the maximum were retained. The selected peaks were clustered. Because the HR varies smoothly, when the cluster of peaks is not close to the previous peak, the previous peak is used for the BPM calculation. The BPM for the current frame (beyond the first three frames) depended on the BPM of the two previous frames. In multiple-way selection, the previously computed HRs were corrected. TROIKA was the first algorithm to use singular spectrum analysis for MA removal from PPG signals, and it was subsequently improved for better accuracy.

CARMA (closest subspace algorithm for reducing motion artifact) depends on the singular value decomposition (SVD) of the Hankel matrix obtained after pre-processing the PPG and accelerometer signals (Fig. 2d).35 The key assumption of this technique is that the artifact and the heart rate lie in different subspaces. Adaptive peak tracking was used for HR estimation. In another approach, spectrum subtraction is combined with ensemble empirical mode decomposition (EEMD). EEMD decomposes the PPG and acceleration signals into intrinsic mode functions (Fig. 2e).36 Spectrum subtraction is then used to remove the spectrum of the acceleration signal from that of the PPG signal. To increase the resolution of the spectrum for better HR estimation, a zero-padded discrete Fourier transform (DFT) was used, followed by a KF for tracking the HR.37 This technique does not require pre-processing and hence is computationally simple. The SPECTRAP algorithm comprises spectrum subtraction, peak tracking, and post-processing. The down-sampled signals are filtered and subjected to symmetric least square-based spectral subtraction, in which the MA spectrum is subtracted from the PPG spectrum.38 The cumulative spectrum (CUMSPEC) technique sparsifies the signal using an iterative method with adaptive thresholding, followed by median filtering of the HR estimates. A genetic algorithm was used for HR tracking.39 These algorithms were compared with TROIKA.
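
The idea shared by the spectrum subtraction methods above can be illustrated with a plain magnitude subtraction on zero-padded DFTs. This sketch is a deliberately simplified stand-in: it is not the symmetric least square-based subtraction of SPECTRAP nor the EEMD-based scheme, and the function name, the peak normalization, and `n_fft=4096` are assumptions.

```python
import numpy as np

def spectral_subtraction(ppg, acc, fs, n_fft=4096):
    """Subtract the normalized accelerometer magnitude spectrum from the PPG one.

    Returns (residual_spectrum, freqs_hz). Normalizing both spectra to unit
    peak is an illustrative assumption, not taken from the cited papers.
    """
    # Zero-padding to n_fft samples gives a finer frequency grid.
    P = np.abs(np.fft.rfft(np.asarray(ppg, dtype=float), n=n_fft))
    A = np.abs(np.fft.rfft(np.asarray(acc, dtype=float), n=n_fft))
    P = P / P.max()
    A = A / A.max()
    residual = np.clip(P - A, 0.0, None)   # negative magnitudes are set to zero
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return residual, freqs
```

Zero-padding interpolates the spectrum onto a denser grid but does not add true resolution, which is still set by the window length; this is one motivation for the sparse and gridless estimators discussed elsewhere in this section.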

Gridless spectral estimation and SVM-based peak selection (GRESS) is another algorithm developed for Zhang’s advanced dataset.40 It modifies the conventional estimation of HR from a spectrum that is discretized (pixelated) on a frequency grid. Gridless spectral estimation depends on the sparsity of the signal. Peak-to-peak separation distances and peak amplitudes were used to train a two-class support vector machine (Fig. 3a). Gridless spectral estimation helped to overcome basis mismatch, which was observed in other algorithms that assumed sparsity aligned to specific frequency grids. Spectral peak selection was formulated as a pattern classification task, and hence the number of user-defined parameters was reduced. This algorithm performed better than the TROIKA framework for a few subjects, but was not as efficient as joint sparse spectrum reconstruction (JOSS). JOSS was also proposed by Zhang. A multiple measurement vector model was used for joint spectral analysis, which was followed by spectral subtraction.41 The peak verification process was more stringent than in TROIKA.42 After the peak verification steps of TROIKA, a peak discovery step is incorporated, which smooths the selected peak (Fig. 3b). The instantaneous fundamental frequency of the acceleration signal is computed using a finite harmonic sum (HSUM) (Fig. 4a).3 The computed fundamental frequency is fitted to the PPG signal, where one of the harmonic sums corresponds to the HR while the remaining harmonics belong to MA. In the multi-channel spectral matrix decomposition (MC-SMD) technique, the spectra of the PPG signals are divided by the spectrum of the acceleration signal (Fig. 4b).43 This algorithm was compared with the 25 Hz JOSS and TROIKA results. MC-SMD is computationally cheap because it does not have a noise-removal module. The short-time Fourier transform (STFT) has also been combined with spectral analysis.44 STFT was more robust to frequency variation than wavelet transform techniques. Adding an averaging filter improved the HR estimation.

Figure 3:

a Block diagram of GRESS40 and b flow-chart of JOSS.41

Figure 4:

Block diagram of a HSUM3 and b MC-SMD.43

3.2.2 Time Domain

Spectral peak search-comb filtering (SPS-CF) was developed, which depends on the weighted sum of the delayed signal and the bandpass-filtered time domain signal.29 The feed-forward comb filter has evenly spaced peaks in its magnitude frequency response. Adaptive filters have also been used to remove MA from PPG signals.45 The authors used a normalized least mean square (NLMS) filter to denoise each of the two PPG signals in the advanced Zhang dataset, using each of the three accelerometer signals as reference. This leads to six denoised PPG signals. The STFTs of these signals were multiplied to obtain a single spectrum. HR was estimated from the STFT spectrum by picking the highest-intensity peak within 40–170 BPM. To maintain the trend, a threshold was set on the chosen BPM. The efficiency of the adaptive filter technique was compared with TROIKA.
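
A bare-bones NLMS noise canceller for one PPG channel and one accelerometer axis is sketched below. It illustrates the general technique only: the filter order, the step size `mu`, the regularizer `eps`, and the function name `nlms_cancel` are assumptions, not values from ref. 45.

```python
import numpy as np

def nlms_cancel(ppg, ref, order=8, mu=0.01, eps=1e-8):
    """Remove the part of `ppg` that is linearly predictable from `ref`.

    ppg : PPG samples (pulse component plus motion artifact)
    ref : accelerometer samples used as the noise reference
    Returns the error signal, i.e. the artifact-reduced PPG.
    order, mu and eps are illustrative assumptions.
    """
    ppg = np.asarray(ppg, dtype=float)
    ref = np.asarray(ref, dtype=float)
    w = np.zeros(order)
    out = np.zeros_like(ppg)
    for n in range(ppg.size):
        # Most recent `order` reference samples, newest first, zero-padded at start.
        u = ref[max(0, n - order + 1):n + 1][::-1]
        u = np.pad(u, (0, order - u.size))
        e = ppg[n] - w @ u                   # error = PPG minus predicted artifact
        w += mu * e * u / (eps + u @ u)      # step normalized by reference energy
        out[n] = e
    return out
```

Applying such a filter to each of the two PPG channels with each of the three accelerometer axes as reference yields the six denoised signals mentioned above. A small step size keeps the filter from also tracking, and thereby cancelling, the pulse component.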

Multiple initialization spectral peak tracking (MISPT) uses a denoising technique based on an adaptive noise cancellation filter and places emphasis on peak selection (Fig. 5a).46 The FFT of the denoised time domain signal was taken for HR estimation. Initialization was done in each spectral window, and the trajectory was generated by backtracking. Each trajectory was built by selecting the location of maximum intensity within a given window. The HR corresponding to the trajectory of maximum strength was recorded as the BPM. This algorithm outperformed TROIKA and JOSS on both 25 and 125 Hz PPG signals.

Figure 5:

a Flow-chart of MISPT,46 figure is reproduced with permission, b block diagram of WF+PV,48 c flow-chart of SpaMA.49

One of the three accelerometer signals is chosen based on the highest band power, and the chosen accelerometer signal is used to filter the PPG signal with an RLS filter.47 In another method, the bandpass-filtered signals were normalized and averaged (Fig. 5b).48 The averaged PPG signals and the accelerometer signals are down-sampled to 25 Hz, after which the DFT is computed and Wiener filtering (WF) is applied to remove MA. To increase the HR estimation accuracy, the phases of the chosen peak and the previous peaks are refined. In SpaMA, a time-varying spectral comparison of the PPG and acceleration signals enables the removal of peaks due to MA (Fig. 5c).49 An RLS filter and a Blackman-Harris window were used as pre-processing steps to estimate HR in HEAL-T (Fig. 6a).50 MURAD (MUltiple Reference ADaptive noise cancellation) is an HR estimation algorithm in which the PPG is denoised with an RLS filter separately using each of the three accelerometer signals and once more using the difference between the two PPG signals.51 RLS and spectral decomposition (SD) techniques were combined for better performance.32 The key feature of this algorithm was the use of a Bayesian decision for HR estimation. This algorithm was compared with the 25 Hz TROIKA results.
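
Choosing the reference axis by band power, as in the first method above, might look like the sketch below. The 0.4–7 Hz band is borrowed from the pre-processing passband quoted for the SPC data; the periodogram-based power estimate and the function names `band_power` and `pick_reference_axis` are assumptions for illustration.

```python
import numpy as np

def band_power(x, fs, lo_hz=0.4, hi_hz=7.0):
    """Periodogram power of x inside [lo_hz, hi_hz] (band limits are assumptions)."""
    x = np.asarray(x, dtype=float)
    spec = np.abs(np.fft.rfft(x - x.mean())) ** 2
    f = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return spec[(f >= lo_hz) & (f <= hi_hz)].sum() / x.size

def pick_reference_axis(acc_xyz, fs=125.0):
    """Return (index, powers): the accelerometer axis with the highest band power."""
    powers = [band_power(a, fs) for a in acc_xyz]
    return int(np.argmax(powers)), powers
```

The selected axis would then serve as the reference input of the adaptive filter; motion energy outside the band does not influence the choice.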

Figure 6:

Block diagram of a HEAL-T50 and b NLMS+AFT.55

The output of the NLMS algorithm was further bandpass filtered with a discrete oscillator-based adaptive notch filter (OSC-ANF) for instantaneous tracking of the changing frequency (Fig. 6b).52 HR estimation has also been done using recursive filtering after EEMD.53 SVD has been used for MA cancellation (MAC), with the HR estimated from the denoised signal using spectral analysis.54 Since MA is minimal during the first few seconds, HR estimation in the spectral domain is simple there.41 The frequency corresponding to the maximum spectral intensity, converted to beats per minute (BPM), is the heart rate. A window is placed around the initial heart rate when estimating the HR in subsequent spectral windows, based on the fact that the HR changes gradually. The window location is adapted at each spectral window to allow a gradual rise or fall in the HR. The frequency of the optimal spectral peak within the given window of the denoised spectrum is the HR.29 The estimated HR is also compensated for the directional trend of the BPM.

Table 1: Comparison of the error in BPM across various algorithms detailed.

4 Discussion

The results of the existing algorithms are tabulated in Table 1. The performance of algorithms from both domains across 12 subjects (dataset 2) is listed. Twelve frequency domain and twelve time domain algorithms are shown. The lowest error is 0.73 BPM for frequency domain denoising techniques and 0.97 BPM for time domain techniques. The error in HR estimation for subject ten is consistently high across all the algorithms; the only exception is the HSUM technique.3 HSUM outperforms all the other algorithms because it depends on the harmonics of the PPG signals, which are suppressed by bandpass filtering in all the other algorithms.

This review article is confined to the dataset collected by Zhang et al., because it is specific to HR estimation on a treadmill. Algorithms developed on datasets that are not publicly available are not discussed in this review.56,57,58,59 Algorithms whose performance across the 12 subjects is not reported are also excluded, since this limits the comparison across algorithms.60,61 PPG signals have also been used to estimate other physiological parameters such as blood pressure and oxygen saturation.62,63 The details of the algorithms for computing other physiological parameters are beyond the scope of this review. Heart-rate estimation from PPG acquired with other devices, such as mobile phones and laptop cameras, and from other regions of interest, for example the forehead and ear, is interesting and has its own unique challenges. However, it is beyond the scope of this review.64,65

HR estimates from PPG are validated against the HR estimated from ECG, which records the electrical activity of the heart. Since an error of 1 BPM is accepted clinically, three of the discussed algorithms have the potential to replace ECG for heart-rate estimation. PPG is widely used in the clinic and is hence safe to use. With the advancement of wearable devices, acquiring PPG signals is becoming simpler than acquiring ECG, which requires a minimum of three electrodes.66 Another important point is that, with good denoising of PPG signals, other physiological parameters such as blood oxygen saturation and respiratory rate can also be measured, and anaesthesia monitoring can be supported, unlike with ECG.

5 Conclusion

Electrocardiogram (ECG) is the gold standard for HR estimation. The algorithms discussed here are evaluated against the HR estimated from ECG; hence, the evaluation method of these techniques is proven. A drawback of the current techniques is that they have not been validated across different PPG datasets. Since these algorithms are independent of the previous estimates, longer data acquisition may not affect the accuracy. However, the discussed algorithms are not robust to subjects performing other tasks or to patient data, where the performance is expected to vary considerably. HR monitoring is most essential in subjects with cardiac disorders, and the robustness of the proposed algorithms on patient databases is yet to be validated. Even though many of the algorithms developed are real-time, porting them to embedded devices remains a challenge.