1 Introduction

The use of wearable sensors such as wristbands and smartwatches for monitoring the vital signs of patients and healthy individuals has become common in recent years (Dubey et al. 2015a, b, c, 2016b; Monteiro et al. 2016). Wearable body sensors are widely used for telemonitoring of personalized health parameters such as heart rate (HR), activity, sleep quality, steps, and calories burned. Wearable sensors have also been used in the education sector, for example in the analysis of peer-led team learning groups (Dubey et al. 2016a, c). Even though the design of personalized wearable devices is becoming more elegant and user-friendly, the reliability of the data they produce remains questionable. For example, Spierer et al. (2015) reported that commercial wearable sensors underperformed significantly in monitoring HR during daily life activities such as walking, biking, and stair climbing. They used commercial wearable photoplethysmography (WPPG) sensors of the kind now found in smartwatches and wristbands (Mendelson et al. 2006). WPPG is a non-invasive technology that captures the cardiac rhythm and hence can be used for continuous HR monitoring; it is an alternative to the electrocardiogram (ECG) for continuous real-time HR monitoring (Mendelson et al. 2006; Challoner 1979). As reported by Spierer et al. (2015), WPPG is significantly affected by body movements during various activities of daily life. In general, motion artifacts corrupt the heart-rate components of the PPG signal. Therefore, HR estimation from a wearable PPG signal is a challenging problem for researchers in academia and industry.

In this article, we demonstrate the use of harmonic sum (HSUM) models to reduce the impact of motion artifacts and to extract HR accurately from a WPPG signal. We outline the basic principle of WPPG and related works in Sect. 2. The motivation and the proposed method are described in Sect. 3, followed by a discussion of the evaluation results in Sect. 4. Our results demonstrate that the HSUM models are suitable for estimating heart rate (HR) from PPG signals that are severely affected by motion artifacts (see Fig. 1).

Fig. 1 Typical scenario for application of commercial wearable PPG sensors

2 Related works

2.1 Wearable PPG system

A wearable PPG system can be either transmittance-type or reflectance-type. It consists of a light source and a detector packed with supporting hardware into a wristband or earring. The transmittance-type detects the light transmitted through the tissues with a photodiode placed opposite the light source. The reflectance-type detects the intensity of reflected light with a photodiode placed on the same side as the light source; it relies on the change in light intensity upon reflection from the tissue or blood vessels (Tamura et al. 2014). Placing a transmittance-type PPG sensor on a fingertip yields a good-quality PPG signal, though it interferes with various activities of daily life. The reflectance-type wristband PPG is often preferred, as it causes the least hindrance during different activities and can be incorporated in a smartwatch. Red, infrared or green light-emitting diodes are common light sources in WPPG systems. Figure 2 shows the principle of reflectance-type PPG. The light source and the light detector are placed together close to the skin surface. The light-emitting diode (source) illuminates the skin, which transmits the light. The underlying tissues partly absorb and partly transmit the light that has passed through the skin surface. Finally, the transmitted light is reflected by the subcutaneous tissue. The photodiode (sink) is activated by the reflected light and generates a voltage signal, which is acquired and filtered by a hardware circuit. The voltage thus acquired is the PPG signal. It is quantitatively related to changes in blood volume in the microvascular tissues. Since it is related to the cardiac rhythm, it can be used for estimation of heart rate (HR). Modern smartwatches are equipped with a PPG sensor and various other sensors such as an accelerometer, a gyroscope, and a magnetometer.
An accelerometer is commonly found in such smart wristband devices to record the acceleration signal, which helps monitor motion artifacts. The DC component of the frequency-transformed PPG signal corresponds to light absorption by skin, tissues, bones and vascular elements (non-pulsating arterial blood and venous blood). The AC component, in contrast, results from pulsating arterial blood flow that is related to the cardiac rhythm (systole and diastole). The heart rate can be extracted from the AC component of the PPG signal as it is related to the cardiac cycle (Tamura et al. 2014) (see Fig. 3).

Fig. 2 a Diagram showing the principle of reflectance-type PPG. b Relation between changes in light intensity and the cardiac cycle (adapted from Webster (1997))

Fig. 3 The proposed framework for heart rate (HR) estimation from a PPG signal affected by strong motion-artifacts. The algorithm takes the PPG signal and the accelerometer signal as input and outputs the estimated HR for each 8-s window, with 6-s overlap between successive windows

2.2 Motion-artifacts and distortions in a PPG signal

Figure 1 shows the typical scenario for application of commercial wearable PPG sensors. A PPG signal is corrupted by external factors such as ambient light, ambient temperature and pressure, in addition to movements caused by day-to-day physical activities. Motion-artifacts (MAs) and pressure disturbances are the most significant distortions in a PPG system and lead to inaccurate measurement of the physiological parameters (Dresher, 2006). Pressure disturbances arise from the contact between the PPG sensor and the skin or body area where the sensor is deployed (Dresher, 2006). The arterial geometry of the measurement site is changed by the pressure applied to the skin by the sensor cabinet. This pressure leads to undesirable changes in the AC component of the reflected PPG signal (Dresher, 2006). Accurate estimation of heart rate from a PPG signal corrupted by motion-artifacts has been a challenging task for time-domain as well as frequency-domain algorithms (Zhang et al. 2015b). The impact of various environmental distortions on PPG signal quality is studied in (Maeda et al. 2011) for infrared, red and green LEDs. The HR estimates obtained with a green LED were found to be more accurate than those obtained with infrared and red light. The location of the PPG sensor also affects the detection quality because the body’s sweat rate and temperature vary across locations. The contact between the skin and the sensor, movement of the body part wearing the sensor, breathing, and physical activity all degrade the quality of the acquired PPG signal (Constant et al. 2015).

2.3 Motion-artifact reduction in wearable PPG signal

Many techniques have been proposed for reducing the motion-artifacts in a PPG signal. The use of time-domain and frequency-domain independent component analysis (ICA) has been suggested, but it has two disadvantages. Firstly, the assumption of statistical independence between the PPG signal and the motion-artifact does not hold in all cases. Secondly, ICA requires two acquisitions of motion-corrupted PPG signals, which places an additional burden on wearable PPG devices: it requires multiple PPG sensors that might not be suitable for small wearable devices (Zhang et al. 2015b; Kim and Yoo 2006). Various adaptive signal processing techniques have been developed that use a reference signal to reduce the motion-artifacts. These techniques are not useful for everyday activities because of the difficulty of estimating an appropriate reference signal in such cases (Zhang et al. 2015b).

Typically, wristband systems such as smartwatches and fitness bands have an accelerometer that simultaneously records the acceleration signal (Inc. 2016). Other techniques used for motion-artifact and distortion reduction in PPG signals include spectrum subtraction (subtracting the spectrum of the acceleration signal from the spectrum of the PPG signal) (Fukushima et al. 2012), adaptive filtering (Ram et al. 2012), higher-order statistics (Krishnan et al. 2008), wavelet transforms (Foo 2006), empirical mode decomposition (Zhang et al. 2015a), the time-frequency method described in (Yan et al. 2005), and Kalman filtering (Lee et al. 2010). These methods have been found experimentally to be effective only for small movements such as slow walking (Zhang et al. 2015b). Zhang et al. (2015b) suggested a generic and flexible framework, TROIKA, which stands for signal decomposiTion for denoising, sparse signal RecOnstructIon for high-resolution spectrum estimation, and spectral peaK trAcking with verification. TROIKA was validated for HR estimation using wrist-type PPG signals recorded while the wearer ran at various speeds on a treadmill. Zhang (2015) proposed JOSS, that is, JOint Sparse Spectrum reconstruction, for accurate HR estimation using wrist-type PPG signals. It jointly estimates the spectra of the PPG signal and the acceleration signals, and is based on the multiple measurement vector (MMV) model for sparse signal recovery. A common sparsity constraint on the spectral coefficients helps in the identification and removal of the spectral peaks corresponding to the motion-artifacts in the PPG spectrum. JOSS uses the MMV model for sparse reconstruction, unlike TROIKA, which is based on the single measurement vector (SMV) model (Zhang et al. 2015b). JOSS (Zhang 2015) exploits the common structures present in the spectra of the PPG and acceleration signals and has shown better performance than the TROIKA algorithm (Zhang et al. 2015b).
A method for HR estimation based on Wiener filtering and the phase vocoder (WFPV) was proposed in (Temko 2015). The author evaluated WFPV and concluded that it performed better than JOSS on average. The WFPV algorithm uses the accelerometer signal to estimate the motion-artifacts and then uses a Wiener filter to attenuate the motion-artifact components in the corrupted PPG signal. The phase vocoder improves the resolution of the dominant-frequency estimates. Mullan et al. (2015) developed a method consisting of four stages, namely wavelet-based denoising, acceleration-based denoising, frequency-based heart rate estimation, and a post-processing stage. This method was found to be robust to motion-artifacts that occur during sports and rehabilitation (Mullan et al. 2015). An algorithm based on time-varying spectral filtering (named SpaMA) was proposed for accurate estimation of heart rate from PPG signals corrupted with motion-artifacts. The authors tested this approach on several datasets collected during various activities of daily life using a wristband-type PPG system (Salehizadeh et al. 2015).

3 Materials and methods

In this section, we describe the dataset and discuss the motivation for developing harmonic sum (HSUM) models for HR estimation. We then present the mathematical derivation of the HSUM-based algorithm for HR estimation. We propose an HSUM model for the measured acceleration signal and a joint HSUM model for the PPG signal corrupted with motion-artifacts. First, we perform an exploratory analysis of the signals that motivated the development of the proposed algorithm. We evaluated the performance of the HSUM algorithm on the IEEE SP cup dataset and compared it with four recently developed methods, namely TROIKA (Zhang et al. 2015b), JOSS (Zhang 2015), WFPV (Temko 2015), and SpaMA (Salehizadeh et al. 2015).

3.1 Datasets

The scenarios used for acquisition of the IEEE SP cup data are described in (Zhang et al. 2015b). The dataset consists of 12 motion-affected PPG recordings obtained from individuals running on a treadmill. Each recording has a dual-channel PPG signal along with a simultaneously acquired ECG signal and three-axis acceleration signals. We found that, for the proposed method, just one of the PPG channels was sufficient for heart rate extraction; we used the second channel for the results discussed in this paper. The data were collected using a wrist-type PPG sensor while the wearer ran on a treadmill with increasing and decreasing speed for 5 min. The PPG signal, the accelerometer signal, and the ECG signal were simultaneously recorded from 12 male subjects in the age range of 15–18 years. The wristband had a pulse oximeter with a green LED of wavelength 515 nm along with embedded accelerometers for acquisition of the PPG and the accelerometer signals. Wet ECG sensors were used to simultaneously collect the ECG data from the chest. The PPG, ECG and accelerometer signals were sampled at 125 Hz and sent to a nearby computer over Bluetooth. The subjects walked or ran on the treadmill, going from rest to high speed before coming to rest again. Starting at a speed of 1–2 km/h (kmph) for 30 s, the speed was increased to 6–8 kmph for 1 min and then to 12–15 kmph for another 1 min. For the next 2 min, the same cycle was repeated, i.e., 6–8 kmph followed by 12–15 kmph. Finally, the subject walked at a speed of 1–2 kmph for 30 s before coming to rest.

The ground-truth heart rate, manually computed from the ECG signal, was shipped with the dataset. The ground-truth HR for each overlapping time-window was computed by counting the number of cardiac cycles (H) and their duration (D) in seconds (Zhang et al. 2015b). The heart rate in beats per minute (BPM) is given by

$$\begin{aligned} HR = \frac{60H}{D} \end{aligned}$$
(1)

We did not use any algorithm for the estimation of heart rate (HR) from the ECG signal as it may cause estimation errors. We just used the provided ground-truth. The average absolute error \(\xi _{HR}\) in HR estimates over N time-windows is defined as

$$\begin{aligned} \xi _{HR} = \frac{1}{N} \sum _{i=1}^N |HR[i]- H\hat{R}[i]| \end{aligned}$$
(2)

where HR[i] and \(H\hat{R}[i]\) were the ground-truth and estimated HR value for the ith time-window, respectively.
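
As a minimal sketch of Eqs. 1 and 2, the two quantities can be computed as below. The function names `heart_rate_bpm` and `average_absolute_error` are illustrative choices made here and are not taken from the dataset or from the original implementation.

```python
def heart_rate_bpm(num_cycles, duration_s):
    """Eq. 1: HR = 60 * H / D, with H cardiac cycles counted over D seconds."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return 60.0 * num_cycles / duration_s


def average_absolute_error(hr_true, hr_est):
    """Eq. 2: mean of |HR[i] - estimated HR[i]| over the N time-windows."""
    if len(hr_true) != len(hr_est) or len(hr_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - e) for t, e in zip(hr_true, hr_est))
    return total / len(hr_true)
```

For instance, 10 cardiac cycles counted over 8 s correspond to 75 BPM, and estimates that miss three ground-truth values by 1, 2 and 0 BPM give an average absolute error of 1 BPM.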

3.2 Motivations

The signal acquired using a wristband worn by a person running on a treadmill, or performing similarly intense physical exercise, is severely corrupted with motion-artifacts. Estimating the heart rate from such a PPG signal is challenging for two reasons. Firstly, the motion-artifacts are stronger than the heart-beat component in the PPG signal at several instances. Secondly, the spectrum of the heart-beat signal is close to the frequency range of the motion-artifact, which complicates the matter further.

Figure 4 shows an example of a PPG signal corrupted by motion-artifacts and a simultaneously measured accelerometer signal. The quasi-periodicity of the accelerometer signal, shown in the bottom panel of Fig. 4, is quite evident. This signal contains the information about the motion-artifacts. Figure 5 shows the short-time Fourier transform (STFT) of the accelerometer signal. The STFT (displayed as a spectrogram) was obtained by 2048-point FFTs computed over 8-s time-windows with 6-s overlap between successive windows. The STFT shows a strong fundamental frequency component around 1 Hz along with several higher harmonics of moderate intensity.

Fig. 4 An example of the PPG signal corrupted by motion artifacts (top panel) and an accelerometer signal collected simultaneously (bottom panel), shown for an 80-s time-window. The sampling rate is 125 Hz. The quasi-harmonic structure of the accelerometer signal is evident in this figure. The PPG signal also has a quasi-harmonic structure, but an envelope modulation is observed as well. Such a modulation is caused by the interaction between the true heart rhythm signal and the signal components induced by the physical movements. Hence, the PPG signal may be modeled as a sum of two harmonic series with slightly different fundamental frequencies over short time-windows. A portion of the DATA05TYPE02 dataset of the IEEE SP cup was used for generating this figure (Zhang et al. 2015b)

Fig. 5 The STFT of the accelerometer signal with a 2048-point fast Fourier transform (FFT) applied on 8-s time-windows, where successive windows have a 6-s overlap. The spectral amplitudes are quite significant up to about 12 Hz. Individual harmonics with fundamental frequency in the range of 1–3 Hz are evident. The complete acceleration signal of the DATA05TYPE02 dataset was used for generating this figure (Zhang et al. 2015b)

Figure 4 (top panel) shows the PPG signal, which exhibits significant envelope fluctuations. A cursory examination of the waveform shows that the envelope fluctuations have a frequency of roughly 0.2–0.4 Hz. This is because the harmonic components of the heart rhythm interact with the harmonic components of the motion-related signal: the dominant components of the two periodic signals are quite close to each other in frequency, thereby producing a ‘beat signal’ envelope. Figure 6 shows the STFT of the PPG signal computed using the same parameters as in Fig. 5. The PPG signal has fewer significant harmonics than the accelerometer signal. Further, in the STFT, we notice that in the low-frequency region there is significant interaction between the two quasi-periodic signals. Hence, unlike the STFT of the accelerometer signal, the frequency tracks are somewhat jumbled. Thus, it is not possible to resolve the individual harmonic components of the two periodic signals using the standard Fourier transform unless the time window is made wider, because the frequency resolution is set by the window duration (increasing the sampling rate or the FFT length alone does not help). However, the time window cannot be made much wider, because the heart rhythm signal and the motion-artifact-related signal might then change their rates within the wider window, thereby smearing the frequency tracks. This is a classical problem in time-frequency analysis methods such as the STFT.
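
As a rough worked check of this resolution limit, consider the STFT settings used here (window duration \(T = 8\) s, \(f_s = 125\) Hz, 2048-point FFT) and assume a Hann window. The symbols \(\Delta f_{bin}\) (FFT bin spacing) and \(\Delta f_{ml}\) (null-to-null main-lobe width of the Hann window) are introduced only for this illustration.

$$\begin{aligned} \Delta f_{bin}&= \frac{f_s}{N_{FFT}} = \frac{125}{2048} \approx 0.061\ \text {Hz} \approx 3.7\ \text {BPM}, \\ \Delta f_{ml}&= \frac{4}{T} = \frac{4}{8} = 0.5\ \text {Hz} = 30\ \text {BPM}. \end{aligned}$$

Two components separated by the observed beat frequency of 0.2–0.4 Hz (12–24 BPM) therefore fall within a single main lobe, so their spectral peaks tend to merge even though zero-padding makes the bin spacing finer.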

Fig. 6 The STFT of a PPG signal with 2048-point FFTs. The window sizes and overlap are the same as in Fig. 5. The spectrum is dominant up to about 6 Hz. Because of the presence of two sets of harmonics in the PPG signal, the frequency tracks in this STFT are not as clean as in Fig. 5. It shows that the accelerometer signal and the heart rhythm signal have some overlapping spectral regions. The complete PPG signal in the DATA05TYPE02 dataset of the IEEE SP cup was used for generating this figure (Zhang et al. 2015b)

Figure 7 illustrates the problems with estimating the heart rate from the locations of the peaks in the short-time Fourier transform (STFT) of the PPG signal. As an exploratory step, we computed the heart rates from the STFT magnitude of the motion-artifact-corrupted PPG signal. The PPG signal was divided into overlapping windows of 8-s duration with 6-s overlap between successive windows. Each Hanning-windowed segment of the PPG signal was processed with a 2048-point fast Fourier transform (FFT). The frequency location of the largest peak in the magnitude of the STFT was used to obtain the heart rate estimate for each window. This frequency location in Hz is multiplied by 60 to get the heart rate in beats per minute (BPM). The heart rate obtained in this way from the PPG signal is plotted for each time window in Fig. 7 (solid black line). The ground-truth heart rate obtained from the simultaneously acquired ECG signal is shown by the blue line. As can be seen, the heart rate estimates obtained from the peak locations of the PPG signal’s STFT magnitude fluctuate wildly and deviate significantly from the ground-truth. Also shown, for comparison, are the frequency locations of the magnitude peak of the STFT (converted to BPM) of the accelerometer signal (red line). Notice that in many time windows, the estimates given by the PPG signal (black line) and the BPM values corresponding to the peak-magnitude locations of the accelerometer signal (red line) coincide. This shows that the motion-artifacts are often much stronger than the heart-beat component in the PPG signal. In other words, the motion-artifacts often dominate the PPG signal, and so they need to be suppressed as far as possible before heart rate estimation. To overcome some of the above-mentioned problems, we propose a novel approach.
Since both the heart rhythm signal and the motion-artifact-related signal appear to be quasi-periodic in nature, we propose to use a truncated Fourier series to model these signals over short time windows. Such models have been used previously, for example, in processing voiced speech sounds (vowels) (Kumaresan et al. 1992). We propose the following strategy. First, we model the accelerometer signal using a truncated Fourier series (HSUM) and estimate its fundamental frequency. Then, since the PPG signal is a composite of the heart rate signal and the motion-artifact-related signal, we fit a sum of two different truncated Fourier series models (joint HSUM) to the PPG signal. One of the harmonic sums corresponds to the heart-beat component in the PPG signal and the other models the motion-artifact. The fundamental frequency of the motion-artifact has already been determined from the accelerometer signal in the first step. Using this estimate, in the next step we estimate the fundamental frequency of the other periodic component, which yields the heart rate (see Fig. 8).
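
For reference, the exploratory peak-picking baseline described above (8-s Hann windows, 6-s overlap, 2048-point FFT, peak frequency multiplied by 60) can be sketched in NumPy as follows. This is an illustrative sketch rather than the code used for Fig. 7: the function name `peak_picking_hr`, the removal of the window mean, and the search band `f_lo`–`f_hi` are assumptions added here so that the DC term is not picked as the peak.

```python
import numpy as np


def peak_picking_hr(signal, fs=125.0, win_s=8.0, overlap_s=6.0, nfft=2048,
                    f_lo=0.5, f_hi=4.0):
    """Return one BPM estimate per window from the largest STFT-magnitude peak.

    f_lo and f_hi (Hz) bound the peak search; they are assumptions of this
    sketch and are not specified in the text.
    """
    x = np.asarray(signal, dtype=float)
    win_len = int(round(win_s * fs))              # 1000 samples at 125 Hz
    hop = int(round((win_s - overlap_s) * fs))    # 250 samples (2-s hop)
    window = np.hanning(win_len)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)     # bin centres in Hz
    band = (freqs >= f_lo) & (freqs <= f_hi)

    estimates = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = x[start:start + win_len]
        seg = (seg - seg.mean()) * window         # assumption: remove DC first
        mag = np.abs(np.fft.rfft(seg, n=nfft))    # zero-padded to nfft points
        peak_freq = freqs[band][np.argmax(mag[band])]
        estimates.append(60.0 * peak_freq)        # Hz -> beats per minute
    return np.array(estimates)
```

Applied to the accelerometer signal instead of the PPG signal, the same function gives the motion-related curve used for comparison. Because it always returns the strongest spectral peak, it follows the motion-artifact whenever that component dominates the window.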

Fig. 7 The purpose of this figure is to point out that picking the largest peak of the STFT magnitude of the measured PPG signal in each time window gives incorrect estimates of the instantaneous heart rate. The STFT of a signal was obtained with 2048-point FFTs computed over 8-s time-windows, where successive windows had 6-s overlap. The frequency location of the peak of the STFT magnitude in each time window was obtained and multiplied by 60 to give an estimate of the heart rate in beats per minute. The heart rate in beats per minute for the PPG signal is denoted by the black solid line. The ground-truth heart rate (HR) obtained from the simultaneously acquired ECG signal is shown by the blue line. As can be seen, the heart rate estimates obtained from the PPG STFT peaks deviate significantly from the ground-truth. Also shown, for comparison, are the peak locations obtained from the accelerometer signal’s STFT (red line). Notice that in many time windows, the estimates given by the PPG signal and the peak-magnitude locations of the accelerometer signal’s FFT coincide. This shows that the spectrum of the motion-artifacts often overlaps with that of the heart-beat component of the PPG signal (Color figure online)

Fig. 8 Top panel Comparison of heart rate estimates obtained using the HSUM model and the ground-truth heart rate. Time-windows of 8-s duration with 6-s overlap between successive windows were used. The ‘HR raw’ values are the HR estimates obtained directly from the peak locations of the spectrogram of the measured PPG signal (as in Fig. 7). The harmonic sum-based estimates are almost the same as the ground-truth heart rate except at a couple of points. The mean absolute error is 0.6970 beats per minute (BPM). The data used was DATA05TYPE02 from Zhang et al. (2015b). The ‘HSUM median’ line corresponds to the 3-point median-filtered estimates of the harmonic sum-based method. It slightly improves the estimates obtained from harmonic sum (HSUM) modeling. Bottom panel The relative mean error energy (obtained by dividing the mean squared error (SE) by the energy of the acceleration signal) over successive time-windows. Notice that, in general, a larger relative mean error energy corresponds to a greater deviation of the estimated heart rate from the ground-truth

3.3 Harmonic sum (HSUM) for the accelerometer signal

Let us first consider the simpler problem of modeling the accelerometer signal, since it is assumed to consist of only one quasi-periodic signal. Let us assume that we have \(N_{a}\) samples of the accelerometer signal. It is modeled as a sum of a DC component \(a_{0}\) and \(M_{a}\) sines and cosines with frequencies that are integer multiples of the fundamental frequency \(f_a\) Hz. The amplitudes are denoted by \(a_k\) and \(b_k\). This HSUM model is denoted by \({\hat{{\mathbf {x}}}_{{\mathbf {a}}}}\).

$$\begin{aligned} \hat{x}_{a}[n]= a_{0} + \sum _{k=1}^{M_{a}}a_{k}\cos \left( {\frac{2 \pi knf_{a}}{f_s}}\right) + \sum _{k=1}^{M_{a}} b_{k}\sin \left( {\frac{2 \pi knf_{a}}{f_s}}\right) \end{aligned}$$
(3)

The unknown amplitudes and fundamental frequency \(f_a\) in the above equation are estimated by minimizing the squared error (SE) between original signal \({\mathbf {x}}_{\mathbf {a}}\) and the model \({\hat{{\mathbf {x}}}_{{\mathbf {a}}}}\),

$$\begin{aligned} SE = \sum _{n=0}^{N_{a}-1} \left( \hat{x}_{a}[n] - x_{a}[n] \right) ^2. \end{aligned}$$
(4)

We can vectorize Eqs. 3 and 4 by writing

$$\begin{aligned} {\mathbf {x}}_{\mathbf {a}}= \left( x_{a}[0], x_{a}[1],\ldots ,x_{a}[N_{a}-1] \right) ^T, \end{aligned}$$
(5)
$$\begin{aligned} {{\hat{\mathbf {x}}}}_{\mathbf {a}} = \left( \hat{x}_{a}[0], \hat{x}_{a}[1],\ldots ,\hat{x}_{a}[N_{a}-1] \right) ^T, \end{aligned}$$
(6)
$$\begin{aligned} {{\mathbf {a}}_{{\mathbf {a}}}}= \left( a_{0},a_{1},\ldots ,a_{M_{a}}, b_{1},\ldots ,b_{M_{a}} \right) ^T, \end{aligned}$$
(7)

and a matrix \({\mathbf {W}}_{\mathbf {a}}\) defined as follows.

$$\begin{aligned} W_{a}[k,l]= {\left\{ \begin{array}{ll} 1, &{} k=0,\ldots ,N_{a}-1;\; l=0\\ \cos \left( \frac{2\pi klf_{a}}{f_s} \right) , &{} k=0,\ldots ,N_{a}-1;\; l=1,\ldots ,M_{a}\\ \sin \left( \frac{2\pi k(l-M_{a})f_{a}}{f_s} \right) , &{} k=0,\ldots ,N_{a}-1;\; l=M_{a}+1,\ldots ,2M_{a}\\ \end{array}\right. } \end{aligned}$$
(8)

where \(W_{a}[k,l]\) stands for the \((k,l){th}\) element of the \({\mathbf {W}}_{\mathbf {a}}\) matrix. Equation 3 can be rewritten in vectorized form as

$$\begin{aligned} {{\hat{\mathbf {x}}}}_{\mathbf {a}}= {\mathbf {W}}_{\mathbf {a}} {\mathbf {a}}_{\mathbf {a}}. \end{aligned}$$
(9)

Equation 4 can then be rewritten in vector form as

$$\begin{aligned} SE= || {\mathbf {x}}_{\mathbf {a}} - {\mathbf {W}}_{\mathbf {a}} {\mathbf {a}}_{\mathbf {a}} ||_2^{2}. \end{aligned}$$
(10)

Minimizing SE over the unknown parameters is a bi-linear least squares problem, since both the matrix \({\mathbf {W}}_{\mathbf {a}}\) and the vector \({\mathbf {a}}_{\mathbf {a}}\) are unknown. But it can be simplified as follows. The squared error in Eq. 10 can be minimized by a standard least squares method (see, for example, Kumaresan et al. (1992)) if the frequency \(f_a\) (and hence \({\mathbf {W}}_{\mathbf {a}}\)) is known. For a given frequency \(f_{a}\), we can form the matrix \({\mathbf {W}}_{\mathbf {a}}\) as per Eq. 8. Then the amplitude vector \({\mathbf {a}}_{\mathbf {a}}\) that minimizes the squared error is given by (Kumaresan et al. 1992)

$$\begin{aligned} {\mathbf {a}}_{\mathbf {a}} = \left( {\mathbf {W}}_{\mathbf {a}}^T{\mathbf {W}}_{\mathbf {a}} \right) ^{-1} {\mathbf {W}}_{\mathbf {a}}^T {\mathbf {x}}_{\mathbf {a}}. \end{aligned}$$
(11)

Substituting the above expression for \({\mathbf {a}}_{\mathbf {a}}\) back in Eq. 10, we can rewrite the squared error (SE) as

$$\begin{aligned} SE= || {\mathbf {x}}_{\mathbf {a}} - {\mathbf {W}}_{\mathbf {a}} \left( {\mathbf {W}}_{\mathbf {a}}^T{\mathbf {W}}_{\mathbf {a}} \right) ^{-1} {\mathbf {W}}_{\mathbf {a}}^T {\mathbf {x}}_{\mathbf {a}} ||_2^{2}. \end{aligned}$$
(12)

We shall define a projection matrix \(\mathbf {P_{a}}\), as follows

$$\begin{aligned} \mathbf {P_{a}}= {\mathbf {W}}_{\mathbf {a}} \left( {\mathbf {W}}_{\mathbf {a}}^T {\mathbf {W}}_{\mathbf {a}}\right) ^{-1} {\mathbf {W}}_{\mathbf {a}}^T. \end{aligned}$$
(13)

By noting that the matrices \(\mathbf {P_{a}}\) and \(\mathbf {I - P_{a}}\) are symmetric and idempotent, we can rewrite the squared error in Eq. 12 as follows (Kumaresan et al. 1992).

$$\begin{aligned} SE = {\mathbf {x}}_{\mathbf {a}}^{T} \left( \mathbf {I} -\mathbf {P_{a}}\right) {\mathbf {x}}_{\mathbf {a}}. \end{aligned}$$
(14)

Note that the squared error, SE, explicitly depends only on the unknown frequency \(f_{a}\). We can either minimize the expression in Eq. 14, or equivalently, maximize \({\mathbf {x}}_{\mathbf {a}}^{T} \mathbf {P_{a}}{\mathbf {x}}_{\mathbf {a}}\) by picking the best \(f_a\). Minimization of SE can be achieved by searching over a grid covering the range of expected frequency values of \(f_{a}\). Since we know that the accelerometer signal has a fundamental frequency in the range of 1 to 3 Hz, we can use a grid search over this range of frequencies with a step size of, say, 0.01 Hz. By minimizing the SE (Eq. 14) we estimate the fundamental frequency of the HSUM model for each overlapping window. In our processing algorithms we used overlapping windows of 8-s duration with 6-s overlap between successive windows. Equation 11 can then be used to compute the amplitudes of all harmonics for each window. The optimum frequency estimate computed here is used for estimating the parameters of a joint HSUM model for the PPG signal as described in the next section.
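
The grid search described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `harmonic_matrix` and `fit_fundamental` are hypothetical, the number of harmonics is left as a free parameter because its value is not given here, and `np.linalg.lstsq` is used in place of the explicit inverse in Eq. 11 (it yields the same least-squares amplitudes when the matrix has full column rank and is numerically safer).

```python
import numpy as np


def harmonic_matrix(f0, n_samples, n_harmonics, fs):
    """Build W for Eq. 3: a DC column, then cosine columns, then sine columns."""
    n = np.arange(n_samples)[:, None]               # sample index, column vector
    k = np.arange(1, n_harmonics + 1)[None, :]      # harmonic index, row vector
    phase = 2.0 * np.pi * n * k * f0 / fs
    return np.hstack([np.ones((n_samples, 1)), np.cos(phase), np.sin(phase)])


def fit_fundamental(x, fs, n_harmonics, f_grid):
    """Return (f0, amplitudes, SE) minimising the squared error over f_grid."""
    x = np.asarray(x, dtype=float)
    best_se, best_f0, best_amps = np.inf, None, None
    for f0 in f_grid:
        W = harmonic_matrix(f0, len(x), n_harmonics, fs)
        amps, *_ = np.linalg.lstsq(W, x, rcond=None)   # least-squares amplitudes
        se = float(np.sum((x - W @ amps) ** 2))        # squared error, Eq. 10
        if se < best_se:
            best_se, best_f0, best_amps = se, f0, amps
    return best_f0, best_amps, best_se
```

A call such as `fit_fundamental(x_a, 125.0, 4, np.arange(1.0, 3.0 + 1e-9, 0.01))` on one 8-s accelerometer window covers the 1 to 3 Hz range with a 0.01 Hz step; the choice of 4 harmonics in this call is purely illustrative.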

3.4 Joint harmonic sum (HSUM) model for the PPG signal

The PPG signal acquired during daily activities is composed of the heart-beat signal and dominant motion-artifacts induced by the physical movements of the user. Unfortunately, we do not know exactly how the physical movements of the subject affect the PPG signal. Although in the previous subsection we obtained a model fit to the accelerometer data and estimated the fundamental frequency \(f_a\) and the amplitudes \({\mathbf {a}}_{\mathbf {a}}\), we are uncertain as to how the individual harmonic amplitudes affect the PPG data. We may nevertheless hypothesize that the artifacts induced by physical movements in the PPG signal have the same fundamental frequency as the accelerometer signal. If this is true, we can use the fundamental frequency estimate \(f_a\) obtained in the previous subsection (but not the amplitudes \({\mathbf {a}}_{\mathbf {a}}\)) to help mitigate the effects of the motion-artifacts on the PPG signal. Our experimental results below seem to validate this hypothesis. The harmonic sum (HSUM) model for the PPG signal consists of a sum of two truncated harmonic series with different fundamental frequencies, \(f_{a}\) for the motion-artifact component and \(f_{h}\) for the heart-beat component. The value of \(f_{a}\) that gives the minimum squared error (SE) for the accelerometer signal fit is taken as the optimum fundamental frequency of the motion-artifact (and renamed \(f_{oa}\) for ease of use). The signal model for the PPG signal is then given by

$$\begin{aligned}\hat{x}_{p}[n]&= a'_{0} + \sum _{k=1}^{M_{a}} a'_{k} \cos \left( {\frac{2 \pi knf_{oa}}{f_s}}\right) +\sum _{k=1}^{M_{a}} b'_{k} \sin \left( {\frac{2 \pi knf_{oa}}{f_s}}\right) \nonumber \\&\quad + \sum _{\kappa =1}^{M_{h}} c_{\kappa } \cos \left( {\frac{2 \pi \kappa nf_{h}}{f_s}}\right) + \sum _{\kappa =1}^{M_{h}}d_{\kappa } \sin \left( {\frac{2 \pi \kappa nf_{h}}{f_s}}\right) . \end{aligned}$$
(15)

The model represented by the Eq. 15 can be vectorized as in the previous subsection as follows.

$$\begin{aligned} {\hat{\mathbf {x}}_{p}}= {\mathbf {W}}_{\mathbf {oa}} {\bar{\mathbf {a}}_{a}} + {\mathbf {W}}_{\mathbf {h}} {\mathbf {c}}_{\mathbf {h}} \end{aligned}$$
(16)

where the weight matrix, \({\mathbf {W}}_{\mathbf {h}}\), and the amplitude vector of the heart-beat component of the PPG signal, \({\mathbf {c}}_{\mathbf {h}}\) are defined as follows.

$$\begin{aligned} {\mathbf {c}}_{\mathbf {h}} = \left( c_{1},\ldots ,c_{M_{h}}, d_{1},\ldots ,d_{M_{h}} \right) ^T \end{aligned}$$
(17)

and

$$\begin{aligned} W_{h}[k,l]= {\left\{ \begin{array}{ll} \text {cos} \left( \frac{2\pi klf_{h}}{f_s} \right) , &{}{\mathrm{k}=1,\ldots ,N_{h}-1; l=1,\ldots ,M_{h}}\\ \text {sin}\left( \frac{2\pi k(l-M_{h})f_{h}}{f_s} \right) , &{}{\mathrm{k}=1,\ldots ,N_{h}-1;l=M_{h}+1,\ldots ,2M_{h}}\\ \end{array}\right. } \end{aligned}$$
(18)

respectively. Here, \(W_{h}[k,l]\) stands for the \((k,l)\)th element of the matrix \({\mathbf {W}}_{\mathbf {h}}\). The known fundamental frequency \(f_{oa}\) is used to synthesize the optimum weight matrix, \({\mathbf {W}}_{\mathbf {oa}}\), for the acceleration signal using Eq. 8 (i.e., substituting \(f_{oa}\) in place of \(f_a\)). The amplitudes of the motion-artifact component, \({\bar{\mathbf {a}}_{a}}\), are specified in vector form as follows.

$$\begin{aligned} {\bar{\mathbf {a}}_{a}} = \left( a'_{0},a'_{1},\ldots ,a'_{M_{a}}, b'_{1},\ldots ,b'_{M_{a}} \right) ^T. \end{aligned}$$
(19)

The combined amplitude vector of both motion-artifact component and heart-beat component of the PPG signal is given by

$$\begin{aligned} {\mathbf {a}}_{\mathbf {c}} = [{\bar{\mathbf {a}}_{a}}^{T}, {\mathbf {c}}_{\mathbf {h}}^{T}]^{T}. \end{aligned}$$
(20)

Similarly, by concatenating the weight matrices corresponding to heart-beat component, \({\mathbf {W}}_{\mathbf {h}}\) and the motion-artifact related signal \({\mathbf {W}}_{\mathbf {oa}}\), we get the combined weight matrix, \({\mathbf {W}}_{\mathbf {c}}\), as

$$\begin{aligned} {\mathbf {W}}_{\mathbf {c}} = \left( {\mathbf {W}}_{\mathbf {oa}} | {\mathbf {W}}_{\mathbf {h}}\right) . \end{aligned}$$
(21)
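The construction of Eqs. 16 to 21 can be sketched in a few lines of NumPy. This is only an illustrative sketch: the helper name `harmonic_matrix`, the two frequency values, and the choice of a zero-based sample index are assumptions made here, while the window length, sampling rate, and model orders are the values quoted elsewhere in this article.

```python
import numpy as np

def harmonic_matrix(f0, order, n_samples, fs, include_dc=False):
    """Weight matrix with columns [1 (optional), cos terms, sin terms].

    Row n, harmonic k: cos(2*pi*k*n*f0/fs) and sin(2*pi*k*n*f0/fs),
    for k = 1..order. The sample index n starts at 0 (an assumption).
    """
    n = np.arange(n_samples)[:, None]       # sample index as a column
    k = np.arange(1, order + 1)[None, :]    # harmonic index as a row
    phase = 2.0 * np.pi * n * k * f0 / fs
    blocks = [np.cos(phase), np.sin(phase)]
    if include_dc:
        blocks.insert(0, np.ones((n_samples, 1)))  # column for a'_0
    return np.hstack(blocks)

fs, n_samples = 125.0, 1000   # 8-s window at 125 Hz
f_oa, f_h = 2.6, 2.1          # hypothetical frequencies in Hz
M_a, M_h = 17, 7              # model orders used in Sect. 4.1

W_oa = harmonic_matrix(f_oa, M_a, n_samples, fs, include_dc=True)
W_h = harmonic_matrix(f_h, M_h, n_samples, fs)
W_c = np.hstack([W_oa, W_h])  # combined weight matrix, Eq. 21
print(W_oa.shape, W_h.shape, W_c.shape)
```

With these orders the combined matrix has \(1 + 2M_a + 2M_h\) columns, one per entry of the combined amplitude vector \({\mathbf {a}}_{\mathbf {c}}\).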

Now, analogous to Eq. 10, we can write the squared error for the PPG signal as follows.

$$\begin{aligned} SE_p= || {\mathbf {x}}_{\mathbf {p}} - {\mathbf {W}}_{\mathbf {c}} {\mathbf {a}}_{\mathbf {c}} ||_2^{2}, \end{aligned}$$
(22)

where \({\mathbf {x}}_{\mathbf {p}}\) stands for the observed PPG signal vector. Following the same steps as in the case of accelerometer modeling, the combined amplitude vector that minimizes \(SE_p\) for a given \(f_h\) is given by

$$\begin{aligned} {\mathbf {a}}_{\mathbf {c}} = \left( {\mathbf {W}}_{\mathbf {c}}^{\mathbf {T}} {\mathbf {W}}_{\mathbf {c}} \right) ^{-1} {\mathbf {W}}_{\mathbf {c}}^T {\mathbf {x}}_{\mathbf {p}}. \end{aligned}$$
(23)

The projection matrix for joint HSUM model for PPG signal is given by

$$\begin{aligned} {\mathbf {P}}_{\mathbf {p}}= {\mathbf {W}}_{\mathbf {c}} \left( {\mathbf {W}}_{\mathbf {c}}^T {\mathbf {W}}_{\mathbf {c}}\right) ^{-1} {\mathbf {W}}_{\mathbf {c}}^T. \end{aligned}$$
(24)

The corresponding squared error, \(SE_{p}\), in vector form is written as

$$\begin{aligned} SE_{p} = {\mathbf {x}}_{\mathbf {p}}^{T} \left( \mathbf {I} -{\mathbf {P}}_{\mathbf {p}}\right) {\mathbf {x}}_{\mathbf {p}}. \end{aligned}$$
(25)
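As a numerical check of Eqs. 22 to 25, the following sketch fits a synthetic window and confirms that the residual energy of the least-squares fit equals the projection form \({\mathbf {x}}_{\mathbf {p}}^{T} (\mathbf {I} - {\mathbf {P}}_{\mathbf {p}}) {\mathbf {x}}_{\mathbf {p}}\). The test signal, the small model orders, and the helper `harmonic_matrix` are assumptions chosen for illustration; `numpy.linalg.lstsq` is used in place of the explicit inverse in Eq. 23, which yields the same minimizer when \({\mathbf {W}}_{\mathbf {c}}\) has full column rank.

```python
import numpy as np

def harmonic_matrix(f0, order, n_samples, fs, include_dc=False):
    """Columns: optional DC, then cos and sin terms for harmonics 1..order."""
    n = np.arange(n_samples)[:, None]
    k = np.arange(1, order + 1)[None, :]
    phase = 2.0 * np.pi * n * k * f0 / fs
    blocks = [np.cos(phase), np.sin(phase)]
    if include_dc:
        blocks.insert(0, np.ones((n_samples, 1)))
    return np.hstack(blocks)

# Synthetic PPG-like window (all values below are illustrative assumptions).
fs, N = 125.0, 500
f_oa, f_h, M_a, M_h = 2.6, 1.9, 3, 2
n = np.arange(N)
rng = np.random.default_rng(0)
x_p = (3.0 * np.cos(2 * np.pi * f_oa * n / fs)      # motion artifact
       + 0.5 * np.sin(2 * np.pi * f_h * n / fs)     # heart-beat component
       + 0.1 * rng.standard_normal(N))              # measurement noise

W_c = np.hstack([harmonic_matrix(f_oa, M_a, N, fs, include_dc=True),
                 harmonic_matrix(f_h, M_h, N, fs)])

# Eq. 23: least-squares amplitudes, then Eq. 22 evaluated at the optimum.
a_c, *_ = np.linalg.lstsq(W_c, x_p, rcond=None)
residual = x_p - W_c @ a_c
se_direct = float(residual @ residual)

# Eqs. 24 and 25: the same squared error through the projection matrix.
P_p = W_c @ np.linalg.inv(W_c.T @ W_c) @ W_c.T
se_proj = float(x_p @ (np.eye(N) - P_p) @ x_p)

print(np.isclose(se_direct, se_proj, rtol=1e-6))
```

Forming \({\mathbf {P}}_{\mathbf {p}}\) explicitly costs \(O(N^2)\) memory, so in a grid search it is usually cheaper to compute the residual directly, as in the first half of the sketch.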

The frequency, \(f_{oh}\), that gives the minimum \(SE_p\) is the heart rate in Hz. We multiply it by 60 to get the heart rate in beats per minute (BPM), as indicated in Fig. 3. To find the best \(f_h\) (called \(f_{oh}\)), we used a grid search in the range of 0.5 to 3 Hz (in steps of 0.01 Hz). For each frequency in the grid, we compute the corresponding weight matrix, \({\mathbf {W}}_{\mathbf {h}}\), of order \(M_{h}\) (which may differ from \(M_{a}\)), while \({\mathbf {W}}_{\mathbf {oa}}\) remains fixed. Once \(f_{oh}\) is determined, all the optimal amplitudes can be estimated using the expression in Eq. 23.

Using the joint HSUM model, we can now suppress the motion-artifacts present in the PPG signal, leaving behind the heart-beat component of the PPG signal. We can reconstruct the heart-beat related signal component \(\hat{x}_{hb}[n]\) using the expression,

$$\begin{aligned} \hat{x}_{hb}[n]= \sum _{\kappa =1}^{M_{h}} c_{o\kappa } \cos \left( {\frac{2 \pi \kappa nf_{oh}}{f_s}}\right) +\sum _{\kappa =1}^{M_{h}}d_{o\kappa } \sin \left( {\frac{2 \pi \kappa nf_{oh}}{f_s}}\right) , \end{aligned}$$
(26)

where \(c_{o\kappa }\) and \(d_{o\kappa }\) are the optimum amplitude estimates obtained from Eq. 23. Similarly, we can reconstruct the motion-artifact component of the PPG signal using the following expression,

$$\begin{aligned} \hat{x}_{a}[n]= \sum _{k=1}^{M_{a}} a'_{ok} \cos \left( {\frac{2 \pi k nf_{oa}}{f_s}}\right) + \sum _{k=1}^{M_{a}}b'_{ok} \sin \left( {\frac{2 \pi k nf_{oa}}{f_s}}\right) , \end{aligned}$$
(27)

where \(a'_{ok}\) and \(b'_{ok}\) are the optimum amplitude estimates obtained from Eq. 23.

Figure 9 shows the application of the HSUM model to an 8-s time-window of the PPG signal and the accelerometer signal. Figure 9a shows the time-domain accelerometer signal for the 8-s duration (1000 samples at a sampling rate of 125 Hz). The quasi-periodic structure of the time-domain acceleration signal is evident from this figure. Figure 9b shows the PPG signal and its joint HSUM model fit for the same time-window. We can see that the joint HSUM model closely follows the PPG signal. Finally, Fig. 9c, d shows the heart-beat component of the PPG signal computed using Eq. 26, and the motion-artifact component of the PPG signal obtained using Eq. 27. We can see that the amplitude of the heart-beat component of the PPG signal is significantly lower than the amplitude of the motion-artifact component. The strength of the HSUM model is that it fits the fundamental frequencies of both components, the heart-beat and the motion-artifact, within a single joint model.

We can summarize the proposed algorithm as follows. We first process the accelerometer data over a short time window using the HSUM model in Eq. 3. We minimize the squared error in Eq. 4 by finding the optimum fundamental frequency \(f_{oa}\) valid over that time window. Then we process the PPG data over the same time window using the joint HSUM model in Eq. 15. We minimize the squared error in Eq. 25 by finding the optimum fundamental frequency \(f_{oh}\) while making use of the \(f_{oa}\) value obtained in the first step. The \(f_{oh}\) is finally converted to a heart rate estimate valid over that time window. We then repeat this process for each overlapping time-window. We now describe the results of applying this algorithm to the IEEE SP cup data.
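The per-window procedure summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: the signals are synthetic, the model orders are kept small, the function names are hypothetical, and the same 0.5 to 3 Hz grid is assumed for the accelerometer search (the text specifies this grid only for \(f_h\)).

```python
import numpy as np

def harmonic_matrix(f0, order, n_samples, fs, include_dc=False):
    """Columns: optional DC, then cos and sin terms for harmonics 1..order."""
    n = np.arange(n_samples)[:, None]
    k = np.arange(1, order + 1)[None, :]
    phase = 2.0 * np.pi * n * k * f0 / fs
    blocks = [np.cos(phase), np.sin(phase)]
    if include_dc:
        blocks.insert(0, np.ones((n_samples, 1)))
    return np.hstack(blocks)

def fit(W, x):
    """Least-squares amplitudes and the resulting squared error."""
    a, *_ = np.linalg.lstsq(W, x, rcond=None)
    r = x - W @ a
    return float(r @ r), a

def estimate_window(x_acc, x_ppg, fs, M_a, M_h, grid):
    N = len(x_ppg)
    # Step 1: fundamental frequency of the accelerometer signal.
    se_a = [fit(harmonic_matrix(f, M_a, N, fs, True), x_acc)[0] for f in grid]
    f_oa = float(grid[int(np.argmin(se_a))])
    W_oa = harmonic_matrix(f_oa, M_a, N, fs, True)   # fixed from here on
    # Step 2: grid search over f_h with the joint model.
    best_se, f_oh, a_c = np.inf, None, None
    for f in grid:
        se, a = fit(np.hstack([W_oa, harmonic_matrix(f, M_h, N, fs)]), x_ppg)
        if se < best_se:
            best_se, f_oh, a_c = se, float(f), a
    # Step 3: reconstruct both components (Eqs. 26 and 27, DC term excluded).
    n_a = W_oa.shape[1]
    x_hb = harmonic_matrix(f_oh, M_h, N, fs) @ a_c[n_a:]
    x_art = W_oa[:, 1:] @ a_c[1:n_a]
    return f_oa, f_oh, 60.0 * f_oh, x_hb, x_art

# Synthetic 8-s window at 125 Hz (amplitudes and frequencies are made up).
fs, N = 125.0, 1000
t = np.arange(N) / fs
rng = np.random.default_rng(1)
x_acc = (2.0 * np.cos(2 * np.pi * 2.6 * t) + 1.0 * np.sin(2 * np.pi * 5.2 * t)
         + 0.6 * np.cos(2 * np.pi * 7.8 * t) + 0.1 * rng.standard_normal(N))
hb_true = 0.5 * np.cos(2 * np.pi * 1.9 * t) + 0.25 * np.sin(2 * np.pi * 3.8 * t)
x_ppg = (3.0 * np.cos(2 * np.pi * 2.6 * t + 0.4) + 1.5 * np.cos(2 * np.pi * 5.2 * t)
         + 0.8 * np.sin(2 * np.pi * 7.8 * t) + hb_true
         + 0.05 * rng.standard_normal(N))

grid = np.round(np.arange(0.5, 3.005, 0.01), 2)      # 0.5 to 3 Hz, 0.01 Hz steps
f_oa, f_oh, bpm, x_hb, x_art = estimate_window(x_acc, x_ppg, fs, 3, 2, grid)
print(f_oa, f_oh, bpm)
```

In this synthetic case the heart-beat component is several times weaker than the artifact, which mirrors the situation shown in Fig. 9c, d. The sketch relies on `lstsq` so that grid points where a heart-beat harmonic coincides with an artifact harmonic (a rank-deficient \({\mathbf {W}}_{\mathbf {c}}\)) do not cause a failed matrix inversion.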

Fig. 9

a An example of the accelerometer signal for an 8-s time-window, b a sample PPG signal and its joint HSUM model fit for the same window, c heart-beat component of the PPG signal shown in panel b, obtained using Eq. 26, d motion-artifact component of the PPG signal obtained using Eq. 27. This segment is taken from the DATA05TYPE02 dataset (Zhang et al. 2015b), in a time interval when the individual is running at 12 km/h, as can be seen from the high magnitude of acceleration in this segment. Since the HSUM model is fitted over an 8-s window over which the acceleration and PPG signals are quasi-periodic, similar figures would be obtained at, for example, 6 and 15 km/h. Higher speeds of motion cause greater corruption of the PPG signal by motion artifacts

4 Results and discussions

This section describes the experiments conducted to validate the efficacy of the proposed HSUM model for estimation of HR from the PPG signal corrupted with motion-artifacts.

4.1 Results

Figure 8 compares the heart rate estimates obtained using the HSUM model, the HSUM model followed by 3-point median filtering, and the frequency location of the peak of the STFT magnitude of the PPG signal with the ground-truth. Time-windows of 8-s duration with 6-s overlap between successive windows were used. The figure shows that the heart rate estimates obtained using the HSUM model are almost the same as the ground-truth heart rate (HR) estimates except at a few points. The mean absolute error was 0.6970 beats per minute (BPM) for the dataset DATA05TYPE02 (from Zhang et al. (2015b)). We used a 17th-order HSUM model (\(M_a=17\)) for the accelerometer signal and a 7th-order HSUM model (\(M_h=7\)) for the heart-beat component of the PPG signal. We used a 3-point median filter to refine the HR estimates obtained from the HSUM model. The 3-point median filter incorporates HR estimates from the previous and next windows. Clearly, the maximum refinement of the HR estimates is seen at the point of highest deviation from the ground-truth (Fig. 8). The average absolute error in the HR estimates with the HSUM model was 0.9852 BPM with a standard deviation of 1.1670 BPM for the complete dataset. Using the 3-point median filter, we obtained an average absolute error of 0.7359 BPM with a standard deviation of 0.8328 BPM. In an online scheme, where the next frame (the next 2 s) of the PPG and acceleration signals is not available, the median filtering can be skipped. The HSUM model is effective in accurately modeling the PPG and acceleration signals over short overlapping windows, and it gives accurate heart rates without median filtering. We include median filtering for cases where the next 2 s of the signals are accessible (or, equivalently, for an offline scheme where short delays are acceptable); with a 2-s delay, the HR estimates can be refined by incorporating this context.

On the other hand, the average absolute error in the HR estimates obtained from the STFT is 27.5152 BPM with a standard deviation of 27.2596 BPM. Applying the median filter to the HR estimates from the short-time spectrum gives an average absolute error of 26.0886 BPM with a standard deviation of 24.7005 BPM. The error in the HR estimates obtained from the STFT is very large because motion-artifacts have corrupted the PPG signal significantly, owing to considerable motion of the subject’s hand while running on the treadmill. Figure 7 shows the HR estimates obtained from the STFT magnitude of the PPG signal as well as the accelerometer signal. The large error in the HR estimates is evident from this figure.
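The post-processing and error metrics described above can be illustrated with a short standard-library sketch. The HR values below are made up for illustration and are not taken from the dataset; leaving the first and last estimates unfiltered, and reporting the sample standard deviation of the absolute error, are assumptions of this sketch rather than details stated in the text.

```python
from statistics import mean, median, stdev

def median3(hr):
    """3-point median filter over per-window HR estimates.

    Each interior estimate is replaced by the median of itself and its
    two neighbours; the endpoints are left unchanged (an assumption).
    """
    out = list(hr)
    for i in range(1, len(hr) - 1):
        out[i] = median(hr[i - 1:i + 2])
    return out

def abs_error_stats(est, truth):
    """Average absolute error and its sample standard deviation, in BPM."""
    err = [abs(e - t) for e, t in zip(est, truth)]
    return mean(err), stdev(err)

# Hypothetical ground-truth and estimated HR (BPM) for six windows,
# with one outlying estimate in the third window.
truth = [120.0, 121.0, 122.0, 123.0, 124.0, 125.0]
est = [120.5, 121.2, 135.0, 123.1, 123.8, 125.4]

filtered = median3(est)
mae_raw, sd_raw = abs_error_stats(est, truth)
mae_med, sd_med = abs_error_stats(filtered, truth)
print(filtered)
print(mae_raw, mae_med)
```

Because the filter needs the estimate from the next window, it introduces a delay of one window hop (2 s for 8-s windows with 6-s overlap), which is why it is optional in an online setting.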

Table 1 lists the average absolute error (in beats per minute) in the HR estimates obtained from the HSUM model and from the HSUM model with 3-point median filtering, compared with the HR estimates obtained from the STFT of the PPG signal with and without 3-point median filtering. The average absolute error (in BPM) of four other recently developed algorithms, namely SpaMA (Salehizadeh et al. 2015), WFPV (Temko 2015), JOSS (Zhang 2015), and TROIKA (Zhang et al. 2015b), is also given for comparison. We can see that HSUM gives an improvement over these algorithms. Table 2 shows the standard deviation of the error (in BPM) for the same HSUM-based and STFT-based HR estimates, with and without 3-point median filtering.

Table 1 Average absolute error (in beats per minute) for HR estimates obtained from the HSUM model, the HSUM model with 3-point median filtering compared with the HR estimates obtained from the STFT of the PPG signal, and using 3-point median filtering on HR estimates obtained from the STFT of the PPG signal
Table 2 Standard deviation in error (in beats per minute) for HR estimates obtained from HSUM model, HSUM model with 3-point median filtering compared with the HR estimates obtained from the STFT of the PPG signal, and using 3-point median filtering on HR estimates obtained from the STFT of the PPG signal

The Bland-Altman plot (Bland and Altman 1986) for all 12 data-sets is shown in Fig. 10. The Bland-Altman plot was used to validate the agreement between the HR values estimated using the HSUM model and the ground-truth HR obtained from the ECG signal. The 95 percent limit of agreement (LOA) in the Bland-Altman plot is [−6.7086, 6.7086] BPM. The heart rate estimated using the HSUM model is in agreement with the ground-truth, as depicted by the Bland-Altman plot in Fig. 10. Figure 11 shows the scatter plot of the HR estimates with respect to the corresponding ground-truth. The Pearson’s coefficient between the HR estimates and the ground-truth is 0.9974. However, since the estimated HR and the ground-truth are both non-normal, Pearson’s correlation is not a suitable metric (Mukaka 2012). We therefore also computed Spearman’s rank correlation between the HR estimates and the ground-truth, as it is applicable to non-normal distributions and, unlike Pearson’s correlation, is robust to outliers (Spearman 1904). The Spearman’s rank correlation for all 12 data-sets is 0.9978 for the HSUM model, which indicates almost perfect agreement between the HR estimates and the ground-truth.
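For readers who wish to reproduce these agreement statistics on their own estimates, the following standard-library sketch computes the Bland-Altman bias and limits of agreement together with the Pearson and Spearman coefficients. The HR values are hypothetical, the function names are illustrative, and the limits are computed with the conventional definition (mean difference ± 1.96 standard deviations), which is assumed here rather than confirmed as the exact computation behind the interval reported above.

```python
from statistics import mean, stdev

def bland_altman(est, ref):
    """Bias (mean difference) and 95 percent limits of agreement."""
    diff = [e - r for e, r in zip(est, ref)]
    bias, sd = mean(diff), stdev(diff)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(v):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical ECG ground-truth and estimated HR values (BPM).
ref = [80.0, 95.0, 110.0, 125.0, 140.0, 155.0]
est = [81.0, 94.0, 112.0, 124.0, 141.0, 156.0]

bias, (loa_low, loa_high) = bland_altman(est, ref)
r_pearson = pearson(est, ref)
r_spearman = spearman(est, ref)
print(bias, loa_low, loa_high)
print(r_pearson, r_spearman)
```

Because Spearman's coefficient depends only on ranks, a single grossly wrong estimate changes it far less than it changes Pearson's coefficient, which is the robustness property referred to above.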

Fig. 10

The Bland-Altman plot between HR estimates obtained using the HSUM model with 3-point median filtering and the ground-truth from the ECG signal for all 12 data-sets in IEEE SP cup data (Zhang et al. 2015b). Strong agreement in HR estimates with the ground-truths is evident. Most of the points are inside the limit of agreement that is at ±1.96 times the standard deviation (Bland and Altman 1986)

Fig. 11

The Pearson’s correlation between HR estimates obtained using the HSUM model with 3-point median filtering and the ground-truth from the ECG for all 12 data-sets in the IEEE SP cup data (Zhang et al. 2015b). The Pearson’s correlation coefficient was 0.9974 and the Spearman’s rank coefficient was 0.9978. The high values of these coefficients suggest that the estimated heart rates were almost the same as the ground-truth

5 Conclusions

We have developed a harmonic sum-based method for estimation of heart rate (HR) using a PPG signal that had been corrupted by motion-artifacts during running on a treadmill. An auxiliary accelerometer is used to acquire information about the physical movements of the user. The quasi-periodic accelerometer signal is first modeled using a harmonic sum (HSUM) that estimates the fundamental frequency of the acceleration signal over short time-windows. The fundamental frequency of the acceleration signal is then used to model the PPG signal containing information about the heart rate and the motion-artifacts. We extract the heart rate (HR) using a joint HSUM model for the PPG signal. The method is suitable for wearable devices such as PPG wristbands used for real-time HR monitoring.