1 Introduction

As wearable biomedical devices become popular, photoplethysmogram (PPG) signal processing is becoming increasingly important. PPG signals can be acquired using lightweight and cost-effective pulse oximeters [1, 2]. A pulse oximeter relies on a light-emitting diode (LED) to illuminate the user’s skin and a photodetector to measure the intensity changes of the light transmitted through (transmittance-type) or reflected from (reflectance-type) the skin; these intensity changes constitute the PPG signal. Transmittance-type oximeters, which can only be placed on a relatively static site such as the finger or the earlobe, are not ideal because they cannot be worn seamlessly and their detection capacity is limited by this placement requirement. Reflectance-type oximeters, on the other hand, can be placed flexibly on the body, but this flexibility often introduces motion artifacts. This paper focuses on signal data acquired from reflectance-type oximeters.

PPG signals are extremely sensitive to motion artifacts (MAs) caused by the subject’s movement [1]. In order to further process a PPG signal for biological measurements such as heart rates or blood oxygen levels, it is necessary to remove the MAs from the probed PPG signal.

Much existing research has been dedicated to this problem. A major approach is independent component analysis (ICA) [3, 4]. Time- or frequency-domain ICA algorithms have been employed to find the signal subspace in order to eliminate or mitigate the MAs. However, the required assumption that the desired signal and the MAs are completely uncorrelated is not valid, especially for PPG signals [5]. Therefore, the MA detection results obtained from the ICA approach are not satisfactory. Besides, some ICA methods require multiple PPG sensors, which inconveniences the user and adds cost in practical implementations.

Another commonly adopted approach is adaptive noise cancellation (ANC) [6,7,8]. In the ANC approach, one first needs to acquire a reference signal, which is usually estimated using the fast Fourier transform (FFT) or singular value decomposition (SVD). The major drawback of the ANC approach is that its performance relies heavily on the quality of the estimated reference signal. When the estimated reference signal is not reliable, the ANC performance can be very poor.

Auxiliary data acquired from an accelerometer can also be utilized for MA removal. For example, a spectrum subtraction technique was proposed to suppress MAs in the spectral domain [9, 10]. However, these acceleration data reflect only the three-dimensional movements of the subject, whereas MAs are often generated by abrupt changes in the distance between the oximeter and the subject’s skin. Hence, the acceleration data alone cannot represent all kinds of MAs in PPG signals.

Recently, a method called TROIKA was proposed for tracking the heart rate while the subject is exercising [11]. TROIKA uses sparse signal reconstruction to create a high-resolution spectrum of the PPG signal, on which spectrum peak tracking (SPT) is then performed. TROIKA was shown to be effective for regular hand movements. However, its performance is quite sensitive to the initialization of the SPT: if the initialization is considerably in error, the heart rate (HR) trajectory can be entirely off target. TROIKA is therefore suitable for regular and frequent MAs such as those caused by the subject’s running or respiration, whereas abrupt or aperiodic MAs often degrade its performance significantly.

In this paper, we propose a novel MA detection and removal mechanism based on the short-time variances across the PPG signal sequence. Our proposed MA detector consists of three steps, namely optimal frame-size selection, short-time variance feature extraction, and change-point detection. Since the feature extraction is very sensitive to the frame size (short-time window length), a kurtosis measure is used to select the best frame size automatically. The MA detection is then performed by thresholding the extracted feature (short-time variance) sequence. Once the MA intervals are identified, the corresponding signal data are tailored (removed) from the original PPG signal sequence. Thus, unlike existing spectrum-based techniques, our method preserves the original temporal structure of the signal. The experimental results show that our proposed scheme can successfully detect abrupt MAs in the PPG signal and, by removing them, improve the HR tracking accuracy.

The rest of this paper is organized as follows. Our proposed MA tailoring algorithm is presented in Section 2. The experimental results on real data are presented and analyzed in Section 3. Finally, concluding remarks are drawn in Section 4.

2 Algorithm

Since the PPG signal is very sensitive to motion artifacts (MAs), as shown in Fig. 1, the detection and removal of such MAs become critical for any PPG signal processing and analysis.

Fig. 1 An example of the effect of motion artifacts on the PPG signal. The top panel depicts an uncorrupted (clean) signal, while the bottom panel shows a PPG signal containing motion artifacts from the 35th to the 45th second

In our proposed MA tailoring scheme, the MA detector consists of three steps: optimal frame-size selection, short-time feature extraction, and change-point detection. Since PPG signals vary across subjects, no single frame size (short-time window length) is appropriate for the feature extraction of all subjects. In other words, the frame size should be “data-dependent”. Here we propose a data-dependent frame-size determination mechanism that allows the algorithm to find the optimal frame size automatically. The short-time variance is adopted as the feature for MA detection in this work because it is robust, reliable, and computationally simple. Finally, the change points (from the regular signal to an MA or vice versa) are found by thresholding the short-time variances. After the MA intervals are located, the corresponding signal data can be tailored from the original signal waveform. The details of our proposed algorithm are presented in the following subsections.

2.1 Features: short-time variances

The short-time variance is the underlying feature for MA detection in this paper. It depends on the frame size \( W \) and the frame-forwarding size \( \varDelta \), and can be regarded as a transformation of the original PPG signal sequence, i.e., \( x\left( n \right), n \in \mathbb{Z} \mathop \to \limits^{{\varGamma_{W,\varDelta }}} {V_{W,\varDelta }}\left( k \right), k \in \mathbb{Z} \). It is defined as

$$ \begin{aligned} & V_{W,\varDelta } \left( k \right) \mathop = \limits^{\text{def}} \varGamma_{W,\varDelta } \left[ {x\left( n \right)} \right] \\ & = \frac{1}{W - 1}\sum\nolimits_{n = (k - 1)\varDelta + 1}^{(k - 1)\varDelta + W} {[x(n) - \mu ]^{2} } , \\ \end{aligned} $$
(1)

where \( n \) is the original signal sample index, \( {\mathbb{Z}} \) denotes the set of all integers, \( W \) is the frame size, \( \varDelta \) is the frame-forwarding size, \( k \) is the frame index, and \( \mu \) is the short-time mean of \( x\left( n \right),n \in \left[ {\left( {k - 1} \right)\varDelta + 1,\left( {k - 1} \right)\varDelta + W} \right] \), defined as

$$ \mu \mathop{=}\limits^{\text{def}} \frac{1}{W}\sum\nolimits_{n = (k - 1)\varDelta + 1}^{(k - 1)\varDelta + W} {x(n)} $$
(2)

In order to build a robust MA detector, the frame size \( W \) has to be carefully selected so that the variances \( V_{W,\varDelta } \left( k \right) \) reliably distinguish regular signal data from motion artifacts.
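As an illustration, Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is a minimal sketch rather than the implementation used in our experiments; the function name `short_time_variance` is a hypothetical choice, and the 1-based sample index of Eq. (1) is mapped to NumPy's 0-based indexing.

```python
import numpy as np


def short_time_variance(x, W, delta):
    """Short-time variance V_{W,Delta}(k) of Eq. (1) for k = 1, 2, ...

    Frame k covers samples (k-1)*delta+1 .. (k-1)*delta+W in the 1-based
    indexing of the text, i.e. x[(k-1)*delta : (k-1)*delta + W] in NumPy.
    Only complete frames are returned (an assumption; the text does not
    say how a trailing partial frame is handled).
    """
    x = np.asarray(x, dtype=float)
    if W < 2 or delta < 1 or x.size < W:
        raise ValueError("need W >= 2, delta >= 1 and len(x) >= W")
    # All length-W windows of x, then keep every delta-th window.
    frames = np.lib.stride_tricks.sliding_window_view(x, W)[::delta]
    # ddof=1 gives the 1/(W-1) normalisation of Eq. (1); the per-frame
    # mean subtracted by var() is the short-time mean of Eq. (2).
    return frames.var(axis=1, ddof=1)


# Small example: a ramp has the same variance in every frame.
V = short_time_variance(np.arange(10), W=4, delta=2)
print(V)
```

For a signal of \( N \) samples this yields \( \lfloor (N - W)/\varDelta \rfloor + 1 \) frames.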

2.2 Automatic frame-size determination

To accurately identify the change points in \( x\left( n \right) \), the fluctuations in the variance sequence \( V_{W,\varDelta } \left( k \right) \) should not be intermittent. According to [12], the larger the frame size, the fewer spiky features appear (the feature variations are “smoothed out”). However, if the frame size gets too large, the resolution of a “detectable location” becomes coarse. Hence, a good frame size should mitigate the spikes throughout the entire signal sequence while still maintaining a high resolution (compact frame size). The smoothness requirement favors a large frame size, whereas the compact-duration requirement favors a small one. An effective algorithm to seek the trade-off between these two objectives was proposed in [12], where a nonlinear program was formulated to find the optimal frame size \( W \). The smoothness of the feature sequence \( V_{W,\varDelta } \left( k \right) \) is the objective function, while the compact-duration requirement is the nonlinear constraint. The compact-duration requirement implies a waveform of \( V_{W,\varDelta } \left( k \right) \) with steep transitions. A kurtosis function \( {\rm K}\left( {V_{W,\varDelta } \left( k \right)} \right) \) was proposed in [12] to establish the constraint. According to [13], the kurtosis of the short-time variances \( V_{W,\varDelta } \left( k \right) \) is defined as

$$ {\rm K}\left( {V_{W,\varDelta } \left( k \right)} \right)\mathop = \limits^{\text{def}} \frac{{\sum\nolimits_{k} {P_{k} \left[ {\left( {k - 1} \right)\varDelta + 1 - M} \right]^{4} } }}{{\left\{ {\sum\nolimits_{k} {P_{k} \left[ {\left( {k - 1} \right)\varDelta + 1 - M} \right]^{2} } } \right\}^{2} }}, $$
(3)

where \( P_{k} \) is the sequence which satisfies the probability axioms in [13]. It is defined as

$$ P_{k} \mathop = \limits^{\text{def}} \frac{{V_{W,\varDelta } \left( k \right)}}{{\sum\nolimits_{k} V_{W,\varDelta } \left( k \right)}}, $$
(4)

and the mean \( M \) is given by

$$ M\mathop = \limits^{\text{def}} \sum\limits_{k} \left[ {\left( {k - 1} \right)\varDelta + 1} \right]P_{k} . $$
(5)

The kurtosis measure defined by Eq. (3) was proved in [12] to be invariant to shifts by multiples of \( \varDelta \). Therefore, the time at which the first frame starts recording the signal does not affect the outcome of the frame-size determination.
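The kurtosis measure of Eqs. (3)–(5) can be sketched as follows. This is an illustrative sketch only: the function name `kurtosis_measure` is hypothetical, and the guard against a degenerate (zero-spread) variance sequence is our own assumption rather than part of [12] or [13].

```python
import numpy as np


def kurtosis_measure(V, delta):
    """Kurtosis K(V_{W,Delta}(k)) of Eq. (3) for a non-negative sequence V."""
    V = np.asarray(V, dtype=float)
    total = V.sum()
    if total <= 0:
        raise ValueError("the variance sequence must have a positive sum")
    P = V / total                              # Eq. (4): weights sum to one
    t = delta * np.arange(V.size) + 1          # (k-1)*delta + 1 for k = 1, 2, ...
    M = np.sum(t * P)                          # Eq. (5): weighted mean position
    m2 = np.sum(P * (t - M) ** 2)              # denominator term of Eq. (3)
    m4 = np.sum(P * (t - M) ** 4)              # numerator of Eq. (3)
    if m2 == 0:
        raise ValueError("all weight falls on one frame; kurtosis undefined")
    return m4 / m2 ** 2


V = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
# Prepending three zero-variance frames shifts every position by 3*delta
# samples; the measure is unchanged up to floating-point rounding.
print(kurtosis_measure(V, delta=2))
print(kurtosis_measure(np.concatenate((np.zeros(3), V)), delta=2))
```

The second call illustrates the shift invariance stated above: moving the whole variance profile by a multiple of \( \varDelta \) changes \( M \) but not the central moments.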

The optimal frame size \( W \) can be obtained from the following nonlinear program:

$$ W^{*} = \arg \max \left( W \right)\;{\text{subject}}\;{\text{to}} $$
$$ \frac{{\left| {{\rm K}\left( {V_{W,\varDelta } \left( k \right)} \right) - {\rm K}\left( {V_{2W,\varDelta } \left( k \right)} \right)} \right|}}{{{\rm K}\left( {V_{W,\varDelta } \left( k \right)} \right)}} < \kappa , $$
(6)

where \( \kappa \) is a pre-determined upper bound for the kurtosis constraint function.
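One simple way to approximate the program in Eq. (6) is an exhaustive search over a finite set of candidate frame sizes. The sketch below is written under that assumption: the candidate list, the function names, and the rule of skipping any \( W \) for which \( 2W \) exceeds the signal length are hypothetical choices, and [12] may solve the program differently.

```python
import numpy as np


def _short_time_variance(x, W, delta):
    # Eq. (1): sample variance of every delta-th length-W frame.
    frames = np.lib.stride_tricks.sliding_window_view(x, W)[::delta]
    return frames.var(axis=1, ddof=1)


def _kurtosis_measure(V, delta):
    # Eqs. (3)-(5): kurtosis of the frame positions weighted by V.
    P = V / V.sum()
    t = delta * np.arange(V.size) + 1
    M = np.sum(t * P)
    return np.sum(P * (t - M) ** 4) / np.sum(P * (t - M) ** 2) ** 2


def select_frame_size(x, candidates, delta=2, kappa=0.01):
    """Largest candidate W satisfying the constraint of Eq. (6), else None."""
    x = np.asarray(x, dtype=float)
    feasible = []
    for W in candidates:
        if 2 * W > x.size:                 # V_{2W} needs at least one frame
            continue
        k_w = _kurtosis_measure(_short_time_variance(x, W, delta), delta)
        k_2w = _kurtosis_measure(_short_time_variance(x, 2 * W, delta), delta)
        if abs(k_w - k_2w) / k_w < kappa:  # relative kurtosis change, Eq. (6)
            feasible.append(W)
    return max(feasible) if feasible else None


# Synthetic stand-in for a PPG record: noise with one high-variance burst.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
x[400:500] *= 10.0
print(select_frame_size(x, candidates=[32, 64, 128, 256], kappa=0.01))
```

The search returns `None` when no candidate satisfies the constraint, in which case a larger \( \kappa \) or a different candidate set would have to be considered.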

2.3 Change-point detection and MA removal

Once the variance features are extracted from the original PPG signal using the frame size determined automatically according to Eq. (6), the MAs can be detected by thresholding the short-time variance sequence. Given a predetermined threshold \( \tau \) for the MA detection, the interval \( \left[ {p,q} \right] \) of an MA can be located as

$$ p = \left( {k_{s} - 1} \right)\varDelta + 1 , $$
(7)
$$ q = \left( {k_{e} - 1} \right)\varDelta + W^{*} , $$
(8)

where \( k_{s} \) and \( k_{e} \) are the indices of the variance frames satisfying \( V_{{k_{s} }} < \tau \) and \( V_{{k_{e} }} < \tau \) while \( V_{{k_{s} + 1}} ,V_{{k_{s} + 2}} , \ldots ,V_{{k_{e} - 2}} ,V_{{k_{e} - 1}} > \tau \), and \( W^{*} \) is the optimal frame size resulting from Eq. (6). The threshold \( \tau \) can be determined prior to MA detection by a calibration process over clean signal samples, in the absence of MAs, such that

$$ \tau = 2 \max_{j = 1,2, \ldots ,m} V_{W,\varDelta } \left( j \right), $$
(9)

where \( m \) is the number of frames a user selects for the calibration process. After the MA detection is carried out across the entire signal, we denote the collection of the detected change-points (from the regular signal to an MA and from an MA to the regular signal) by \( \left\{ {\left[ {p_{1} ,q_{1} } \right],\left[ {p_{2} ,q_{2} } \right], \ldots ,\left[ {p_{i} ,q_{i} } \right], \ldots } \right\} \). Thus we represent the \( i^{\text{th}} \) detected MA signal segment \( \phi_{i} \left( n \right) \) by

$$ \phi_{i} \left( n \right)\mathop = \limits^{\text{def}} \left\{ {\begin{array}{*{20}l} {\infty ,} & {p_{i} \le n \le q_{i} ,} \\ {0,} & {\text{elsewhere}.} \\ \end{array} } \right. $$
(10)

Note that when \( \phi_{i} \left( n \right) = \infty \), the corresponding signal sample carries “null” information. One can thus invoke the \( \phi_{i} \left( n \right) \)'s given by Eq. (10) to disable the MA intervals in \( x\left( n \right) \). The “tailored” signal \( x'\left( n \right) \) is then expressed as

$$ x^{\prime}\left( n \right)\mathop = \limits^{\text{def}} x\left( n \right) + \mathop \sum \limits_{i} \phi_{i} \left( n \right) $$
(11)

Converting \( x\left( n \right) \) to \( x'\left( n \right) \) preserves the original waveform structure of the PPG signal outside the MA intervals, which existing spectrum-based methods fail to do. As a result, the tailored signal \( x'\left( n \right) \) can be further processed for fast SpO2 (blood oxygen level) computation in addition to HR tracking.
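The thresholding and tailoring steps of Eqs. (7)–(11) can be sketched as below. The function names are hypothetical, and two boundary rules are our own assumptions because the text does not specify them: a run of above-threshold frames that touches the first or last frame is clamped to that frame, and \( q \) is clipped to the signal length.

```python
import numpy as np


def calibrate_threshold(V, m):
    """Eq. (9): twice the largest variance over the first m (MA-free) frames."""
    return 2.0 * float(np.max(np.asarray(V, dtype=float)[:m]))


def detect_ma_intervals(V, tau, W, delta):
    """1-based sample intervals [(p, q), ...] following Eqs. (7) and (8)."""
    above = np.asarray(V, dtype=float) > tau
    # Rising (+1) and falling (-1) edges of the above-threshold runs.
    edges = np.diff(np.concatenate(([False], above, [False])).astype(int))
    first = np.flatnonzero(edges == 1)       # 0-based first frame of each run
    last = np.flatnonzero(edges == -1) - 1   # 0-based last frame of each run
    intervals = []
    for s, e in zip(first, last):
        k_s = max(int(s), 1)                 # 1-based frame just before the run
        k_e = min(int(e) + 2, above.size)    # 1-based frame just after the run
        intervals.append(((k_s - 1) * delta + 1, (k_e - 1) * delta + W))
    return intervals


def tailor_signal(x, intervals):
    """Eqs. (10)-(11): add infinity on every detected interval."""
    x = np.asarray(x, dtype=float)
    phi_sum = np.zeros_like(x)
    for p, q in intervals:
        phi_sum[p - 1:min(q, x.size)] = np.inf   # 1-based [p, q] -> 0-based slice
    return x + phi_sum


V = np.array([1.0, 1.0, 1.0, 10.0, 10.0, 1.0, 1.0])
tau = calibrate_threshold(V, m=3)
intervals = detect_ma_intervals(V, tau, W=4, delta=2)
print(tau, intervals)
print(tailor_signal(np.arange(16.0), intervals))
```

Intervals produced by nearby runs may overlap; since Eq. (11) only adds infinity, overlapping intervals yield the same tailored signal as their union.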

2.4 Summary of our proposed algorithm

Based on the discussion in the previous subsections, our proposed new MA tailoring algorithm can be summarized in Algorithm 1.

3 Evaluation

3.1 Experiment setup

The experimental data were acquired by a wearable reflectance-type oximetry sensor developed in-house. The PPG measurements were taken on the finger and the neck of the subject at a sampling frequency of 50 Hz. All experimental procedures were approved by the Institutional Review Board. Three subjects were tested in total. The frame-forwarding size was chosen as \( {\Delta } = 2 \). The threshold for the optimal frame-size determination was given by

$$ \kappa = 0.01. $$
(12)

The signal used for the calibration process described in Section 2.3 was 5 s long (\( m = 250 \)).

3.2 Experimental results

The short-time variances extracted under different frame sizes (\( W = 64,128, \;{\text{and}}\; 512 \)) are shown in Fig. 2. From Fig. 2 one can observe that a proper frame size is crucial for successful MA detection. A small frame size can make the variance feature sequence so spiky that many false positives occur. On the other hand, a large frame size can over-smooth the feature sequence and make the detected MA intervals wider than they should be. The kurtosis measure optimization given by Eq. (6) leads to the optimal frame size \( W = 128 \) for this case, which coincides with our observation from Fig. 2. The effect of the frame size \( W \) on the number of change points corresponding to the detected MAs, in comparison with the ground truth, is shown in Fig. 3.

Fig. 2 (a) The variance sequence \( V_{W,\varDelta } \left( k \right) \) for \( W = 64 \) and \( {\Delta } = 2 \). Note that \( V_{W,\varDelta } \left( k \right) \) is too spiky, so the detector could encounter many false positives. (b) The variance sequence \( V_{W,\varDelta } \left( k \right) \) for \( W = 128 \) and \( {\Delta } = 2 \). Note that \( V_{W,\varDelta } \left( k \right) \) seems to satisfy both the smoothness and compact-duration requirements. (c) The variance sequence \( V_{W,\varDelta } \left( k \right) \) for \( W = 512 \) and \( {\Delta } = 2 \). Note that the feature durations are not concentrated (compact) enough

Fig. 3 The effect of the frame size on the number of detected change points. The red horizontal line illustrates the ground truth: 14 change points. (Color figure online)

One can see from Fig. 3 that the MA detection accuracy degrades when the frame size is either too small or too large. This observation coincides with the discussion in Section 2.2. The complete MA detection result for the aforementioned PPG signal is shown in Fig. 4, which demonstrates the effectiveness of our proposed scheme for detecting abrupt MAs in the PPG signal.

Fig. 4 The MA detection for a PPG signal. The PPG signal was measured on the fingertip of the subject while the subject moved the finger randomly. The red vertical lines indicate the starting times of the MAs, while the green vertical lines indicate their end times. (Color figure online)

As a potential application of our proposed MA tailoring algorithm, heart-rate tracking was also performed. First, the MA-corrupted signal was tailored using our method. Whenever the HR tracking algorithm (peak-to-peak time-difference estimation according to [12]) encounters “tailored samples” with an infinite value, the HR measure stops updating. The HR tracking error percentages were computed for three different subjects, each from around 10 min of data. Table 1 lists the corresponding results, where “Raw”, “Our Method”, and “TROIKA” denote the unprocessed raw signal data, the signal tailored using our proposed method, and the data processed using the TROIKA method in [11], respectively. According to Table 1, our proposed approach greatly outperforms the TROIKA method.
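How an HR estimator can skip the tailored samples is sketched below. This is not the peak-to-peak algorithm of [12], whose details are not reproduced here; it is a simplified stand-in that treats any local maximum as a pulse peak (so it assumes a smooth, pre-filtered signal), and the function name `peak_to_peak_hr` is hypothetical.

```python
import numpy as np


def peak_to_peak_hr(x_tailored, fs):
    """Heart-rate estimates (beats per minute) from consecutive peak spacings.

    Samples equal to infinity (tailored MA intervals) are never used as
    peaks, and any peak pair with a tailored sample in between is skipped,
    so the estimate simply does not update across an MA interval.
    """
    x = np.asarray(x_tailored, dtype=float)
    finite = np.isfinite(x)
    mid = x[1:-1]
    # Local maxima whose two neighbours are also valid (finite) samples.
    is_peak = (finite[:-2] & finite[1:-1] & finite[2:]
               & (mid > x[:-2]) & (mid >= x[2:]))
    peaks = np.flatnonzero(is_peak) + 1
    tailored_count = np.cumsum(~finite)    # tailored samples seen so far
    hr = []
    for a, b in zip(peaks[:-1], peaks[1:]):
        if tailored_count[a] == tailored_count[b]:   # no MA between the peaks
            hr.append(60.0 * fs / (b - a))
    return np.array(hr)


# Synthetic 72-beat-per-minute test tone sampled at 50 Hz, with 2 s tailored.
fs = 50
t = np.arange(0, 20, 1 / fs)
x = np.sin(2 * np.pi * 1.2 * t)
x[100:200] = np.inf
print(peak_to_peak_hr(x, fs).mean())
```

Because peak locations are quantized to the sampling grid, individual estimates fluctuate around the true rate; averaging or a finer peak interpolation would be needed for a precise figure.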

Table 1 The average heart-rate error-percentage (%) comparison

4 Conclusion

The detection and removal of motion artifacts in a PPG signal are crucial for many signal processing applications. Existing techniques are often designed to deal with the MAs caused by the subject’s running or jogging. In this paper, a novel MA detection method built upon automatic frame-size determination is proposed to tailor PPG signals containing abrupt MAs. The proposed MA tailoring scheme has demonstrated its effectiveness in real experiments and can be applied to robust heart-rate or blood-oxygen-level measurements, since it preserves the original waveform structure of the PPG signal, which existing spectrum-based methods fail to do.