
1 Introduction

Continuous monitoring of blood pressure over long periods is essential to prevent critical cardiovascular events that can cause irreversible health damage and even death. The classical approach to measuring blood pressure uses pressure cuffs, which require mechanical apparatus that is undesirable for long-term monitoring. Researchers have proposed alternative, cuffless methods that estimate blood pressure from the pulse transit time (PTT) [1,2,3,4,5,6,7]. An extensive review of the theory and practice of PTT-based blood pressure monitoring can be found in [8]. Researchers have also found a strong correlation between PTT and both the measurement of respiratory effort and the detection of microarousals [9]. Other applications include the indication of cardiovascular changes during obstetric spinal anesthesia [10], myocardial performance [11], respiratory events [12] and hypertension detection [13, 14].

PTT is the time it takes for a pulse wave (PW) to travel between two arterial points [13]. It can be measured with different methods, such as arterial probes [15], the electrocardiogram (ECG) or contactless photoplethysmography (PPG). ECG electrically records voltage versus time, typically from electrodes attached to the chest or limbs, whereas PPG optically records blood volume changes versus time.

Although ECG can suffer from measurement artefacts [16] such as electromagnetic interference or loose leads, it is still considered the gold standard for continuous heart rate monitoring [17]. Despite its high signal quality, its usability is limited by the need for leads attached to the skin. In comparison, optical PPG sensors were found to be the least invasive and therefore the most desirable overall. However, they are also sensitive to sensor pressure and to motion artefacts [18]. Pulse transit times can also be measured between different physical locations and between different temporal parts of the cardiac cycle. The cardiac cycle is a repeating series of pressure changes within the heart and is often described as having several peaks, in particular the P wave, the QRS complex and the T wave [19]. No standardised PTT definition has been established because of the many measurement possibilities regarding

  (i) different sensor types,
  (ii) sensor or lead locations, and
  (iii) temporal reference points.

Van Velzen et al. [20] identified 43 different methods to determine the PTT between an ECG R-peak and a PPG signal [21,22,23,24,25]. A PTT derived from ECG-PPG measurements is called the pulse arrival time (PAT). All reviewed methods [20] used anchor points to calculate the PAT (Fig. 1A).

Fig. 1. A: Existing PAT definitions, B: Proposed MLPTT PAT measurement

These anchor points can be the foot (onset), the peak or a percentage of the peak amplitude (e.g. 50% or 25%) of the pulse wave. PAT can then be calculated as

$$\begin{aligned} PAT_{peak-peak}=t_{PPG_{peak}}(n)-t_{ECG_{R-peak}}(n), \end{aligned}$$
(1)
$$\begin{aligned} PAT_{peak-50\%peak}=t_{PPG_{50\%peak}}(n)-t_{ECG_{R-peak}}(n), \end{aligned}$$
(2)
$$\begin{aligned} PAT_{peak-foot}=t_{PPG_{foot}}(n)-t_{ECG_{R-peak}}(n), \end{aligned}$$
(3)

where n is the heartbeat index. Previous research identified several challenges related to PPG signal quality. PPG readings were found to contain artefacts [18, 26] or malformed segments induced by sensor motion or variations in attachment pressure [20]. To compensate for such signal anomalies, at least 19 of the 43 reviewed methods [20] applied signal filtering before calculating PTT from anchor points. Although PTT is defined as the time it takes for a pulse wave to travel between two arterial points, it has so far been computed as the time between two selected points on two curves, with \(PAT_{peak-foot}\) and \(PAT_{peak-peak}\) the most widely used definitions. Table 1 summarises the anchor points used in the literature.

Table 1. Anchor point prevalence in the literature [20].
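As an illustration of Eqs. (1)–(3), the per-beat PAT for each anchor can be computed once the ECG R-peak sample indices are known. The following is a minimal, unfiltered NumPy sketch under simplifying assumptions: each PPG pulse is taken to lie between two consecutive R-peaks, the foot is the minimum before the PPG peak, and the 50% point is the first upstroke sample that reaches half of the foot-to-peak amplitude. The function name `pat_from_anchors` and the synthetic beat are hypothetical and are not taken from any of the reviewed methods [20].

```python
import numpy as np

def pat_from_anchors(r_peaks, ppg, fs):
    """Per-beat PAT (seconds) for the peak, 50%-peak and foot anchors.

    Hypothetical simplification: each PPG pulse lies between consecutive
    ECG R-peaks (sample indices), and no filtering is applied."""
    pat = {"peak": [], "50%peak": [], "foot": []}
    for start, end in zip(r_peaks[:-1], r_peaks[1:]):
        seg = ppg[start:end]                          # samples after R-peak n
        i_peak = int(np.argmax(seg))                  # PPG peak
        i_foot = int(np.argmin(seg[: i_peak + 1]))    # lowest point before the peak
        level = seg[i_foot] + 0.5 * (seg[i_peak] - seg[i_foot])
        # first upstroke sample reaching 50% of the foot-to-peak amplitude
        i_half = i_foot + int(np.argmax(seg[i_foot : i_peak + 1] >= level))
        pat["peak"].append(i_peak / fs)
        pat["50%peak"].append(i_half / fs)
        pat["foot"].append(i_foot / fs)
    return {name: np.array(vals) for name, vals in pat.items()}

# Synthetic example: the same 125-sample beat repeated 4 times at fs = 125 Hz,
# with the foot at sample 10 and the peak at sample 30 after each R-peak.
fs = 125
j = np.arange(fs)
beat = np.where(j <= 10, (10 - j) / 20,
                np.where(j <= 30, (j - 10) / 20, 1 - (j - 30) / 190))
ppg = np.tile(beat, 4)
r_peaks = np.array([0, 125, 250, 375])
pat = pat_from_anchors(r_peaks, ppg, fs)  # peak 0.24 s, 50%peak 0.16 s, foot 0.08 s per beat
```

Because each definition reads a single sample per beat, a corrupted foot or peak directly corrupts the corresponding PAT value, which is the sensitivity the filtering steps in [20] try to compensate for.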

Machine learning, and neural networks in particular, have shown great potential for extracting spatial information from data [27]. We hypothesized that, while filtering has refined PTT measurements, accuracy could be improved further by considering not an individual point on the curve but the shift of the whole curve shape, using all of its points. We therefore propose Machine Learned Pulse Transit Time (MLPTT), which is trained on a sliding frame to detect curve properties relative to a virtual anchor point (Fig. 1B). The proposed PAT measurement can be defined as

$$\begin{aligned} PAT_{peak-MLPTT}=t_{PPG_{virtual\,anchor}}(n)-t_{ECG_{R-peak}}(n), \end{aligned}$$
(4)

where n is the heartbeat index. PAT is a particular type of PTT and was used to validate the proposed method by comparing the predicted PAT with the PAT calculated by HeartPy, a toolkit designed to handle noisy PPG data [28, 29]. An advantage of PAT is that the R-peaks of the QRS complex can be detected with numerous algorithms [30] such as Pan-Tompkins [31]; however, PAT requires conductive electrodes attached to the skin at selected locations and is therefore more disruptive than PPG, which measures optically. Gao et al. compared the pulse transit time estimates of PAT and PPG-PTT with invasive I-PTT, using arterial probes as a reference [32], and concluded that PPG-PTT correlated well with all blood pressure levels. To validate MLPTT, the PAT of three patients from the BIDMC PPG and Respiration Dataset [33] was calculated with both HeartPy and MLPTT. The two methods produced their results independently of each other; the overall validation workflow is shown in Fig. 2.

Fig. 2. End-to-end PTT validation workflow (1: ECG, 2: finger PPG).

Contribution: We propose a machine-learning-based PTT calculation method and show that it is more accurate than existing signal processing approaches. The specific contributions can be summarized as follows:

  a) We propose a new noise-resilient machine learning model to extract events from PPG and demonstrate it on PAT extraction in particular.
  b) We report results from a study showing improved accuracy over the state of the art (i.e. HeartPy) with statistical significance.
  c) We show that MLPTT outperforms HeartPy peak detection, especially for noisy photoplethysmogram data.
  d) We discuss and evaluate the results.

2 Design

Pulse transit time has previously been measured between two anchor points on ECG, PPG or arterial probe pressure signals [32]. Since all these signals are quasi-periodic, we hypothesized that PTT can be considered as the phase shift between signals within each heartbeat period. While the phase shift can be calculated between the same anchor point on different signals, we propose using machine learning to consider all points of the signal when estimating the phase shift and therefore the PTT.
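The phase-shift view can be illustrated with classical cross-correlation, which, like MLPTT, lets every sample of both signals contribute instead of one anchor point per beat. This is a plain signal-processing stand-in and not the MLPTT method; the function name `lag_by_cross_correlation` is hypothetical, and the lag search is assumed to stay below one heartbeat period because the phase of a quasi-periodic signal is only defined modulo that period.

```python
import numpy as np

def lag_by_cross_correlation(reference, delayed, max_lag):
    """Lag (in samples, 0..max_lag) at which `delayed` best matches `reference`.

    Every overlapping sample contributes to the score. max_lag is assumed
    to be shorter than one heartbeat period."""
    ref = reference - np.mean(reference)   # keep the AC component only
    sig = delayed - np.mean(delayed)
    n = min(len(ref), len(sig))
    # mean product over the overlap, so longer lags are not penalised
    scores = [np.mean(ref[: n - k] * sig[k:n]) for k in range(max_lag + 1)]
    return int(np.argmax(scores))

# Broadband test signal so that the correlation peak is unambiguous.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = np.concatenate([np.zeros(12), x[:-12]])    # x delayed by 12 samples
lag = lag_by_cross_correlation(x, y, max_lag=60)  # 12 samples, i.e. 96 ms at 125 Hz
```

A single global lag like this assumes a constant shift over the analysed window; the per-beat estimates that PAT requires would need the same idea applied beat by beat, which is where a learned model can take over.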

2.1 Machine Learned Pulse Transit Time

MLPTT implemented the idea of considering all points of the signal. It consisted of a sequence of four processes: frame segmentation, a waveform binary classifier, a second frame segmentation and an anchor position classifier (Fig. 3: 3, 4, 7, 8).

Fig. 3. MLPTT and HeartPy process diagram

MLPTT Frame Segmentation for Binary Classification. PPG and ECG data were loaded from a dataset (Fig. 3: 1, 2). To avoid discontinuities in the time series data, a chronological split of 70% training and 30% test data was used for all patients instead of k-fold cross-validation. The first two MLPTT processes (Fig. 3: 3 and 4) aimed to find a periodic pattern in the PPG signal. This was achieved by using the known periods from the ECG R-peaks, detected with the Pan-Tompkins algorithm (Fig. 3: 5 and [31]), as virtual anchor point labels. Frames of 60 samples each were created by shifting the frame in increments of one sample. For the BIDMC dataset, sampled at 125 Hz, one sample corresponds to 8 ms, so one frame spans 480 ms. Each frame was labelled according to whether or not it contained a virtual anchor.
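The segmentation step can be sketched as follows, assuming the PPG signal and the Pan-Tompkins R-peak indices are already available as NumPy arrays. The helper names `make_frames` and `chronological_split` are hypothetical. If a frame happens to contain more than one anchor, this sketch labels it with the first one, and frames straddling the 70/30 cut share samples with the test set, which a stricter split would discard.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_frames(ppg, r_peaks, frame_len=60):
    """Slide a frame_len-sample window over the PPG, one sample at a time.

    Returns the frames, a binary label (1 if the frame contains an ECG
    R-peak, i.e. a virtual anchor) and the anchor offset inside the frame
    (-1 if there is none; the first anchor if there are several)."""
    frames = sliding_window_view(ppg, frame_len)   # (len - frame_len + 1, frame_len)
    is_anchor = np.zeros(len(ppg), dtype=bool)
    is_anchor[r_peaks] = True
    windows = sliding_window_view(is_anchor, frame_len)
    has_anchor = windows.any(axis=1)
    offsets = np.where(has_anchor, windows.argmax(axis=1), -1)
    return frames, has_anchor.astype(int), offsets

def chronological_split(n_frames, train_ratio=0.7):
    """Index arrays for a 70/30 split that keeps time order (no shuffling)."""
    cut = int(n_frames * train_ratio)
    return np.arange(cut), np.arange(cut, n_frames)

# Placeholder signal of 200 samples with R-peaks at samples 10 and 100.
ppg = np.arange(200, dtype=float)
frames, labels, offsets = make_frames(ppg, np.array([10, 100]))
train_idx, test_idx = chronological_split(len(frames))
```

With one-sample steps, every anchor appears in up to 60 consecutive frames at offsets 59 down to 0, which is what gives the later position classifier many chances to locate it.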

Waveform Binary Classifier. The segmented frames, labelled by anchor points, were then used as training input for the ML waveform binary classifier (Fig. 4A: 4).

Fig. 4. A: Binary waveform classifier. B: Virtual anchor multilabel anchor position classifier.

The goal of this step was to train a classifier that could detect quasi-periodic waveforms containing one virtual anchor in longer PPG time series. This can also be regarded as heart rate classification based on PPG data, with each detected consecutive virtual anchor marking one heartbeat. The settings of the Python Scikit-learn 0.22.1 KNN, SVC, Gaussian process, decision tree, random forest, MLP, AdaBoost and Gaussian Naïve Bayes classifiers were grid searched, and Gaussian Naïve Bayes achieved the highest classification accuracy with default settings. The trained classifier model was then used to predict, for every frame, whether it contained a virtual anchor.
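A minimal version of this step with Scikit-learn's `GaussianNB` at default settings might look as follows. The frames are synthetic stand-ins (anchor-containing frames simply have a shifted mean), chosen only to make the sketch runnable; real frames would come from the segmentation step, and the helper name `train_waveform_classifier` is hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def train_waveform_classifier(frames, labels, train_ratio=0.7):
    """Fit Gaussian Naive Bayes (default settings) on the first 70% of the
    frames in time order and report accuracy on the remaining 30%."""
    cut = int(len(frames) * train_ratio)
    clf = GaussianNB()
    clf.fit(frames[:cut], labels[:cut])
    return clf, clf.score(frames[cut:], labels[cut:])

# Synthetic stand-in data: 600 frames of 60 samples, classes interleaved in time.
rng = np.random.default_rng(1)
labels = np.tile([0, 1], 300)
frames = rng.normal(loc=1.5 * labels[:, None], scale=1.0, size=(600, 60))

clf, accuracy = train_waveform_classifier(frames, labels)
has_anchor = clf.predict(frames)   # binary prediction for every frame
```

Gaussian Naïve Bayes treats each of the 60 samples as an independent Gaussian feature per class, which keeps training cheap even with one frame per sample of the recording.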

Frame Segmentation for Anchor Point Classification. The frames predicted to contain virtual anchors were then automatically selected for a second classifier (Fig. 3: 7). Frames predicted to contain no virtual anchors were not evaluated further.

Multilabel Anchor Position Classifier. As the final step of the proposed MLPTT method, the anchor position in the selected frames was predicted. This classifier was trained on ECG-derived peaks for the PPG training data. In addition to the classifiers evaluated for binary classification, a 5-layer sequential TensorFlow model was developed, with 120 neurons in the input layer and 60 in the output layer. After grid searching its hyperparameters, this model outperformed the baseline Scikit-learn MLP model and was used for the prediction results in the following sections. The trained MLPTT was then used to predict \(PAT(MLPTT)_{peak-virtual\_anchor}\) by predicting the virtual anchors for the test data extracted from the BIDMC dataset; these anchors were then downsampled to one anchor per period.
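The text does not state how the per-frame predictions were downsampled to one anchor per period. One plausible rule, sketched here purely as an assumption, is voting: a frame starting at sample s that predicts offset p casts a vote for absolute sample s + p, and the most-voted sample within each period is kept. The period boundaries (for example consecutive ECG R-peaks) and the function name `anchors_from_votes` are hypothetical.

```python
import numpy as np

def anchors_from_votes(frame_starts, offsets, boundaries, n_samples):
    """Collapse per-frame anchor predictions to one anchor per period.

    frame_starts: start sample of every frame predicted to contain an anchor
    offsets:      predicted anchor position (0..frame_len-1) in those frames
    boundaries:   sample indices delimiting the periods (assumed given)
    Returns the most-voted absolute sample per period, or -1 if a period
    received no votes."""
    votes = np.bincount(np.asarray(frame_starts) + np.asarray(offsets),
                        minlength=n_samples)
    anchors = []
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        window = votes[lo:hi]
        anchors.append(lo + int(np.argmax(window)) if window.max() > 0 else -1)
    return np.array(anchors)

# Four frames agree on sample 20, one outlier votes for 34; three agree on 110.
starts = np.array([0, 1, 2, 3, 4, 100, 101, 102])
offsets = np.array([20, 19, 18, 17, 30, 10, 9, 8])
anchors = anchors_from_votes(starts, offsets, boundaries=[0, 60, 150, 200], n_samples=200)
```

Voting lets single misclassified frames be outvoted by their neighbours, and a period without votes is flagged with -1 rather than silently assigned an anchor.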

2.2 HeartPy Pulse Transit Time

HeartPy is a toolkit designed to handle noisy PPG data and was used to detect peaks in the PPG signals. For three out of four tested PAT definitions, Rajala et al. reported the smallest relative error for \(PAT_{peak-peak}\) [34], and this definition was found to be the second most widely used in a literature review of 43 published papers [20]. Therefore, \(PAT(HeartPy)_{peak-peak}\) was calculated with HeartPy. As the ECG reference, the same Pan-Tompkins-detected R-peaks were used as for MLPTT. PPG peaks were detected with HeartPy using the following settings: \(sample\_rate = 125\), \(hampel\_correct = False\), \(high\_precision\_fs=1000\).

3 Analysis

In this section, we show how the proposed method was compared to the established toolkit.

3.1 Methods

For PTT and PAT, no ground truth is readily available. We therefore initially compared MLPTT PAT with HeartPy PAT as a reference, aiming to reject the null hypothesis that the two are not significantly correlated. PAT was used instead of PPG-PTT because ECG R-peak detection was found to be more accurate than PPG peak detection, ECG R-peak detection is standard practice, and for some populations ECG shows comparatively better signal-to-noise ratios [35] and lower morphological variance [36]. Because our eventual goal was to measure PTT, and therefore the phase shift between signals, we considered only the AC component of the PTT time series estimates for comparison. Accordingly, the PAT of 196–240 ECG-detected heartbeats per patient was calculated for 5 patients with HeartPy and MLPTT.

We then tested the null hypothesis that there is no statistically significant correlation between the two curves. If the following relationship held

$$\begin{aligned} \sum _{k=0}^{n} MLPTT_{PAT}(hb_k) \not = \sum _{k=0}^{n} HeartPy_{PAT}(hb_k), \end{aligned}$$
(5)

where \(hb_k\) are the heartbeats of each patient from the beginning of the test data (\(k=0\)) to the last heartbeat (\(k=n\)), MLPTT PAT and HeartPy PAT would not be correlated. If they were linearly correlated, plotting \(x = PAT_{MLPTT}(hb_k)\) against \(y = PAT_{HeartPy}(hb_k)\) would form a straight line. Correlation was to be established with the Pearson correlation coefficient and a corresponding p-value < 0.05.
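The correlation test can be sketched with SciPy's `pearsonr`. Here the AC component is assumed to be the mean-removed PAT series; since Pearson's r is unchanged by subtracting a constant, this step does not alter r and matters mainly for overlaying the curves. The function name `pat_correlation` and the synthetic PAT values are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr

def pat_correlation(pat_mlptt, pat_heartpy, alpha=0.05):
    """Pearson r, p-value and significance flag for two per-heartbeat PAT series."""
    x = np.asarray(pat_mlptt, dtype=float)
    y = np.asarray(pat_heartpy, dtype=float)
    x_ac = x - x.mean()          # AC component: beat-to-beat variation only
    y_ac = y - y.mean()
    r, p = pearsonr(x_ac, y_ac)
    return float(r), float(p), bool(p < alpha)

# Synthetic PAT series (seconds) for 200 heartbeats of one hypothetical patient.
rng = np.random.default_rng(0)
pat_a = 0.25 + 0.01 * rng.standard_normal(200)
pat_b = 0.9 * pat_a + 0.05 + 0.002 * rng.standard_normal(200)
r, p, significant = pat_correlation(pat_a, pat_b)
```

A significant r only shows that the two methods track the same beat-to-beat variation; it says nothing about which of them is closer to the true transit time.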

4 Results

We assessed performance by testing for a linear relation between the PAT results of the proposed method and those of HeartPy, and plotted the resulting PAT curves on top of each other with their difference filled (Fig. 5).

Fig. 5. HeartPy and MLPTT AC components, difference filled in red.

Out of 5 tested patients (BIDMC patients 2, 6, 8, 42, 53), HeartPy failed to detect exactly one PPG peak per ECG period for patients 2 and 53, in some periods detecting none and in others two. These patients were therefore excluded, and only patients 6, 8 and 42 were used for comparison. The Pearson correlations between HeartPy PAT and MLPTT PAT are listed in Table 2.

Table 2. MLPTT PAT - HeartPy PAT Pearson correlation and p-value.

F1 score and RMSE were calculated between the ECG R-peaks and the virtual anchors detected by MLPTT from PPG. Comparing the R-peak positions predicted from PPG data with the ECG R-peaks served to benchmark the stability of the method, but the ECG R-peak should not be considered ground truth for PPG; ground truth is indeterminable for PPG with current methods and can only be approximated invasively [32]. For patients 2–42 the F1 score ranged from 0.15 to 0.30, with RMSE between 3.83 and 8.01 samples (Table 3). Patient 53 showed the lowest accuracy, with an F1 score of 0.04 and an RMSE of 15.66 samples. Support was the number of analysed samples in the time series: 11661 to 13278 samples for patients 2–42 and 9014 for patient 53.

Table 3. MLPTT PAT - HeartPy PAT F1 score, RMSE and support.
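The two stability metrics can be computed as follows, assuming the true and predicted anchor positions are given as integer sample positions. How the F1 score was averaged over the position classes is not stated; the weighted average used here, the helper name `anchor_position_scores` and the example values are assumptions.

```python
import numpy as np
from sklearn.metrics import f1_score

def anchor_position_scores(true_pos, pred_pos):
    """F1 score over position classes (weighted average, an assumption)
    and RMSE in samples between true and predicted anchor positions."""
    true_pos = np.asarray(true_pos)
    pred_pos = np.asarray(pred_pos)
    f1 = f1_score(true_pos, pred_pos, average="weighted", zero_division=0)
    rmse = float(np.sqrt(np.mean((pred_pos - true_pos) ** 2)))
    return f1, rmse, len(true_pos)   # support = number of compared predictions

f1, rmse, support = anchor_position_scores([10, 20, 30, 40], [12, 20, 30, 40])
rmse_ms = rmse * 1000 / 125          # samples -> ms at 125 Hz (8 ms per sample)
```

The two metrics complement each other: F1 treats a prediction that is off by a single sample as a complete miss, whereas RMSE rewards predictions that land close to the reference, which explains how a low F1 can coexist with an RMSE of only a few samples.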

4.1 BIDMC Patient 6 Drilldown

This section presents the MLPTT subprocess results for patient 6, one of the three of the five BIDMC patients for whom HeartPy detected all PPG peaks.

HeartPy and MLPTT PAT Patient 6 Pearson Correlation. A scatterplot with HeartPy-calculated PAT on the x-axis and MLPTT PAT on the y-axis was produced (Fig. 6), with a histogram on each axis showing the respective PAT distribution. The Pearson correlation coefficients for all tested patients are listed in Table 2.

Fig. 6. HeartPy MLPTT Pearson correlation.

Table 4. Waveform binary classifier results for BIDMC patient 6.

Waveform Binary Classifier. The waveform binary classifier achieved an overall anchor prediction accuracy of 97% for 17977 frames of 60 samples each. Precision, recall, F1 score and support are listed in Table 4. Precision was the ratio of correctly predicted frames of a class to all frames predicted as that class (\(Precision = TP/(TP+FP)\), where TP is true positive and FP false positive). The calculated precision was 0.92 for frames without anchors and 0.99 for frames with anchors. Recall, the ratio of correctly predicted frames of a class to all frames actually belonging to that class (\(Recall = TP/(TP+FN)\), where FN is false negative), was 0.99 for frames without anchors and 0.95 for frames containing a virtual anchor. Support was the number of frames in the respective class (Table 4).
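The per-class metrics of Table 4 follow directly from the confusion counts. A minimal NumPy sketch of these definitions for the two frame classes (0: no anchor, 1: anchor) is given below; the function name `per_class_report` and the example labels are hypothetical.

```python
import numpy as np

def per_class_report(y_true, y_pred):
    """Precision, recall, F1 and support for frame classes 0 (no anchor) and 1 (anchor)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    report = {}
    for cls in (0, 1):
        tp = int(np.sum((y_pred == cls) & (y_true == cls)))
        fp = int(np.sum((y_pred == cls) & (y_true != cls)))
        fn = int(np.sum((y_pred != cls) & (y_true == cls)))
        precision = tp / (tp + fp) if tp + fp else 0.0   # TP / (TP + FP)
        recall = tp / (tp + fn) if tp + fn else 0.0      # TP / (TP + FN)
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[cls] = {"precision": precision, "recall": recall,
                       "f1": f1, "support": tp + fn}
    return report

report = per_class_report([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
```

Each class is scored as if it were the positive class, so a frame misclassified as "anchor" lowers the precision of class 1 and the recall of class 0 at the same time.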

Multilabel Anchor Position Classifier. The multilabel anchor position classifier achieved an overall prediction accuracy of 30% (Table 3), with most predictions scattered closely around the confusion matrix diagonal.

5 Discussion

PTT is an important measure with many clinical applications, and many researchers have tried to measure it precisely. The goal of this study was to assess whether the accuracy of established PAT and PTT methods, which rely on signal filtering and fixed anchor points, can be matched and potentially exceeded. We addressed this challenge with a novel approach that uses machine learning to find quasi-periodic patterns in PPG signals based on a series of samples.

For two of the five tested BIDMC patients, HeartPy did not detect exactly one PPG peak for every ECG peak. MLPTT was more robust and correctly classified the presence of a quasi-periodic waveform in 97% of all frames for patient 6. Since the frame was shifted through the entire signal in increments of one sample, the classifier evaluated 125 frames per heartbeat at a heart rate of 60 bpm and a sample rate of 125 Hz. We expect the increased accuracy to be driven by at least three factors. The first is MLPTT's ability to learn malformed PPG shapes of any form, as long as they are contained in the training sequence. This could be particularly helpful for PPG-PPG PTT, where the signal phase shift must be measured accurately. The second factor is that previous PTT measurements depended on high signal quality at a specific location in each period. For example, measuring the peak-foot PAT requires a pronounced foot in the PPG signal, which can be corrupted by motion artefacts or other noise. If the peak remained in its true position during a period with a corrupted foot, HeartPy would not benefit, whereas MLPTT could learn from the intact part of the waveform and still predict the correct virtual anchor point. The third expected reason is that MLPTT stepped through the entire signal in one-sample increments. Therefore, the multilabel classifier had as many chances to predict each virtual anchor point as there are samples in one frame, i.e. 60 in our implementation.

The HeartPy and MLPTT PATs correlated with statistical significance for a total of 637 tested heartbeats of BIDMC patients 6, 8 and 42. Although the PAT of patients 2 and 53 could not be compared because of HeartPy's inconsistent PPG peak detection relative to ECG, the MLPTT PAT prediction for these patients did not show any significant variations in the AC component that could have been caused by anchor point misdetection. MLPTT showed a pronounced confusion matrix diagonal for the prediction of virtual anchors. Overall, MLPTT correlated with HeartPy with statistical significance on the tested dataset, required no adjustments for individual patients and produced more robust PAT measurements for patients with noisy recordings.

5.1 Limitations

Despite the strong Pearson correlation with HeartPy PAT measurements, the method and its current implementation have limitations. Behaviour for medically relevant outliers was not tested, and the patients' backgrounds were not investigated. Some BIDMC signals also showed apparent sawtooth and potentially sine-type noise. Lin et al. reported a sawtooth pattern in a different PPG dataset [37], which suggests that this type of noise may be prevalent in PPG recordings from medical equipment. Furthermore, despite our attempts to find the ideal parameters for HeartPy, HeartPy may offer additional built-in signal filters that we did not use, which might have improved its performance further. No patient-specific filtering was applied for MLPTT.

Theoretical limitations include that the tested implementation, with a frame length of 60 samples, allowed a maximum heart rate of 125 bpm. Exceeding this heart rate would have led to more than one period per frame, which the first revision of the implementation was not designed to handle; this could be addressed in future revisions. Another theoretical limitation is the edge case in which the virtual anchor point lies at either edge of the frame. These frames can be expected to be more difficult to classify, which resulted in a slightly lower classification accuracy for the outermost classes. This could be mitigated in future revisions by discarding such frames and only processing frames whose predicted anchor lies at a minimum distance from the frame limits. Foremost, although both tested methods showed a statistically significant correlation, no ground truth measurements were readily available for PAT and PTT. Invasive arterial probe measurements were found to produce the smallest errors for blood pressure prediction [32] and can be considered more reliable because they measure arterial pressure directly. Testing PPG-PTT against I-PTT would be more meaningful, but no such dataset was available at the time of writing. We recognize the risks that incorrect PTT measurements may pose if used in clinical applications. All PTT measurements based on the new method should be revalidated with statistical significance against methods such as I-PTT before clinical deployment.
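The 125 bpm limit follows from the frame length \(N_{frame}\) and the sampling rate \(f_s\). Assuming, as stated above, that a frame must not contain more than one period, the heartbeat period has to be at least as long as the frame:

$$\begin{aligned} T_{frame}=\frac{N_{frame}}{f_s}=\frac{60}{125\,\text{Hz}}=0.48\,\text{s}, \qquad HR_{max}=\frac{60\,\text{s/min}}{T_{frame}}=\frac{60}{0.48}\,\text{bpm}=125\,\text{bpm}. \end{aligned}$$

Raising the limit would therefore require either shorter frames, at the cost of less waveform context per frame, or a lower effective sampling rate.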

6 Conclusion

The key contribution of this paper is to provide a novel approach where global features (all samples) of the time-series data are used to develop a machine learning model to extract local pulse events. We evaluated the performance of MLPTT for more than 50000 samples of a reference dataset and validated the performance in comparison to a reference method for over 1000 heartbeats of 5 patients. The analyses show that MLPTT copes significantly better with inherently noisy PPG data than the reference method. The proposed technique is suitable for analysis of other medical recordings and for application in many other domains that rely on time series data.