Abstract
The state of the art for monitoring hypertension relies on measuring blood pressure (BP) with uncomfortable cuff-based devices. Hence, to increase adherence to monitoring, a better way of measuring BP is needed. This could be achieved with comfortable wearables that contain photoplethysmography (PPG) sensors. Several studies have shown that systolic and diastolic BP (SBP/DBP) can be estimated statistically from PPG signals. However, these studies are based either on measurements of healthy subjects or on patients in intensive care units (ICUs). Thus, there is a lack of studies with patients outside the normal BP range and with daily-life monitoring outside ICUs. To address this, we created a dataset (HYPE) composed of data from hypertensive subjects who performed a stress test and underwent 24-h monitoring. We then trained and compared machine learning (ML) models to predict BP. We evaluated handcrafted feature extraction approaches against image representation approaches and compared different ML algorithms for both. Moreover, to evaluate the models in a different scenario, we used an openly available dataset from a stress test with healthy subjects (EVAL). The best results on our HYPE dataset were obtained in the stress test, with a mean absolute error (MAE) of 8.79 \((\pm 3.17)\) mmHg for SBP and 6.37 \((\pm 2.62)\) mmHg for DBP; on the EVAL dataset, the best MAEs were 14.74 \((\pm 4.06)\) and 7.12 \((\pm 2.32)\) mmHg, respectively. Although we tested a range of signal processing and ML techniques, we were not able to reproduce the small error ranges claimed in the literature. The mixed results suggest a need for more comparative studies with subjects outside intensive care and across all ranges of blood pressure. Until then, the clinical relevance of PPG-based predictions in daily life should remain an open question.
A. M. Sasso and S. Datta—The two authors contributed equally to this paper.
1 Introduction
According to the Global Burden of Disease (GBD) study, high blood pressure (BP), i.e., hypertension, is the risk factor that leads to the most deaths worldwide [16]. The standard way of monitoring this condition is to measure BP with an uncomfortable cuff-based device [25]. Fortunately, comfortable and common wearables can already detect changes in blood flow through a photoplethysmography (PPG) sensor [1]. The signal obtained from this sensor (the photoplethysmogram) is already used successfully to estimate heart rate [19] and has the potential to go beyond that, towards accurate BP prediction [2, 5].
Most of the work in this area focuses on building predictive models for patients in intensive care units (ICUs) [12, 20, 24]. However, data collected in everyday life contain motion artefacts that are not observed in intensive care. Additionally, models that work on healthy populations [17, 18] should also be validated on hypertensive populations to ensure their applicability to BP monitoring. Hence, in our work we focused on assembling a dataset containing data from subjects with hypertension (HYPE), recorded during a stress test and during 24-h monitoring.
We then evaluated machine learning (ML) models for predicting BP from PPG on the HYPE dataset and on a dataset from healthy subjects recorded during a stress test (EVAL). From the PPG signals, we extracted time-domain features and generated image representations (spectrograms and scalograms). Errors as low as those reported in the literature for patients in the ICU or for healthy subjects could not be reproduced, even after processing the PPG signals with diverse time windows and filters.
The rest of this paper is organized as follows: Sect. 2 reviews previous work in the field, and Sect. 3 describes the datasets and methods we used to predict BP from the PPG signal. Sect. 4 presents our findings and results, followed by a discussion in Sect. 5 and the conclusion, including the implications of this work, in Sect. 6.
2 Related Work
Existing work focuses on predictive models built on MIMIC [7], a dataset that contains physiological signals, including PPG and arterial blood pressure (ABP), from patients in ICUs. Kurylyak et al. [12] and Wang et al. [24] both applied artificial neural networks (ANNs) to predict BP on this dataset and reported success. However, they used unknown or small sample sizes, as can be seen in Table 1. Moreover, Kurylyak et al. only extracted time-domain features from the PPG signal, while Wang et al. also extracted frequency-domain ones. Conversely, Slapničar et al. [20] applied a spectro-temporal ResNet to all features on a larger sample but could not report the same success as their predecessors.
Others, such as Luštrek et al. [17], have collected data from healthy subjects in daily life. They used the Empatica E4 wristband and evaluated a range of ML techniques, achieving the best results with an ensemble of regression trees under a leave-one-subject-out (LOSO) validation strategy. However, they had to use ground-truth BP from each subject to personalize the algorithm. Lastly, Manamperi et al. [18] evaluated ANNs on MIMIC and on data from volunteers (assumed to be healthy). They claim to have performed the second evaluation in a non-clinical setting, but the subjects were mainly at rest during their experiment.
Therefore, the current state of the art does not yet allow conclusions about the use of PPG to predict BP in diverse populations and in daily life. There is a clear need for more comparative studies with both healthy and hypertensive subjects and in different scenarios, especially outside controlled conditions.
3 Methods
3.1 Datasets
In our work we used two datasets: one created by us with data from a hypertensive population (HYPE) and an openly available one containing data from a healthy population (EVAL). Both datasets contain recordings made during a stress test; HYPE additionally contains data from 24-h monitoring. We describe the two datasets below.
- HYPE. This dataset was created by us as part of the CardioVeg study (NCT03901183), approved by the Ethics Committee of the Charité, Berlin (no. EA4/025/19). Data were collected from 12 subjects with hypertension (6 female), aged 31–75 years (median 60). The study collected data from (1) a stress test and (2) 24-h monitoring, using the Empatica E4 wristband as the PPG source and the Spacelabs (SL 90217) BP monitor.
  (1) Stress test. The subjects followed a protocol in which they watched a relaxing video for 5 min and then had their BP taken by a physician five times at 1-min intervals [25]. Then, they cycled on a bicycle ergometer for 5 to 10 min and relaxed again. During this second relaxation phase, their BP was again measured five times at 1-min intervals. One subject could not cycle because of extremely high BP, and the wearable device failed for another; therefore, this experiment included 10 subjects (5 female) and contains a total of 95 BP recordings.
  (2) 24-h monitoring. The same 12 subjects were monitored for 24 h during their regular daily activities. The Spacelabs device was configured to measure BP every 30 min during the day and every hour during the night. This part contains a total of 464 BP recordings, and all 12 subjects were measured.
- EVAL. This dataset was generated by Esmaili et al. [3], who aimed to estimate BP from the pulse transit time (PTT) and the pulse arrival time (PAT). Both variables are derived from time differences between the PPG and ECG signals. The data were collected from 26 healthy subjects aged 21–50 years. To induce perturbations in their BP, the subjects ran for 3 min at a speed of 8 km/h. Immediately after the exercise, the subjects sat upright, and their BP was measured while PPG and ECG were recorded. A force-sensing resistor (FSR) placed under the cuff of the BP monitor measured the instantaneous cuff pressure, which made it possible to pinpoint the exact time at which SBP and DBP were measured. In total, 152 BP values were recorded in this dataset.
3.2 Handcrafted Feature Extraction Methods
Our first approach entailed extracting handcrafted features from the PPG signal. Time windows of 15, 30 and 45 s around each BP measurement were used in our experiments. To eliminate motion artefacts induced by wrist movements, we excluded the sections of the current window in which the Euclidean norm of the x-, y- and z-acceleration lay outside an interval of 25% of the standard deviation around the sample mean. Motion removal was only applied to the HYPE dataset, because the EVAL dataset does not contain motion signals corresponding to the PPG recordings. We also experimented with signal normalization and with Chebyshev II and Butterworth filters, since these were reported as the best filters for PPG signals [15]. In the processed signal, the PPG cycles were then identified with a standard peak detection function.
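The preprocessing steps above can be sketched in Python with NumPy and SciPy. This is a minimal sketch under stated assumptions, not the authors' code: the 25%-of-SD rule is read as keeping samples whose acceleration norm lies within the window mean ± 0.25 standard deviations, the accelerometer is assumed to be already resampled to the PPG rate, and the 4th-order Chebyshev II band-pass with 20 dB stopband attenuation and 0.5–10 Hz edges, as well as the 180 bpm limit used for peak spacing, are hypothetical settings. The function names are illustrative only.

```python
import numpy as np
from scipy.signal import cheby2, find_peaks, sosfiltfilt


def motion_mask(acc_xyz, k=0.25):
    """Boolean mask of samples considered free of wrist motion.

    Assumption: a sample is kept if the Euclidean norm of its (x, y, z)
    acceleration lies within mean +/- k * std of the current window.
    """
    norm = np.linalg.norm(np.asarray(acc_xyz, dtype=float), axis=1)
    return np.abs(norm - norm.mean()) <= k * norm.std()


def bandpass_cheby2(ppg, fs, low=0.5, high=10.0, order=4, rs=20.0):
    """Zero-phase Chebyshev type II band-pass filter (hypothetical settings)."""
    sos = cheby2(order, rs, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(ppg, dtype=float))


def detect_systolic_peaks(ppg, fs, max_hr_bpm=180.0):
    """Indices of local maxima that are at least 60 / max_hr_bpm seconds apart."""
    min_distance = max(1, int(fs * 60.0 / max_hr_bpm))
    peaks, _ = find_peaks(ppg, distance=min_distance)
    return peaks


# Example: 15 s of a synthetic 72-bpm "PPG" with slow baseline drift, sampled at 64 Hz.
fs = 64
t = np.arange(15 * fs) / fs
raw = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 0.1 * t)
filtered = bandpass_cheby2(raw, fs)
peaks = detect_systolic_peaks(filtered, fs)
```

In a full pipeline, the mask from `motion_mask` would be used to discard peaks or cycles that fall inside motion-contaminated sections; zero-phase filtering (`sosfiltfilt`) is chosen here so that peak positions are not shifted by the filter delay.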
All detected cycles in the same window were combined into a custom PPG signal template (details in Sect. B), following a procedure described by Li and Clifford [13]. Each cycle was then compared with the template using two signal quality indices (SQIs): (1) the direct linear correlation between the cycle and the template, and (2) the linear correlation between the template and the cycle re-sampled to match the template length. A cycle was used for feature extraction only if both correlations were above 0.8. As a result, no features could be extracted for some BP measurements, because no cycle in the corresponding window matched the template.
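A minimal NumPy sketch of the two template-matching SQIs is shown below. It is not the authors' implementation: building the template as the mean of all cycles linearly resampled to the median cycle length, and truncating cycle and template to the shorter length for the direct correlation, are assumptions in the spirit of Li and Clifford [13]; all function names are hypothetical.

```python
import numpy as np


def resample_linear(cycle, n):
    """Linearly resample a 1-D cycle to exactly n samples."""
    cycle = np.asarray(cycle, dtype=float)
    return np.interp(np.linspace(0.0, 1.0, n), np.linspace(0.0, 1.0, len(cycle)), cycle)


def build_template(cycles):
    """Assumed template: mean of all cycles resampled to the median cycle length."""
    n = int(np.median([len(c) for c in cycles]))
    return np.mean([resample_linear(c, n) for c in cycles], axis=0)


def sqi_direct(cycle, template):
    """SQI (1): Pearson correlation over the overlapping (truncated) length."""
    n = min(len(cycle), len(template))
    return np.corrcoef(np.asarray(cycle[:n], dtype=float), template[:n])[0, 1]


def sqi_resampled(cycle, template):
    """SQI (2): Pearson correlation after resampling the cycle to the template length."""
    return np.corrcoef(resample_linear(cycle, len(template)), template)[0, 1]


def select_clean_cycles(cycles, threshold=0.8):
    """Keep only the cycles whose two SQIs both exceed the threshold."""
    if not cycles:
        return []
    template = build_template(cycles)
    return [c for c in cycles
            if sqi_direct(c, template) > threshold and sqi_resampled(c, template) > threshold]
```

For example, three arch-shaped cycles of slightly different lengths pass both checks, whereas an inverted cycle correlates negatively with the template and is rejected.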
After the clean PPG cycles had been identified, time-domain features were extracted; the full list can be found in the Appendix (Table 4). The first step was to identify the first peak in the cycle, which corresponds to the systolic peak. Then, for various percentages \(n\) of the peak amplitude, we extracted the time between the systolic peak and the point where the falling edge drops to \(n\)% of the amplitude (\(DW_n\)), the total width of the cycle at \(n\)% of the amplitude (\(SW_n+DW_n\)), and the ratio between the times after and before the systolic peak (\(DW_n/SW_n\)). For every window, the mean and variance of each feature were computed and used as input for the models.
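The width features and their per-window aggregation can be sketched as follows. This is a hypothetical NumPy illustration, not the code behind Table 4: the cycle foot is taken as the cycle minimum, the systolic peak as the first local maximum, widths are measured on the sample grid without interpolation, and the default amplitude levels are assumed example values.

```python
import numpy as np


def first_peak(c):
    """Index of the first local maximum (falls back to the global maximum)."""
    for i in range(1, len(c) - 1):
        if c[i - 1] < c[i] >= c[i + 1]:
            return i
    return int(np.argmax(c))


def width_features(cycle, fs, levels=(0.10, 0.25, 0.33, 0.50, 0.66, 0.75)):
    """SW_n, DW_n, SW_n+DW_n (seconds) and DW_n/SW_n for one PPG cycle."""
    c = np.asarray(cycle, dtype=float)
    p = first_peak(c)                       # systolic peak
    base = c.min()                          # assumed cycle foot
    amp = c[p] - base
    feats = {}
    for n in levels:
        thr = base + n * amp
        below_rise = np.nonzero(c[:p] < thr)[0]
        below_fall = np.nonzero(c[p:] < thr)[0]
        start = below_rise[-1] + 1 if below_rise.size else 0            # first rising sample >= thr
        end = p + below_fall[0] - 1 if below_fall.size else len(c) - 1  # last falling sample >= thr
        sw, dw = (p - start) / fs, (end - p) / fs
        pct = int(round(n * 100))
        feats[f"SW{pct}"] = sw
        feats[f"DW{pct}"] = dw
        feats[f"SW{pct}+DW{pct}"] = sw + dw
        feats[f"DW{pct}/SW{pct}"] = dw / sw if sw > 0 else np.nan
    return feats


def window_features(cycles, fs, levels=(0.10, 0.25, 0.33, 0.50, 0.66, 0.75)):
    """Mean and variance of every per-cycle feature across one window's clean cycles."""
    per_cycle = [width_features(c, fs, levels) for c in cycles]
    out = {}
    for key in per_cycle[0]:
        values = np.array([f[key] for f in per_cycle])
        out[f"{key}_mean"] = float(np.nanmean(values))
        out[f"{key}_var"] = float(np.nanvar(values))
    return out
```

For a triangular cycle that rises over 10 samples and falls over 30 samples at 10 Hz, the 55% level gives \(SW_{55} = 0.4\) s and \(DW_{55} = 1.3\) s, so \(DW_{55}/SW_{55} = 3.25\).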
3.3 Image Representation Methods
An alternative to manual feature extraction that has recently gained much popularity involves convolutional neural networks (CNNs). The signals are represented as images, and a transfer learning approach based on pretrained CNNs is used to learn embeddings from these images, which are then used to predict BP. We tested two image representations of PPG signals, spectrograms and scalograms, described below.
- Scalograms. A scalogram represents the absolute value of the continuous wavelet transform (CWT) coefficients of a signal and is usually plotted over time and frequency. The scalogram-CNN approach was first discussed by Liang et al. [14], but only for hypertension stratification, not for BP prediction. Before passing the signal to the CWT, we detrended it, i.e., subtracted its mean. The CWT convolves the input sequence with a set of functions generated by scaling and shifting a base wavelet. We used the complex Morlet wavelet as the base wavelet, which is given by:
$$\varPsi (t) = \frac{1}{\sqrt{\pi B}}\exp\left(-\frac{t^2}{B}\right)\exp\left(j 2\pi C t\right) \tag{1}$$
where \(j\) is the imaginary unit. Following Liang et al. [14], the bandwidth \(B\) and the center frequency \(C\) were set to 3 and 60, respectively. Compared with a spectrogram, a scalogram is usually better at resolving the low-frequency and rapidly changing frequency components of a signal.
- Spectrograms. A spectrogram shows how the frequency content of a signal changes over time; a third dimension, the amplitude of a particular frequency at a particular time, is represented by the intensity or color of each point in the plot. The spectrogram-CNN approach to predicting BP was first discussed by Slapničar et al. [20]. As with scalograms, we detrended the signal before generating the spectrograms. To generate a spectrogram, the sampled time-domain signal is split into (usually overlapping) windows, and each window is Fourier transformed to calculate the magnitude of its frequency spectrum [21].
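Both image representations can be sketched with NumPy alone. This is an illustrative sketch rather than the authors' pipeline, and several choices are assumptions: \(t\) in Eq. (1) is taken in seconds, the scale for a target frequency \(f_0\) is \(a = C/f_0\), and the CWT is evaluated in the frequency domain using the closed-form Fourier transform of Eq. (1), \(\hat{\varPsi}(f) = \exp\left(-\pi^2 B (f - C)^2\right)\), so that the scaled wavelet \(a^{-1/2}\varPsi(t/a)\) has transform \(\sqrt{a}\,\hat{\varPsi}(af)\). The 128-sample Hann window with 75% overlap for the spectrogram is likewise hypothetical. Wavelet libraries use their own scale conventions, so the same \(B\) and \(C\) need not produce identical scalograms there.

```python
import numpy as np


def cmor_scalogram(x, fs, freqs, B=3.0, C=60.0):
    """|CWT| of x with the complex Morlet wavelet of Eq. (1), computed via the FFT.

    Assumptions: t in seconds, scale a = C / f0 centres the wavelet at f0 Hz,
    output is correct up to a constant factor, and zero-padding to twice the
    length limits circular wrap-around.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # detrend: subtract the mean
    n = len(x)
    nfft = 2 * n
    X = np.fft.fft(x, nfft)
    f = np.fft.fftfreq(nfft, d=1.0 / fs)
    rows = []
    for f0 in freqs:
        a = C / f0
        # Fourier transform of the scaled wavelet; it is real, so conjugation is a no-op.
        psi_hat = np.exp(-np.pi ** 2 * B * (a * f - C) ** 2)
        rows.append(np.abs(np.fft.ifft(X * np.sqrt(a) * psi_hat)[:n]))
    return np.array(rows)                     # shape: (len(freqs), n)


def spectrogram(x, fs, nperseg=128, noverlap=96):
    """Magnitude STFT with a Hann window; returns (freqs, |STFT| shaped (freq, time))."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # detrend: subtract the mean
    step = nperseg - noverlap
    window = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, step)
    frames = np.array([x[s:s + nperseg] * window for s in starts])
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, np.abs(np.fft.rfft(frames, axis=1)).T


# Example: a 15-s, 1.2-Hz synthetic "PPG" sampled at 64 Hz.
fs = 64
t = np.arange(15 * fs) / fs
ppg = np.sin(2 * np.pi * 1.2 * t)
scal = cmor_scalogram(ppg, fs, freqs=[1.2, 3.0])
spec_freqs, spec = spectrogram(ppg, fs)
```

Working in the frequency domain avoids building explicit wavelet kernels: with \(C = 60\) and \(t\) in seconds, each wavelet is far longer than a 15-s window, so a time-domain convolution would need kernels much longer than the signal itself.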
Figure 1 depicts a sample spectrogram and scalogram generated from a 15-s PPG snippet. The image representations of the signal were then fed into a ResNet architecture to learn image embeddings [9]. The residual connections of the ResNet design make it possible to train very deep neural networks while mitigating the vanishing gradient problem. Since our datasets are very small, instead of training a network from scratch, we used a network already trained on the ImageNet dataset [8]. In particular, we used the ResNet18 architecture and took the embeddings from the penultimate layer of the network. We also experimented with AlexNet [11], but ResNet18 always performed marginally better. This might be because the penultimate layer of ResNet18 produces a 512-dimensional embedding, whereas that of AlexNet produces a 4096-dimensional one. Given the small datasets, the larger input vectors may be harder for the models to learn from, despite feature selection and strong regularization.
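One possible way to turn a time-frequency array into an input for an ImageNet-pretrained CNN is sketched below. It is a hypothetical NumPy illustration; the exact rendering of the images shown in Figure 1 is not described here. The steps (log compression, min-max scaling, nearest-neighbour resizing to 224 × 224, replication to three channels, and normalization with the standard ImageNet channel statistics) and the 224-pixel size are assumptions.

```python
import numpy as np

# Standard ImageNet per-channel statistics used with torchvision's pretrained models.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def to_cnn_input(tf_map, size=224):
    """Map a (freq, time) magnitude array to a (3, size, size) ImageNet-normalized array."""
    img = np.log1p(np.abs(np.asarray(tf_map, dtype=float)))      # compress dynamic range
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)     # scale to [0, 1]
    img = img[::-1]                                               # low frequencies at the bottom
    rows = np.linspace(0, img.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, img.shape[1] - 1, size).round().astype(int)
    img = img[np.ix_(rows, cols)]                                 # nearest-neighbour resize
    rgb = np.repeat(img[None, :, :], 3, axis=0)                   # grey -> 3 identical channels
    return (rgb - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]


example = to_cnn_input(np.random.default_rng(0).random((65, 27)))
```

The resulting array would then be converted to a tensor and passed through an ImageNet-pretrained ResNet18; in torchvision, replacing the model's final fully connected layer (`fc`) with `torch.nn.Identity()` is a common way to obtain the 512-dimensional penultimate-layer embedding.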
3.4 Machine Learning Models
Previous work shows that machine learning algorithms perform well in predicting BP from features derived from PPG and/or ECG. In our experiments, we employed three popular machine learning algorithms to predict SBP and DBP: (a) Generalised Linear Models (GLM) with Elastic Net regularisation [26], (b) Gradient Boosting Machines (GBM) [4], and (c) LightGBM (LGBM) [10], a more recent and more efficient implementation of GBM.
For prediction from the image embeddings, we applied Recursive Feature Elimination (RFE) with a linear-kernel support vector machine (SVM) as the base estimator before feeding the selected features into the models.
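The model set-up for the image embeddings can be sketched with scikit-learn. This is an illustration under assumptions, not the authors' configuration: the hyperparameter values, the number of features kept by RFE, the use of `SVR(kernel="linear")` as the linear SVM for regression, the `StandardScaler` step, and the function name are all hypothetical. LightGBM's scikit-learn wrapper, `lightgbm.LGBMRegressor`, could be dropped into the same pipeline as the third regressor.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_embedding_model(kind="glm", n_features=64):
    """Scaler -> RFE (a linear-kernel SVR ranks features) -> regressor; settings are hypothetical."""
    if kind == "glm":
        regressor = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10_000)
    elif kind == "gbm":
        regressor = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, random_state=0)
    else:
        raise ValueError(f"unknown model kind: {kind!r}")
    return Pipeline([
        ("scale", StandardScaler()),
        # step=0.1 drops 10% of the input features (rounded down) per RFE iteration.
        ("rfe", RFE(SVR(kernel="linear"), n_features_to_select=n_features, step=0.1)),
        ("regressor", regressor),
    ])


# Example with synthetic "embeddings": 60 windows x 40 features, SBP driven by feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))
y = 130.0 + 10.0 * X[:, 0] + rng.normal(size=60)
model = make_embedding_model("glm", n_features=10).fit(X, y)
```

Keeping feature selection inside the pipeline means RFE is refitted on the training folds only, so the test subjects never influence which embedding dimensions are selected.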
3.5 Experimental Settings
To train robust models that generalize well, we used leave-two-subjects-out cross-validation for all models: at each iteration, the data from 2 subjects served as the test set and the models were trained on the remaining data, and this procedure was repeated until every subject had been in the test set. All hyperparameters were optimized empirically. We evaluated the models using the mean absolute error (MAE), computed at each iteration; we report the mean and standard deviation of these values.
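The evaluation protocol can be sketched in NumPy. In each fold, the error is \(\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\lvert y_i - \hat{y}_i\rvert\) over the \(m\) BP measurements of the two test subjects. The sketch below assumes that subjects are paired at random, that the population standard deviation is reported, and it uses a hypothetical mean-predicting baseline in place of the actual models.

```python
import numpy as np


def leave_two_subjects_out(subject_ids, seed=0):
    """Yield (train_idx, test_idx); each subject lands in exactly one test fold of two subjects.

    Assumption: subjects are paired at random; with an odd count the last fold has one subject.
    """
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    np.random.default_rng(seed).shuffle(subjects)
    for k in range(0, len(subjects), 2):
        in_test = np.isin(subject_ids, subjects[k:k + 2])
        yield np.nonzero(~in_test)[0], np.nonzero(in_test)[0]


def cross_validated_mae(X, y, subject_ids, fit_predict, seed=0):
    """Mean and (population) standard deviation of the per-fold MAE."""
    maes = []
    for train, test in leave_two_subjects_out(subject_ids, seed):
        y_hat = fit_predict(X[train], y[train], X[test])
        maes.append(np.mean(np.abs(y[test] - y_hat)))
    return float(np.mean(maes)), float(np.std(maes))


def mean_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in model: predict the training-set mean BP."""
    return np.full(len(X_test), y_train.mean())


ids = np.repeat(np.arange(10), 5)      # 10 subjects x 5 BP measurements each
X = np.zeros((50, 3))
y = np.full(50, 120.0)
mean_mae, sd_mae = cross_validated_mae(X, y, ids, mean_baseline)
```

Splitting by subject rather than by measurement ensures that no subject contributes data to both the training and the test set of the same fold.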
4 Results
In this section we report our experimental results. Table 2 and Table 3 compare the MAEs of all models on the different datasets for predicting systolic blood pressure (SBP) and diastolic blood pressure (DBP), respectively. Each cell contains the mean and standard deviation (in parentheses) of the MAE over all cross-validation folds. Notably, on the HYPE dataset, the feature extraction methods consistently outperformed the image-based methods, whereas on EVAL the spectrogram-based method outperformed the other two approaches. On both datasets, the best results of the spectrogram-based approach are usually marginally better than the best results of the scalogram-based approach. For the image-based methods, the more advanced models (LGBM and GBM) clearly outperformed the GLM, most probably because of the comparatively high dimensionality of the image embeddings. For the feature extraction methods this difference is less pronounced, and in some cases the GLM is the best-performing model. In general, based on the MAE values, predicting SBP appears to be more difficult than predicting DBP, which is consistent with previous literature (see Table 1).
5 Discussion
5.1 Clinical Relevance
Cuff-less and continuous methods of measuring BP are particularly attractive, as BP is one of the most important predictors of long-term cardiovascular health [6]. Prediction models for BP based on PPG signals could be an important step in that direction. However, for reliable continuous BP monitoring, these models need to perform well during regular day-to-day activities and across different patient populations. Apart from MIMIC, a few other studies have collected PPG and corresponding BP signals following a strict protocol (similar to the HYPE stress test). To the best of our knowledge, our 24-h dataset is the first attempt to collect such data in an uncontrolled environment where the subjects were free to go about their daily activities. Our evaluations underline that it is indeed more challenging to accurately predict BP in such an uncontrolled environment.
5.2 Technical Relevance
In this work we impartially evaluated different models and approaches for BP prediction from PPG. Most of these methods had previously been validated only on MIMIC. Also, given the large volume of existing work, the models were often not compared against all available approaches. To the best of our knowledge, this is also the first work to compare the scalogram, spectrogram and feature extraction approaches. We used subject-wise (leave-two-subjects-out) cross-validation to make our results robust. Our models and code are openly available so that these results can be reproduced and also applied to similar datasets when needed.
5.3 Limitations and Future Work
The major limitation of our work is the small size of the datasets we used. For that reason, it was not possible to train a deep Long Short-Term Memory (LSTM) network, an architecture that has shown very promising results in a few recent papers [23]. In future work, we would like to extend our dataset with more diverse patient populations and longer observation periods per patient. We might also consider data augmentation techniques. This would allow us to apply more data-demanding learning algorithms and, at the same time, to investigate how models trained on one population perform on another.
6 Conclusion
In conclusion, we presented a comprehensive comparison of different machine learning approaches to predict BP from PPG on two different datasets. We show that, despite the plethora of work in this area, there is a dearth of models that perform well in uncontrolled environments where the subjects engage in various day-to-day activities. We still have a long way to go before reaching the accuracy considered acceptable by the Association for the Advancement of Medical Instrumentation® (AAMI) [22], which requires a mean error of at most 5 mmHg (with a standard deviation of at most 8 mmHg). Moreover, we showed that for small to medium-sized datasets, feature extraction methods can produce better results than the recent image-based approaches. We hope our work will inspire others to dig deeper into the generalizability of these models and to improve their accuracy.
Change history
10 November 2021
The original version of this chapter was revised. The conclusion section was corrected and reference was added.
References
[1] Challoner, A.V., Ramsay, C.A.: A photoelectric plethysmograph for the measurement of cutaneous blood flow. Phys. Med. Biol. 19(3), 317–328 (1974). https://doi.org/10.1088/0031-9155/19/3/003
[2] Elgendi, M., et al.: The use of photoplethysmography for assessing hypertension. NPJ Digit. Med. 2(1), 60 (2019). https://doi.org/10.1038/s41746-019-0136-7
[3] Esmaili, A., Kachuee, M., Shabany, M.: Nonlinear cuffless blood pressure estimation of healthy subjects using pulse transit time and arrival time. IEEE Trans. Instrum. Meas. 66(12), 3299–3308 (2017)
[4] Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001)
[5] Ghamari, M.: A review on wearable photoplethysmography sensors and their potential future applications in health care. Int. J. Biosens. Bioelectron. 4(4), 195–202 (2018)
[6] Gholamhosseini, H., Meintjes, A., Baig, M.M., Lindén, M.: Smartphone-based continuous blood pressure measurement using pulse transit time. In: pHealth, pp. 84–89 (2016)
[7] Goldberger, A.L., et al.: PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23), e215–e220 (2000)
[8] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)
[9] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
[10] Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, pp. 3146–3154 (2017)
[11] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
[12] Kurylyak, Y., Lamonaca, F., Grimaldi, D.: A neural network-based method for continuous blood pressure estimation from a PPG signal. In: Conference Record - IEEE Instrumentation and Measurement Technology Conference, pp. 280–283. IEEE, May 2013. https://doi.org/10.1109/I2MTC.2013.6555424
[13] Li, Q., Clifford, G.D.: Dynamic time warping and machine learning for signal quality assessment of pulsatile signals. Physiol. Meas. 33(9), 1491 (2012)
[14] Liang, Y., Chen, Z., Ward, R., Elgendi, M.: Photoplethysmography and deep learning: enhancing hypertension risk stratification. Biosensors 8(4), 101 (2018)
[15] Liang, Y., Elgendi, M., Chen, Z., Ward, R.: Analysis: an optimal filter for short photoplethysmogram signals. Sci. Data 5, 1–12 (2018). https://doi.org/10.1038/sdata.2018.76
[16] Lim, S.S., Vos, T., Flaxman, A.D., et al.: A comparative risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions, 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet 380(9859), 2224–2260 (2012). https://doi.org/10.1016/S0140-6736(12)61766-8
[17] Luštrek, M., Slapničar, G.: Blood pressure estimation with a wristband optical sensor. In: UbiComp, pp. 758–761 (2018). https://doi.org/10.1145/3267305.3267708
[18] Manamperi, B., Chitraranjan, C.: A robust neural network-based method to estimate arterial blood pressure using photoplethysmography. In: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 681–685. IEEE, October 2019. https://doi.org/10.1109/BIBE.2019.00128
[19] Shcherbina, A., et al.: Accuracy in wrist-worn, sensor-based measurements of heart rate and energy expenditure in a diverse cohort. J. Pers. Med. 7(2), 3 (2017). https://doi.org/10.3390/jpm7020003
[20] Slapničar, G., Mlakar, N., Luštrek, M.: Blood pressure estimation from photoplethysmogram using a spectro-temporal deep neural network. Sensors (Switz.) 19(15) (2019). https://doi.org/10.3390/s19153420
[21] Smith, J.O., III.: Mathematics of the Discrete Fourier Transform (DFT): With Audio Applications, 2nd edn. Booksurge, Charleston (2007)
[22] Stergiou, G.S., et al.: A universal standard for the validation of blood pressure measuring devices. Hypertension 71(3), 368–374 (2018). https://doi.org/10.1161/HYPERTENSIONAHA.117.10237
[23] Su, P., Ding, X.R., Zhang, Y.T., Liu, J., Miao, F., Zhao, N.: Long-term blood pressure prediction with deep recurrent neural networks. In: 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), pp. 323–328. IEEE (2018)
[24] Wang, L., Zhou, W., Xing, Y., Zhou, X.: A novel neural network model for blood pressure estimation using photoplethesmography without electrocardiogram. J. Healthc. Eng. (2018). https://doi.org/10.1155/2018/7804243
[25] Whelton, P.K., Carey, R.M., Aronow, W.S., et al.: 2017 ACC/AHA/AAPA/ABC/ACPM/AGS/APhA/ASH/ASPC/NMA/PCNA guideline for the prevention, detection, evaluation, and management of high blood pressure in adults. Hypertension 71(6), e13–e115 (2018). https://doi.org/10.1161/HYP.0000000000000065
[26] Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.) 67(2), 301–320 (2005)
Acknowledgements
We would like to thank Manisha Manaswini, Felix Musmann, Juan Carlos Niño Rodriguez, and Carolin Müller for their help during data collection and, also Harry Freitas da Cruz and Attila Wohlbrandt for giving many valuable insights.
Appendix
A Data and Code Availability
The code for the experiments is available at: https://github.com/arianesasso/aime-2020. Information on the HYPE dataset is also provided there. The EVAL dataset can be found at: https://www.kaggle.com/mkachuee/noninvasivebp.
B Feature extraction
The features that were extracted from the PPG cycles are described in Table 4 and in Fig. 2 following the work of Kurylyak et al. [12].
C Experiments
More information and details on the methods and experiments can be found at: https://figshare.com/projects/AIME_2020/85166.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Morassi Sasso, A. et al. (2020). HYPE: Predicting Blood Pressure from Photoplethysmograms in a Hypertensive Population. In: Michalowski, M., Moskovitch, R. (eds) Artificial Intelligence in Medicine. AIME 2020. Lecture Notes in Computer Science, vol 12299. Springer, Cham. https://doi.org/10.1007/978-3-030-59137-3_29
DOI: https://doi.org/10.1007/978-3-030-59137-3_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59136-6
Online ISBN: 978-3-030-59137-3
eBook Packages: Computer Science, Computer Science (R0)