
1 Introduction

Primary hypertension is an intermediate risk factor for cardiovascular disease (CVD) and affects more than thirty million people in Brazil [1]. The treatment for this incurable, chronic disease consists of lifestyle changes, the use of antihypertensive medications and daily blood pressure monitoring [2]. Hypertension is also one of the major comorbidities and an important risk factor for COVID-19 mortality [3]. The common methods for measuring arterial blood pressure (BP) are the auscultatory and oscillometric methods, which estimate the maximum (systolic) and minimum (diastolic) BP [2]. However, these methods have some disadvantages. To perform the measurement, the artery must be occluded to interrupt the blood flow in the measurement region, which causes discomfort to the patient [4, 5]. Occlusion also precludes online and continuous monitoring, which, when needed, often relies on more invasive techniques [6].

Due to the limitations of traditional blood pressure measurement, several non-invasive methods are being researched to provide greater comfort and safety for continuous pressure monitoring [6]. Among the biological signals with potential for continuous and non-invasive pressure monitoring, photoplethysmography (PPG) has shown great promise, since PPG signals are acquired non-invasively and require only simple optical sensors (see Fig. 1).

Fig. 1 Photoplethysmography (PPG) technique

Given the widespread use of PPG signals in heart rate (HR) and \({\text {SpO}}_2\) monitoring [7], and even in glycemic control [8], PPG is also a very robust and well-known technology. Thus, several groups have been studying the application of PPG signals to estimate blood pressure. Kachuee et al. [9] demonstrate a technique based on pulse transit time (PTT), which uses PPG and electrocardiography (ECG) signals to estimate average BP with a mean absolute error (MAE) of 7.52 ± 9.54 mmHg. Nabeel et al. [10] show that two PPG signals acquired from different parts of the body can be used to assess the average BP with a mean error (ME) of 0.96 ± 6.77 mmHg, in a method based on pulse wave velocity (PWV). Although these methods have achieved interesting results, they also add implementation complexity due to the use of two sensors and the sensitivity of their performance to synchronization issues between the inputs. Moreover, when the method requires a signal other than PPG (e.g., ECG in the PTT method), the hardware needed for signal conditioning increases further.

To reduce hardware complexity, some authors have proposed algorithms using parameters extracted from a single PPG signal. Methods using features in the time [11, 12] and frequency [13] domains, or both [14], have been explored. Among these studies, [11, 13] report results within the standards proposed for BP measurement [15]. However, similar studies such as [14, 16, 17] found errors above the requirements of the standard. There is room, therefore, for subsequent work that helps clarify this divergence.

This paper aims to reproduce and compare the feature extraction methods reported by Kurylyak et al. [11] and Xing and Sun [13], and to compare the performance of different machine learning algorithms for BP measurement, in order to assist in the investigation of these reported error differences. The method reported by Kurylyak et al. [11] uses time domain features of the PPG signal to train an artificial neural network (ANN) to estimate the BP. In [13], an ANN is trained using frequency domain features of the PPG signal. In addition, in this work, an algorithm is proposed to estimate systolic and diastolic pressures using the principal component analysis (PCA) technique with time and frequency domain features and machine learning classification.

This work is organized as follows: Sect. 2 presents the general methodology. Section 3 describes the experiments and results achieved. Finally, in Sect. 4 a general discussion and the main conclusions are presented.

2 Methodology

Tests were performed to evaluate the impact of the following issues on the BP measurement error: signal feature selection and dimensionality reduction of those features; signal range used; and, finally, choice of regression algorithms.

2.1 Database

Data were extracted from the database presented in [9], which is a preprocessed and validated version of the MIMIC II database [18], consisting of 3363 signal segments of at least 10 min, sampled at 125 Hz with a minimum resolution of 8 bits. PPG signals were captured at the fingertip, while ECG lead II and an invasive blood pressure signal were recorded simultaneously.

A few signals of this database contain movement artifacts. By visual inspection, these sections were identified and the entire section was discarded from the beginning of the movement artifact to the end of the signal. A total of 147 blocks were discarded in this step, which corresponds to 658 minutes of signal.

Finally, peaks and valleys were detected using the algorithm presented in [19], which separates the PPG signal into segments and classifies each one as having a positive or a negative slope. When a transition from a positive slope segment to a negative slope segment occurs, the maximum value in the latest positive slope segment is taken as a peak of the signal. Similarly, when a transition from a negative to a positive slope occurs, the minimum value in the latest negative slope segment is taken as a valley of the signal. A constant segmentation of 8 points (64 ms window) was used. This algorithm was chosen because it is simple to implement on embedded devices. The detection of peaks and valleys is important both for extracting the time domain features and for determining the cycle used for the frequency domain features.
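The segment-based detection described above can be sketched as follows. This is a minimal illustration and not the implementation from [19]: the function name `detect_peaks_valleys` is hypothetical, and the slope of each segment is assumed to be given by the sign of the difference between its last and first samples, which the text does not specify.

```python
import numpy as np

def detect_peaks_valleys(ppg, seg_len=8):
    """Return (peak_indices, valley_indices) from fixed-length slope segments.

    seg_len=8 matches the 8-point (64 ms at 125 Hz) segmentation in the text.
    """
    ppg = np.asarray(ppg, dtype=float)
    peaks, valleys = [], []
    prev_sign, prev_start = 0, 0
    for start in range(0, len(ppg) - seg_len + 1, seg_len):
        seg = ppg[start:start + seg_len]
        # Assumption: slope sign = sign of the end-to-end difference.
        sign = 1 if seg[-1] >= seg[0] else -1
        prev = ppg[prev_start:prev_start + seg_len]
        if prev_sign == 1 and sign == -1:
            # Positive -> negative: maximum of the latest positive segment.
            peaks.append(prev_start + int(np.argmax(prev)))
        elif prev_sign == -1 and sign == 1:
            # Negative -> positive: minimum of the latest negative segment.
            valleys.append(prev_start + int(np.argmin(prev)))
        prev_sign, prev_start = sign, start
    return peaks, valleys
```

Because the extremum is searched only inside the segment preceding the transition, the reported index can lie slightly before the true extremum; the temporal resolution is bounded by the segment length.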

2.2 Extracted Features

Kurylyak et al. [11] used 21 temporal features of the PPG signal to train ANNs (see Fig. 2). This high number of features was used because, for the same pressure value, the data vary between individuals due to their physiological characteristics.

Fig. 2 Extracted features of PPG signal

Xing and Sun [13] normalized the PPG signals by applying a simplified model of the volumetric expansion of arteries, in order to reduce the influence of the variability of the elastic properties of the arteries between different individuals. After normalization, intervals containing a complete cardiac cycle, 10 % of the previous cycle and 5 % of the following cycle are taken, and zeros are added until 150 points are obtained. The fast Fourier transform (FFT) of each interval was then computed, and the magnitude and phase information up to 10.8 Hz was used to train an ANN with a hidden layer of 35 neurons and two outputs, one for the SBP and the other for the DBP.
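The zero-padding and FFT step can be illustrated with the sketch below. It is only an approximation of the procedure in [13] under stated assumptions: the function name `spectral_features` is hypothetical, the 125 Hz sampling rate is the one of the database used in this work, and the DC bin is assumed to be kept while bins strictly above 10.8 Hz are dropped.

```python
import numpy as np

FS_HZ = 125.0       # sampling rate of the database used in this work
N_POINTS = 150      # zero-padded interval length stated in the text
MAX_FREQ_HZ = 10.8  # highest frequency retained, as stated in the text

def spectral_features(interval):
    """Zero-pad one PPG interval to N_POINTS and return (magnitude, phase)."""
    x = np.asarray(interval, dtype=float)
    if len(x) > N_POINTS:
        raise ValueError("interval is longer than the padded length")
    padded = np.zeros(N_POINTS)
    padded[:len(x)] = x
    spectrum = np.fft.rfft(padded)
    freqs = np.fft.rfftfreq(N_POINTS, d=1.0 / FS_HZ)
    # Assumption: keep the DC bin and every bin at or below MAX_FREQ_HZ.
    keep = freqs <= MAX_FREQ_HZ
    return np.abs(spectrum[keep]), np.angle(spectrum[keep])
```

With these assumptions the bin spacing is 125/150 ≈ 0.83 Hz. The exact set of bins used in [13] may differ, for example if the DC component is excluded.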

In this work, two feature extraction configurations based on [11, 13] were used. The first extracts the 21 time domain features described in [11], while the second applies a normalization and extracts the frequency domain features described in [13]. A normalization different from the one described in [13] was used; it is given by Eq. 1 and is also very effective according to [20].

$$\begin{aligned} PPG_{norm}=\frac{PPG_0-\text {min}(PPG_0)}{\text {max}(PPG_0)-\text {min}(PPG_0)} \end{aligned}$$
(1)
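Eq. 1 is a min-max normalization and can be sketched as below. The function name `normalize_ppg` is hypothetical, and whether the normalization is applied per cycle or per recording is not stated in the text, so the sketch simply normalizes whatever array it receives.

```python
import numpy as np

def normalize_ppg(ppg0):
    """Min-max normalize a PPG array to the [0, 1] range (Eq. 1)."""
    ppg0 = np.asarray(ppg0, dtype=float)
    lo, hi = ppg0.min(), ppg0.max()
    if hi == lo:
        # Eq. 1 is undefined for a constant signal (zero denominator).
        raise ValueError("constant signal cannot be min-max normalized")
    return (ppg0 - lo) / (hi - lo)
```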

2.3 Dimensionality Reduction

To reduce the dimensionality of the extracted features, the Pearson correlation coefficient, the Relief algorithm [21] and principal component analysis (PCA) were used.

The Pearson correlation coefficients and the Relief weights were calculated between each feature and the SBP value. With these two methods, dimensionality was reduced by keeping only the eight features with the highest weights (Relief) or the highest coefficients (Pearson).
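The Pearson-based selection can be sketched as follows. The function name `top_k_by_pearson` is hypothetical, and ranking by the absolute value of the coefficient is an assumption, since the text only says "highest coefficients". Features are assumed to be non-constant so that the coefficient is defined.

```python
import numpy as np

def top_k_by_pearson(X, sbp, k=8):
    """Return (indices of the k selected features, all Pearson coefficients).

    X has shape (n_samples, n_features); sbp has shape (n_samples,).
    """
    X = np.asarray(X, dtype=float)
    sbp = np.asarray(sbp, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = sbp - sbp.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = (Xc * yc[:, None]).sum(axis=0) / denom
    # Assumption: features are ranked by |r|, so strong negative
    # correlations are treated as informative too.
    selected = np.argsort(-np.abs(r))[:k]
    return np.sort(selected), r
```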

When using PCA, the components that account for at least 95 % of the variability of the input data were considered.
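A minimal sketch of this criterion, using a singular value decomposition of the centered feature matrix, is given below. The function name `pca_reduce` is hypothetical, and the features are only centered here; whether they were also scaled to unit variance before PCA is not stated in the text.

```python
import numpy as np

def pca_reduce(X, var_threshold=0.95):
    """Project X onto the fewest components reaching var_threshold variance.

    Returns (projected data, component matrix, feature means).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # assumption: centering only, no variance scaling
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    cumulative = np.cumsum(explained)
    # First component count whose cumulative variance reaches the threshold.
    n_comp = int(np.searchsorted(cumulative, var_threshold)) + 1
    n_comp = min(n_comp, len(s))
    components = vt[:n_comp]
    return Xc @ components.T, components, mean
```

New samples would be projected with the stored mean and components, i.e. `(x_new - mean) @ components.T`, so that training and test data share the same basis.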

2.4 Blood Pressure Ranges

Some tests were done by training the algorithms on data from the entire pressure range available in the dataset. In the other cases we followed the methodology described in [12], in which the data were divided into three categories according to the SBP values: hypotensive (80–100 mmHg), normotensive (100–140 mmHg) and hypertensive (140–180 mmHg). The algorithms were then trained separately in each of these ranges.
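The partition into pressure ranges can be sketched as below. The names `RANGES_MMHG` and `split_by_sbp` are hypothetical. How samples exactly at 100, 140 or 180 mmHg are assigned is not stated in the text; the sketch assumes an inclusive lower bound and an exclusive upper bound, except that 180 mmHg is kept in the hypertensive range.

```python
import numpy as np

# SBP ranges in mmHg, as listed in the text.
RANGES_MMHG = {
    "hypotensive": (80.0, 100.0),
    "normotensive": (100.0, 140.0),
    "hypertensive": (140.0, 180.0),
}

def split_by_sbp(sbp):
    """Return a dict mapping each range name to a boolean sample mask."""
    sbp = np.asarray(sbp, dtype=float)
    masks = {}
    for name, (lo, hi) in RANGES_MMHG.items():
        # Assumption: lower bound inclusive, upper bound exclusive.
        masks[name] = (sbp >= lo) & (sbp < hi)
    # Assumption: the top edge (180 mmHg) belongs to the last range.
    masks["hypertensive"] |= sbp == 180.0
    return masks
```

Samples outside 80–180 mmHg fall in none of the masks and would be left out of the range-specific training.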

2.5 Regression Algorithms

To estimate systolic and diastolic blood pressures from the features of the PPG signal, four regression algorithms were used: multiple linear regression (MLR), ANNs, support vector machine regression (SVR) and decision tree regression (DTR).

The coefficients for MLR were obtained by the least squares method, and the neural network used was a multilayer perceptron (MLP) with a hidden layer of 35 neurons and two neurons in the output layer (one for SBP and one for DBP). For the SVR and DTR algorithms, a linear kernel and binary trees were used, respectively.
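As an illustration of the least squares fit used for MLR, the sketch below fits one linear model per output (SBP and DBP) in a single call. The function names `fit_mlr` and `predict_mlr` are hypothetical, and the inclusion of an intercept term is an assumption.

```python
import numpy as np

def fit_mlr(X, Y):
    """Least squares fit of Y (n_samples, 2: SBP, DBP) on X (n_samples, n_features).

    Returns a coefficient matrix of shape (n_features + 1, 2);
    row 0 holds the intercepts (assumed to be part of the model).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict SBP and DBP for each row of X using fitted coefficients."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X]) @ coef
```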

3 Experiment and Outcomes

The experiment consisted of a set of evaluations of the methods described in Sect. 2. A total of twenty-four tests were performed, varying different parameters of the method proposed in [11]. Fifteen tests were performed varying the parameters of the method described in [13], and sixteen tests were performed following the method proposed in this article, which combines the time and frequency features of the PPG signal. The evaluations carried out in these three cases are illustrated by the test flow diagrams in Fig. 3.

Fig. 3 Test flow diagrams: (a) method 1, corresponding to the method proposed in [11]; (b) method 2, corresponding to the method proposed in [13]; and (c) method 3, corresponding to the method proposed in this work

Tables 1 and 2 present the test results for the methods proposed in [11, 13]. The tests were performed according to the test flow diagrams in Fig. 3. In these experiments, tests were performed with different features, different regression algorithms and regressors for different pressure ranges.

All results are reported as the MAE and standard deviation (SD) of the pressure errors. To meet the requirements proposed in [15], the MAE and SD must remain below 5 mmHg and 8 mmHg, respectively.
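The performance metric can be sketched as follows. The function name `evaluate` is hypothetical. The text does not state whether the SD is taken over the signed or the absolute errors; the sketch assumes the sample SD of the signed errors.

```python
import numpy as np

def evaluate(y_true, y_pred, mae_limit=5.0, sd_limit=8.0):
    """Return (MAE, SD, within_limits) for pressure estimates in mmHg."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mae = float(np.mean(np.abs(err)))
    # Assumption: SD of the signed errors, with the sample (n - 1) estimator.
    sd = float(np.std(err, ddof=1))
    within = bool(mae < mae_limit and sd < sd_limit)
    return mae, sd, within
```

For example, estimates of 122, 118, 124 and 116 mmHg against a reference of 120 mmHg give an MAE of 3 mmHg and an SD of about 3.65 mmHg, which would fall within the 5 ± 8 mmHg limits.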

For the method proposed in [11], the best results were obtained with the ANN, with an MAE of 15.58 ± 14.78 mmHg for SBP and 7.34 ± 7.05 mmHg for DBP. For the method proposed in [13], the best results were 21.97 ± 16.79 mmHg for SBP and 6.97 ± 5.73 mmHg for DBP using the SVR regression algorithm, as seen in Table 2. Note that the results obtained in both cases are far from the standards described in [15].

Table 1 Results for the method proposed in [11] according to the pressure range
Table 2 Results for the method proposed in [13] according to the pressure range
Table 3 Results for the method proposed in [11] according to the number of features
Table 4 Results for the method proposed in [13] using 12 principal components
Table 5 Results for the proposed method using both time and frequency domains features according to the pressure range
Table 6 Comparison of the results of this paper with other works

For the method proposed in [11], the ANN produces the best result in the test over the general pressure range when compared to the other regression algorithms. However, the regressors trained on pressure sub-ranges, shown in Table 1, presented superior performance, even in the tests with a reduced number of features selected by the Relief algorithm and in those using the principal components given by PCA dimensionality reduction, shown in Table 3. For the method proposed in [13], the ANN regressor does not outperform the SVR regression algorithm in the test over the general pressure range. In addition, reducing the input features to 12 principal components using PCA can improve the performance of this method, as shown in Table 4. However, the most promising approach was the one in which the general pressure range was divided into three sub-ranges.

In order to improve the results obtained by the regression algorithms presented in [11, 13], a new approach was proposed that uses the features of both methods with dimensionality reduction by PCA. In this approach, the 47 input parameters were reduced to 13 principal components. As can be seen in Table 5, this approach obtained better results than the tests of the other two methods shown in Tables 1, 2, 3 and 4.

4 Conclusions

In this study, we developed an experiment to reproduce the methods presented in [11, 13] and, based on those, proposed a new method. A summary of the results obtained in previous works and in this study is given in Table 6.

From the results obtained, it was observed that, in general, the proposed method, using the temporal and spectral characteristics of the PPG signal, improves the blood pressure estimation in comparison to the other two methods. In addition, better results were observed in tests in which the algorithms were trained only in certain pressure ranges in comparison to when they were trained in the entire available range. This difference is more evident in the estimation of SBP.

Regarding the reproduced methods, it should be noted that the results obtained in this study agree with those obtained in [16, 17]. As for the regression algorithms, in general the DTR presented the highest MAE values. The lowest MAE values were obtained using SVR with a linear kernel, which indicates a predominantly linear relationship between the features and the blood pressure values.

Most results are outside the range determined by AAMI, especially those referring to SBP. Despite this, some of the results obtained for DBP and SBP are within the range of \(5\pm 8\) mmHg, which demonstrates the feasibility of estimating blood pressure using only PPG signals.

The results indicated that the proposed method could improve blood pressure estimation by combining the temporal and spectral features extracted from the acquired PPG signal. Future work should explore additional characteristics of PPG signals, as well as normalization strategies that reduce inter-individual differences in these signals, such as the variability in arterial elasticity, with the potential to obtain better results.