1 Introduction

The measurement of anesthetic depth has been an important topic since the invention of modern anesthetic techniques. To distinguish conscious from unconscious states, many clinical observations and electrophysiological parameters have been studied. Changes in heart rate, blood pressure, respiratory rate, lacrimation and sweating are the basic clinical observations of anesthetic status in daily practice, but these signs are inconsistent and nonspecific. A group of physiological parameters that correlate highly with anesthetic depth have been derived from the EEG and the auditory evoked potential [4, 11]. The BIS index is derived from a weighted combination of the power spectrum, the burst suppression pattern and the bispectrum of the EEG data [19]. The AEP index is derived from the auditory evoked potential, which is highly sensitive to anesthetic state [13]. Both indices are normalized to a score from 0 to 100 and have been widely used in the investigation of the depth of anesthesia [20]. Entropy, which measures the randomness of a time series, has also been investigated [14]. However, the signal-to-noise ratio of the EEG is low, and monitoring based on it requires dedicated machines and expensive electrodes. Moreover, there have been reports that BIS values overlap between the awake and deeply sedated states [8].

Heart rate variability (HRV) also changes significantly from the awake to the unconscious state [6, 10]. HRV is related to autonomic regulation [1], which is strongly affected by general anesthesia. Thus, theoretically, HRV is a potential indicator of anesthetic depth. However, the HRV parameters currently in use need relatively long recordings to calculate. The time domain parameters are used mainly for long-term measurements [22]. Spectral analysis is used as a “short-term” measurement of HRV, but it still needs 5 min of data to give reliable results, and its calculation is relatively complex. Moreover, conventional HRV parameters, especially the spectral powers, show large inter-individual variation [9]. Hence, it is difficult to derive from these HRV observations an ideal parameter with a BIS-like score of 0–100.

Nevertheless, it is worthwhile to investigate further HRV-based indicators of anesthetic depth because ECG monitoring is easy and cheap. In this study, we derived an HRV parameter called the similarity index (SI) and observed its value in both the conscious and the anesthetized state during isoflurane anesthesia. This time domain parameter is derived by measuring the similarity of the statistical distributions of R–R interval measurements in consecutive data segments. The length of ECG data needed to calculate this parameter is as short as 1 min. The parameter was found to have high predictive power in distinguishing the conscious state from the anesthetized state. Some other HRV parameters were also calculated for comparison.

2 Materials and methods

2.1 The similarity index of HRV

2.1.1 Computation of the similarity index

The derivation of the similarity index from ECG recording is described below and also illustrated in Fig. 1.

Fig. 1 The outline of the derivation of the similarity index from R–R interval data

Step 1

The instantaneous heart rates, which are the reciprocals of the R–R intervals, are derived from the ECG recordings. The resulting series is resampled to 4 Hz using the commonly used algorithm of Berger [3].

Step 2

Let the instantaneous heart rates be r(1), r(2), r(3),… The differences between consecutive heartbeats were calculated:

$$ d(n) = r(n) - r(n - 1) $$

Step 3

At any observation time t, we compare the statistical distributions of two consecutive blocks of the d(n) data, one block from d(t − 2M + 1) to d(t − M) and the other from d(t − M + 1) to d(t); each block has M data values. We set M = 128. The histograms (i.e., statistical distributions) of d(n) in both data blocks were generated. The determination of the number of cells in the histogram is described later. The probability of the d(n) values falling in the ith cell of the histogram is denoted P1(i) in the first data block and P2(i) in the second data block. For example, over the range of −0.4 to 0.4 min⁻¹, if we choose a cell width of 0.002 min⁻¹, then in the first data block P1(1) = Probability(−0.4 ≤ d(n) < −0.398), P1(2) = Probability(−0.398 ≤ d(n) < −0.396), and so on.

Step 4

The similarity index was calculated by multiplying the probability of d(n) of the corresponding cells in the histograms of the two data blocks, followed by summing the product value in all the cells:

$$ {\text{SI}} = {\left[ {{\sum\limits_{i = 1}^h {[P_{1} (i) \times P_{2} (i)]} }} \right]} \times 100 $$

where SI is the similarity index at the observation time and h is the number of cells in the histograms. The summation results in a value between 0 and 1 and is multiplied by 100 to get a score from 0 to 100.
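The four steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, heart rate is expressed in min⁻¹ (60 divided by the R–R interval in seconds), plain linear interpolation stands in for Berger's resampling algorithm, and values of d(n) outside the histogram range are simply left uncounted.

```python
import math
from bisect import bisect_right


def instantaneous_heart_rate(r_peak_times):
    """Step 1a: one heart rate (beats/min) per R-R interval, stamped at the later R peak."""
    times, rates = [], []
    for prev, cur in zip(r_peak_times, r_peak_times[1:]):
        times.append(cur)
        rates.append(60.0 / (cur - prev))  # reciprocal of the R-R interval (seconds)
    return times, rates


def resample_linear(times, values, fs=4.0):
    """Step 1b: uniform resampling at fs Hz.

    Assumption: linear interpolation is used here in place of Berger's algorithm.
    """
    out, n = [], 0
    while times[0] + n / fs <= times[-1]:
        t = times[0] + n / fs
        j = bisect_right(times, t)  # index of the first beat time later than t
        if j == len(times):  # t coincides with the last beat
            out.append(values[-1])
        else:
            w = (t - times[j - 1]) / (times[j] - times[j - 1])
            out.append(values[j - 1] + w * (values[j] - values[j - 1]))
        n += 1
    return out


def successive_differences(r):
    """Step 2: d(n) = r(n) - r(n - 1)."""
    return [b - a for a, b in zip(r, r[1:])]


def cell_probabilities(block, lo=-0.4, hi=0.4, cells=400):
    """Step 3: fraction of the block's values falling in each histogram cell."""
    width = (hi - lo) / cells
    counts = [0] * cells
    for x in block:
        i = math.floor((x - lo) / width)
        if 0 <= i < cells:  # assumption: values outside [lo, hi) are left uncounted
            counts[i] += 1
    return [c / len(block) for c in counts]


def similarity_index(d, t, M=128):
    """Step 4: SI at 0-based index t from the two M-sample blocks ending at t."""
    if t - 2 * M + 1 < 0 or t >= len(d):
        raise ValueError("need 2*M samples of d ending at index t")
    p1 = cell_probabilities(d[t - 2 * M + 1 : t - M + 1])
    p2 = cell_probabilities(d[t - M + 1 : t + 1])
    return 100.0 * sum(a * b for a, b in zip(p1, p2))
```

In this sketch, two blocks whose values all fall in one shared cell give an SI of 100, and two blocks with no shared cell give 0, which matches the stated range of the score.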

2.1.2 Selection of the parameters in the computation

The number and width of the cells in the histograms in Step 3 were selected empirically. The values of d(n) essentially fall between −0.4 and 0.4 min⁻¹, so we took this as the range of values in our histogram. We tested cell numbers from 100 to 1,000 in steps of 100 and selected the number with the best performance, which was 400 cells. Dividing the data range of −0.4 to 0.4 min⁻¹ into 400 cells gives a cell width of 0.002 min⁻¹.

2.1.3 Meaning of the similarity index

Mathematically, the SI is a measure of the similarity between the statistical distributions (or histograms) of two consecutive data blocks. The rationale for using this measurement as an HRV parameter is that, if the heart rate is more variable, consecutive data segments will be less similar. If the heart rate pattern is less variable, which means that the distributions of the two data blocks are similar, then the two histograms will “match” better; that is, at many values of i, P1(i) and P2(i) are simultaneously high. Because we compute the SI by multiplying P1(i) and P2(i) for all values of i, the resulting SI will be higher under this condition. Thus, conceptually, a higher SI reflects lower HRV and vice versa.

2.1.4 Data requirement of the similarity index

The sampling frequency of d(n) was 4 Hz, so a data block of 128 points corresponds to 32 s of data. In other words, by observing the data of the past 64 s (two 128-point blocks), we obtain the similarity index at the observation time. The above calculation can be performed on a beat-to-beat basis.
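A rolling evaluation of this kind might look like the following sketch. It assumes that d is already the 4 Hz difference series and that the index is updated every `hop` samples (the text updates it beat to beat; a per-sample update is used here for simplicity). The names `si_from_blocks` and `si_trend` are hypothetical, and sparse cell counts replace full histograms.

```python
import math
from collections import Counter


def si_from_blocks(first, second, lo=-0.4, hi=0.4, cells=400):
    """SI of two blocks, using sparse cell counts instead of full histograms."""
    width = (hi - lo) / cells
    c1 = Counter(math.floor((x - lo) / width) for x in first)
    c2 = Counter(math.floor((x - lo) / width) for x in second)
    # Only cells occupied in both blocks contribute to the sum of products.
    shared = sum(c1[i] * c2[i] for i in c1 if 0 <= i < cells)
    return 100.0 * shared / (len(first) * len(second))


def si_trend(d, fs=4.0, M=128, hop=1):
    """Return (elapsed_seconds, SI) pairs for each index with 2*M samples behind it."""
    trend = []
    for t in range(2 * M - 1, len(d), hop):
        first = d[t - 2 * M + 1 : t - M + 1]
        second = d[t - M + 1 : t + 1]
        # (t + 1) samples have been consumed, so the first value appears at 256 / 4 = 64 s.
        trend.append(((t + 1) / fs, si_from_blocks(first, second)))
    return trend
```

With M = 128 and fs = 4 Hz, the first value is available once 256 samples (64 s) have been observed, in line with the data requirement stated above.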

2.1.5 Error properties of the similarity index measurement

The main sources of error are ectopic beats and artifact peaks. A spurious R peak splits a normal R–R interval into two shorter intervals, which results in erroneous counts in some histogram cells. For a block length of 128, the resulting error in the probability P1(i) of each affected cell is 1/128. In the worst case, when P2(i) = 1.0, the total error in the SI is 100/128, or about 0.8. For an average similarity index of about 20, this represents an error of less than 5%. In real cases it is very unlikely that all d(n) values fall into a small number of cells, so the error will be much smaller than this. Thus the data length we used was sufficient to avoid significant error, provided the ECG signal is of reasonable quality.
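The worst-case figure quoted above can be written out step by step. The sketch treats a single affected cell i (an assumption made for simplicity; a real spurious peak may shift counts in more than one cell) and uses the fact that P2(i) cannot exceed 1:

$$ \Delta P_{1} (i) = \frac{1}{128}, \quad \Delta {\text{SI}} = 100 \times \Delta P_{1} (i) \times P_{2} (i) \le \frac{100}{128} \approx 0.78 $$

$$ \frac{\Delta {\text{SI}}}{{\text{SI}}} \approx \frac{0.78}{20} \approx 3.9\% $$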

2.2 Patients and data acquisition

After institutional ethical approval and informed consent were obtained, we recorded the electrocardiograms of 30 ASA class I or II patients scheduled to receive general anesthesia for various surgical operations. None of the patients had a history or clinical manifestations of cardiovascular or neurological disease. Electrocardiogram recording started when the patient arrived in the operating room and was awake. After at least 5 min of recording, anesthesia was induced according to routine practice (intravenous thiopental 5 mg/kg, succinylcholine 1.5 mg/kg and fentanyl 100 μg). After tracheal intubation, 1.5% isoflurane in oxygen was administered to maintain anesthesia. Mechanical ventilation was applied at a respiratory rate of 10 min⁻¹, and the tidal volume was set to keep the end-tidal CO2 at 30–35 mmHg. Electrocardiogram recording continued for at least 20 min thereafter.

The electrocardiogram was recorded to the hard disk of a personal computer via an analog-to-digital converter at a sampling frequency of 500 Hz, and the analysis was performed offline.

2.3 Data analysis

2.3.1 Similarity index

The similarity index was calculated in both the awake and the anesthetic state. The data for the awake state were taken from the 5-min period before induction of anesthesia. The data for the anesthetic state were taken from the 5-min period after isoflurane had been administered for 10 min, when the patient was fully anesthetized and the end-tidal isoflurane concentration had reached a stable level.

The values of the similarity index were compared between the awake and anesthetic states using both paired and unpaired t tests. The former tested the effect of isoflurane on the subjects, and the latter tested the overlap of the values between the two states in the population. P < 0.05 was considered statistically significant.
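For readers who want to reproduce the two comparisons, the test statistics can be computed as in the following sketch. The function names are hypothetical, and the unpaired version assumes the pooled-variance (Student) form, since the text does not say whether a pooled or an unequal-variance test was used. Only the t statistic and the degrees of freedom are returned; the P value would come from a t distribution table or a statistics package.

```python
import math
from statistics import mean, stdev, variance


def paired_t(awake, anesthetized):
    """Paired t statistic and degrees of freedom for per-subject differences."""
    diffs = [b - a for a, b in zip(awake, anesthetized)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def unpaired_t(awake, anesthetized):
    """Two-sample t statistic with pooled variance (assumed form) and its df."""
    nx, ny = len(awake), len(anesthetized)
    pooled = ((nx - 1) * variance(awake) + (ny - 1) * variance(anesthetized)) / (nx + ny - 2)
    t = (mean(anesthetized) - mean(awake)) / math.sqrt(pooled * (1.0 / nx + 1.0 / ny))
    return t, nx + ny - 2
```

The paired form removes between-subject differences before testing, which is why it addresses the effect of the drug, whereas the unpaired form is sensitive to how far the two groups of values are separated.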

Furthermore, the ability of the similarity index to predict the awake or anesthetized state was analyzed using the prediction probability (Pk) [21]. Each HRV parameter was used as the predicting variable, and the state (awake or anesthetized) was the variable to be predicted. The Pk value reflects the overlap of the parameter values between the two states. A Pk of 1.0 means perfect prediction with no overlap between the two states; the worst value, 0.5, means a prediction no better than a random guess and corresponds to overlap over the whole range of the data distribution. The optimal cutting threshold of the SI between the two states was determined by varying the threshold from 1 to 100 in increments of 1 and computing the Pk for each threshold. The threshold with the maximal Pk was identified as the optimal cutting point.
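The prediction probability and the threshold search can be sketched as follows. The sketch assumes the usual pairwise definition for a two-state outcome: over all awake/anesthetized pairs, Pk is the fraction in which the indicator is higher in the anesthetized member, with ties counted as one half. The function names, the strict "greater than" rule at the threshold and the toy values in the usage note are assumptions, not data from this study.

```python
def prediction_probability(awake_values, anesth_values):
    """Pk for two states, assuming higher indicator values mean 'anesthetized'."""
    concordant = discordant = tied = 0
    for a in awake_values:
        for b in anesth_values:
            if b > a:
                concordant += 1
            elif b < a:
                discordant += 1
            else:
                tied += 1
    total = concordant + discordant + tied
    if total == 0:
        raise ValueError("both states need at least one value")
    return (concordant + 0.5 * tied) / total


def best_threshold(awake_values, anesth_values, thresholds=range(1, 101)):
    """Scan cutting thresholds and return (threshold, Pk) with the largest Pk."""
    best = None
    for c in thresholds:
        # Binarize the indicator: 1 if above the threshold, else 0.
        pk = prediction_probability(
            [int(v > c) for v in awake_values],
            [int(v > c) for v in anesth_values],
        )
        if best is None or pk > best[1]:  # keep the first threshold reaching the maximum
            best = (c, pk)
    return best
```

For made-up values such as `awake = [5, 8, 12, 30]` and `anesth = [20, 25, 35, 40]`, 14 of the 16 pairs are concordant, so `prediction_probability(awake, anesth)` is 0.875.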

The prediction probability is comparable in meaning to the area under the curve (AUC) of the familiar receiver operating characteristic (ROC) curve, but it is easier to interpret. The ROC curve was nevertheless also derived for completeness.

2.3.2 Power spectral analysis

For comparison, spectral powers were calculated by applying the fast Fourier transform to 1,024 data points from the awake or the anesthetic state. The high frequency (HF) range was taken as 0.15–0.4 Hz and the low frequency (LF) range as 0.04–0.15 Hz [9, 22]. Only one spectrum was generated from the whole 5-min data segment. The t tests and Pk calculations were performed on the HF power, the LF power and the HF/LF ratio.
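The band powers can be illustrated with the sketch below. It evaluates the discrete Fourier transform directly for the bins inside the two bands, which gives the same values an FFT would for those bins but avoids any external library. The mean removal, the one-sided normalization, the treatment of the 0.15 Hz band edge and the function name `band_powers` are assumptions; the text does not specify detrending, windowing or scaling.

```python
import cmath
import math


def band_powers(x, fs=4.0, lf=(0.04, 0.15), hf=(0.15, 0.4)):
    """Return (LF power, HF power, HF/LF) of a uniformly sampled series."""
    n = len(x)
    m = sum(x) / n
    centered = [v - m for v in x]  # remove the mean (assumed preprocessing)
    lf_power = hf_power = 0.0
    for k in range(1, n // 2):
        f = k * fs / n  # frequency of bin k in Hz
        if not (lf[0] <= f <= hf[1]):
            continue
        # Direct DFT of bin k; an FFT would return the same coefficient.
        xk = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(centered))
        p = 2.0 * abs(xk) ** 2 / n ** 2  # one-sided power (assumed normalization)
        if f < lf[1]:
            lf_power += p
        else:
            hf_power += p
    ratio = hf_power / lf_power if lf_power > 0 else float("inf")
    return lf_power, hf_power, ratio
```

With 1,024 points at 4 Hz the frequency resolution is 4/1,024 ≈ 0.0039 Hz, so the LF band spans only about 28 bins, which illustrates why much shorter segments give unreliable spectra.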

2.3.3 Sample entropy

For comparison, sample entropy [18] was also computed. SampleEntropy(m, r, N) is the negative logarithm of the conditional probability that two segments of a time series that match in their first m points still match at the next point. Values within the tolerance r were counted as the same. Here, we set m = 2 and r = 0.15 times the standard deviation of the time series, and the data length N was set to match that used for the similarity index, that is, 64 s of data. The sample entropy of the awake and anesthetic states was computed and compared.
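A direct, unoptimized sketch of this definition is given below. It assumes the commonly used form in which templates are compared with the maximum absolute difference, self-matches are excluded, and the same N − m template start points are used for both lengths. The use of the population standard deviation, the "less than or equal to r" matching rule and the function name are assumptions.

```python
import math
from statistics import pstdev


def sample_entropy(x, m=2, r_factor=0.15):
    """Sample entropy with tolerance r = r_factor * standard deviation of x."""
    n = len(x)
    r = r_factor * pstdev(x)

    def count_matches(length):
        # Count template pairs (i < j) whose points all agree within r.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches that persist for one more point
    if a == 0 or b == 0:
        return float("inf")   # undefined when no matches are found
    return -math.log(a / b)
```

A perfectly periodic series gives a value of 0 because every match of length m persists at the next point, while irregular series give larger values.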

3 Results

The trend of the similarity index of one of the patients is shown in Fig. 2. This pattern was typical of the similarity index trend in many cases. Anesthesia was induced at point S in the figure, and the data representing the anesthetic state started from point A, when isoflurane anesthesia had been applied for 10 min. The similarity index was very low in the awake state and higher in the anesthetic state. This reflected the decrease in HRV after isoflurane anesthesia.

Fig. 2 The trend of the similarity index of one patient. Anesthesia was induced at point S and the data representing the anesthetic state started from point A, when isoflurane anesthesia had been applied for 10 min

The results of the statistical tests and the Pk values of the SI and the other HRV parameters are summarized in Table 1. In the paired t test, the paired differences in the similarity index differed significantly from zero, implying a significant effect of isoflurane anesthesia on the subjects. The significant result of the unpaired t test, in turn, reflected the small overlap between the SI distributions of the two states.

Table 1 The values of similarity index and other parameters of HRV in the awake and the isoflurane anesthetic states

The prediction probability Pk of the SI for discriminating between the awake and isoflurane-anesthetized states was 0.91. This high predictive power further confirmed the small overlap of its values between the two states. The optimal cutting value of the similarity index was 18; that is, if values above 18 are taken as isoflurane anesthesia and values below 18 as awake, the rate of correct prediction was 91%.

The differences between the awake and anesthetic states were also significant for the HF and LF powers, with Pk values of 0.88 and 0.91, respectively. This result showed that the SI, which needed only about 1 min of data to calculate, had predictive performance comparable to that of the spectral powers. In contrast, the predictive power of the HF/LF ratio and of short-term sample entropy was not very high, with Pk values of 0.58 and 0.60, respectively. To illustrate the distribution of these parameters, the values of the SI and the other HRV parameters of all subjects in the two states are plotted in Figs. 3, 4, 5, 6, 7. The ROC curve of the SI for predicting the anesthetic state is shown in Fig. 8.

Fig. 3 The similarity index of all patients in awake and anesthetic states. Only the average values in 5-min duration are shown

Fig. 4 The LF spectral powers of HRV of all patients

Fig. 5 The HF spectral powers of HRV of all patients

Fig. 6 The HF/LF value of all patients

Fig. 7 The sample entropy of all patients

Fig. 8 ROC curve of the similarity index

4 Discussion

Many clinical observations and physiological parameters have been used to distinguish conscious from unconscious states during general anesthesia. The routinely used clinical observations, such as heart rate, respiration and blood pressure, do not actually correlate very highly with anesthetic status. The parameters related to the auditory evoked potential and the EEG are more powerful, especially the BIS and AEP indices. But as Drummond [5] has stated, a significant difference between the awake and anesthetic states is not in itself enough to make a good indicator of anesthetic depth. A good parameter of anesthetic depth must have as little overlap of its values as possible between the conscious and anesthetic states in the general population. Ibrahim et al. [8] found that although the average BIS values during sedation were lower than during consciousness, the distributions of the values overlapped too much between these two stages. This lowers the predictive power of the BIS parameter in the anesthetic stage.

Besides the EEG-derived measurements, the relation of HRV to physiological systems and anesthesia has been extensively studied during the past two decades [1, 9, 22]. HRV has been correlated with the regulation of the autonomic nervous system. Because autonomic nervous system function is strongly affected by general anesthesia, the spectral powers of HRV are suppressed during inhalational general anesthesia [6, 10]. Thus HRV has been widely accepted as a potentially good predictor of anesthetic depth.

However, for the purpose of anesthetic depth monitoring, the time epoch needed to measure a parameter should be as short as possible. The most commonly used “short-term” HRV parameters are the spectral powers. Conventionally these are calculated by applying the fast Fourier transform (FFT) to ECG data of 5-min duration. Theoretically, because of the trade-off between time and frequency resolution, a power spectrum generated from too short a data block is not reliable. To obtain the power spectrum of short data blocks, more complicated signal processing techniques, such as the short-time Fourier transform or time–frequency spectral analysis, must be used. Thus spectral analysis is not well suited to anesthetic depth monitoring. The time domain measurements of HRV, on the other hand, rely mainly on the statistical characteristics of the heart rate data and need even longer samples to calculate. Hence these time domain parameters are conventionally used as long-term measurements of HRV and are applied to 24-h ECG recordings. To cope with this limitation, our algorithm compares two consecutive data blocks from the same subject. This self-pairing helps to filter out inter-individual variation and reveals the changes that are mainly due to external influences, which in our cases was the anesthetic depth. This approach has yielded a stable time domain HRV parameter that uses only about 1 min of ECG data for each measurement. As shown in Fig. 2, the SI value in either the awake or the anesthetic stage changes smoothly and is relatively consistent within the same stage for a given individual.

Our parameter has two additional advantages. One is that ECG is more practical to record than EEG. The energy levels of ECG signals are much higher than those of EEG signals, so the ECG is more resistant to noise. ECG signals can be easily acquired using routine monitoring machines and regular electrode leads. This makes the handling of ECG-derived data more convenient, economical and reliable than that of EEG- or AEP-related data. The other advantage of our parameter is its simplicity. Starting from the R–R interval time series, only elementary arithmetic and simple counting are needed. This can be done within milliseconds with simple software or hardware. Thus the parameter can be derived on a beat-to-beat basis and has the potential for real-time application without expensive instruments.

We also tested the predictive power of ECG-derived sample entropy for the anesthetic states, because the computation of this parameter needs only short data segments. However, as our data have shown, it may not be an ideal indicator of anesthetic state under isoflurane general anesthesia. EEG-derived entropy has been used to investigate the consciousness and anesthetic depth of subjects [12], but the physiological mechanisms underlying EEG-derived and ECG-derived entropy may be very different. ECG-derived entropy has been applied in the study of various other clinical conditions [2, 7, 17]. However, there has been little evidence of a correlation between ECG-derived entropy and the depth of anesthesia. Our results showed that more effort is needed to make ECG-derived entropy a valuable predictor of anesthetic state.

Mathematically, the SI is a measure of the similarity between the statistical distributions (or histograms) of two consecutive data blocks. The rationale for using this measurement as an HRV parameter is that, if the heart rate is more variable, consecutive data segments will be less similar. This “similarity between consecutive data” is in fact implicit in many approaches to HRV measurement. For example, sample entropy defines “similarity” as “the probability of the repeating of data sequences”, the conventional time domain HRV measurements view “similarity” as a measure of the statistical variation of the data, and spectral analysis defines “similarity” implicitly as the periodicity of the time series. Our algorithm offers just another approach to this concept. All these are merely different viewpoints on the same data, and we cannot say that one is superior to another. For the monitoring of anesthetic depth, since the underlying physiological mechanism of general anesthesia has not yet been elucidated, it is impossible to determine the performance of these various algorithms at a theoretical level. Thus the validity of any of these predictors can only be tested on real-world data. We think that the significance of this kind of parameter may rest more on its usefulness in real-world applications than on its physical meaning.

Besides the SI, there are many other time domain HRV parameters. Although these are all derived from the R–R interval time series, they are based on different computation processes, so their inter-relation cannot easily be clarified. For example, sample entropy measures the probability of the repeating of data sequences. How is this related to the standard deviation of mean of the R–R intervals? There may in fact be very little relation between these parameters. We believe that this is also the case for the SI and the other time domain parameters.

In a study of heart failure patients, an autocorrelation-like function was applied to the R–R intervals to treat the HRV data [23]. In the current study, however, we chose to perform our similarity index calculation on the differences in heart rate instead of the heart rate per se. Because HRV mainly represents the regulation rather than the activity of autonomic function [15], the differences in heart rate carry more significant information than the heart rate itself. Myers et al. [16] compared several time domain HRV parameters and concluded that the parameter BB50, which was derived from the R–R interval differences, had the highest correlation with the prognosis of AMI patients. Indeed, we also tested the calculation of our similarity index using the heart rates alone, and the resulting index was far less impressive, with lower predictive power.

The SI has improved the ability of HRV to indicate anesthetic depth. However, a perfect predictor with 100% power and no overlap between two groups is unlikely to exist, because in any population there will be some extreme cases. In other words, for any biomedical measurement or diagnostic test, some false positives or false negatives are unavoidable. This is why the concepts of “sensitivity” and “specificity” were developed and why tools such as the ROC curve and the prediction probability Pk are used to evaluate the predictive power or accuracy of a predictor. A prediction probability of 91%, as found in our work, is actually quite good. Notice that in Fig. 3, if we exclude a couple of extreme cases (i.e., the two with the highest and lowest awake SI values), the trends of the SI values in the remaining subjects are surprisingly consistent, and the separation of awake and anesthetic values is very clear-cut. Nevertheless, it is too optimistic to suppose that any HRV parameter alone can serve as a satisfactory predictor of anesthetic depth. HRV is only an indirect indicator of anesthesia and is relatively nonspecific. Many clinical factors may affect the autonomic system and change HRV, decreasing the predictive power of related parameters. The effects of such factors on our similarity index need further investigation. However, until the true mechanism of general anesthesia is clearly understood and direct observation can be performed, we can only rely on indirect measurements to monitor anesthetic depth. Because there is still no single perfect predictor, there have been attempts to combine several modes of observation to increase the predictability of anesthetic depth. Unfortunately, owing to the lack of ideal parameters, HRV has so far rarely been incorporated into those systems. Our current study offers a possible way to improve this situation. We believe that HRV can play a more important role in the measurement of anesthetic depth.

5 Conclusion

The similarity index of HRV showed a significant difference and little overlap between the awake and isoflurane-anesthetized states, and it is easily derived from short ECG recordings. Our research strongly suggests that HRV, if suitably treated, can play a greater role in the monitoring of anesthetic depth than it currently does.