1 Introduction

Verification of individuals using their physiological or behavioral characteristics via biometric systems has attracted much attention due to the highly discriminative content of such data (Jain et al. 2004). Biometric systems are widely used in secure and financial environments such as the military, banks, airports, security agencies, and large companies. Most biometric systems employ a single modality (e.g., fingerprint) to verify their queries, though some expensive biometric systems use several data acquisition modalities.

Although some biometric features, like DNA and electrocardiogram (ECG) features, are very reliable, their data acquisition hardware is expensive and sensitive, and data recording is time-consuming. In contrast, some cheaper biometric systems are versatile due to their fast and inexpensive data acquisition modules, such as fingerprint, palm print, voice, face, ear shape, and handwriting.

The ECG is a time-varying physiological signal generated by the electrical activity of the heart muscles. As much as 97% of heart diseases can be observed through specialists' visual inspection of the ECG (Kligfield 2002). This grammar-based cyclostationary signal contains key information that has been used in biometric systems (Boumbarov et al. 2008; Ghofrani and Boostani 2010). Although the ECG has a quasi-rhythmic behavior, it is categorized as a stochastic signal due to the existence of heart rate variability (HRV). Since every individual's heart is unique in terms of its muscles, atria, and ventricular cavities, the ECG signal provides subject-dependent features (Fatemi and Sameni 2017).

In the last decade, various studies have employed different ECG features as a robust biometric signature of humans. Biel et al. (2001) collected ECG signals of 20 normal subjects and extracted 12 features from each record to verify the subjects using a biometric system. Shen et al. (2002) elicited seven features from the detected QRS complexes of 20 subjects and applied them to a neural network for classification; however, the recognition results were not convincing for those who expect a very robust and accurate verification system. The accuracy of such morphology-based methods is highly dependent on the QRS detection algorithm.

In addition to time-domain features, several studies employed various mathematical transforms to map the ECG signals into the spectral domain, the time–frequency domain, or state space in order to manifest the differences between the ECGs of different subjects (Hoekema et al. 2001). Wang et al. (2008) extracted discrete cosine transform (DCT) coefficients from the auto-correlation function of successive heartbeats to verify their queries. In another attempt, wavelet coefficients of ECG signals were extracted and their dimensionality was reduced using principal component analysis (PCA); these features were then fed to a classifier to differentiate the subjects (Wang et al. 2006). Since the ECG is a grammar-based signal, applying the wavelet transform to ECGs and computing the energy of the decomposed signals (at different scales) causes the loss of important morphological features. Moreover, each of the discussed methods has its own bottleneck. For instance, the optimal number of decomposition scales for the wavelet transform is unknown and must be estimated through cross-validation (Cheang Loong et al. 2010). In addition, the threshold used to select the DCT coefficients can strongly affect the performance.

From another perspective, some research teams focused on extracting morphological features from ECG signals. Wang et al. (2008) extracted 21 key points from each ECG cycle and determined the covariance of these points through successive cycles. By applying the covariance features to a classifier, they achieved an identification rate of 84.61%.

Fratini et al. (2015) provided a comprehensive survey of different ECG signal-processing techniques for human identification. The assessed methods were applied to different data sets, whose number of participants varied from 10 to 502 and whose ages ranged from 16 to 86, acquired from both genders. They evaluated the methods using a weighted average of the identification rate and the equal error rate (EER). The best achieved identification rate and EER were 94.95% and 0.92%, respectively.

Abbaspour et al. (2015) utilized three data sets for human identification. They first removed motion artefacts and baseline wandering from the ECG signals and then elicited DCT coefficients from the ECG cycles. To select the most discriminant features, a genetic algorithm was applied, and the selected features were fed to a multi-layer perceptron (MLP) neural network. They obtained identification rates of 99.89%, 99.84%, and 99.99% for three public ECG databases, MIT-BIH Arrhythmias, MIT-BIH Normal Sinus Rhythm, and the European ST-T, respectively. In another attempt, Rahbi and Lachiri (2013) extracted morphological descriptors such as the amplitude, surface, effective interval, and slope from each ECG cycle of 18 subjects. Later, 60 Hermite polynomial expansion (HPE) coefficients were extracted from the normal ECG signals of the MIT-BIH data set. They considered the ECG features of the first 20 min from all subjects as the train set and the remaining 10 min of signals from the same subjects as the test set. Their results provided a 99% recognition rate, which is remarkable, but extracting HPE coefficients is somewhat time-consuming.

Ting and Salleh (2010) proposed a new approach for ECG-based personal identification using the extended Kalman filter. By estimating the dynamics of the ECG signal, the estimated state vectors were used as the ECG features. The extracted features capture the temporal information and amplitude distances between the PQRST points, which are discriminative enough to reveal the unique nonlinear characteristics of each subject's ECG. The results demonstrated an 87.5% identification rate over 13 subjects randomly selected from the MIT-BIH arrhythmia database. Their results in the presence of additive white noise showed that the model is robust to noise for signal-to-noise ratio (SNR) levels above 20 dB.

In the proposed scheme, an ECG signal is first decomposed into its essential intrinsic mode functions (IMFs) by the empirical mode decomposition (EMD) algorithm (Huang et al. 1998; Souza Neto et al. 2002). Then, the slowest decomposed subspace is selected, because it includes the slow key components of the ECG, such as the P and T waves and the slow part of the QRS complex. Next, the Hilbert transform (Huang et al. 2003) is applied to the selected intrinsic mode component to determine the instantaneous frequency, instantaneous phase, amplitude, and entropy of the analytic signal. These features are considered as a diagnostic set; by applying them to the k-nearest neighbor (kNN) classifier, each input ECG is assigned to its corresponding subject. To show the effectiveness of the proposed method, several state-of-the-art methods were implemented and applied to the same subjects. Since state-of-the-art time-domain methods are highly sensitive to additive noise, baseline wandering, amplitude variation, and the shape of the ECG components (such as the P and T waves), this study proposes a statistics-based method that is robust against these variational factors.

The remainder of this paper is structured as follows: Sect. 2 describes the employed data set and explains the EMD and Hilbert methods along with the state-of-the-art methods. Section 3 presents the experimental results and discusses the pros and cons of the proposed method compared to the others. Finally, Sect. 4 concludes the paper and presents some suggestions for future work.

2 Materials and Methods

In this part, the deployed database is briefly described first. Then, the proposed method and the state-of-the-art methods are explained, and finally, the nearest neighbor classifier is introduced as the decision-making tool that assigns input ECGs to their corresponding subjects.

2.1 Data Collection

Here, ECG signals of 34 healthy subjects from the Physikalisch-Technische Bundesanstalt (PTB) database were downloaded and used to evaluate the proposed method against the state-of-the-art methods (Abbaspour et al. 2015). This data set was offered by the National Metrology Institute of Germany and contains 549 records from 290 subjects, of whom just 52 were healthy; the rest suffered from different types of heart disease, and their signals contained different kinds of arrhythmia. Therefore, the experiment initially targeted the ECG signals of the 52 healthy subjects. Of these, the ECG signals of 18 subjects were contaminated by different types of noise (e.g., baseline wandering, artefacts, and EMG interference) and were discarded from the set. For the remaining 34 healthy subjects, ECG signals were recorded from 15 channels: the 12 standard leads (i, ii, iii, avr, avl, avf, v1, v2, v3, v4, v5, and v6) along with the 3 Frank lead ECGs (vx, vy, vz). The sampling rate was 1000 Hz, acquired with a 16-bit A/D converter over the range of ± 16.384 mV. These signals were recorded at the cardiology department of the Benjamin Franklin clinic in Berlin. The selected subjects cover different ages and genders (Oeff et al. 2012).

2.2 Proposed Method

To explain the proposed method, we first describe the EMD and Hilbert methods, and then explain how the features are extracted by means of these two methods.

2.2.1 Empirical Mode Decomposition (EMD)

EMD is an adaptive method that decomposes an arbitrary signal into its intrinsic oscillatory sub-signals, termed intrinsic mode functions (IMFs) (Huang et al. 2003). In the first step, the upper and lower envelopes of the input signal are elicited and the average of the two envelopes is subtracted from the original signal. This sifting process is repeated until the numbers of zero crossings and local extrema of the result differ by at most one and the envelope mean is close to zero, yielding an IMF; the IMF is then subtracted from the signal, and the procedure is repeated on the residue. A minimal sketch of this procedure is given below.
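The sifting loop can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions (no boundary mirroring, a loose energy-based stopping rule), not the authors' implementation; production codes such as the PyEMD package are more careful:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_imf(x, max_iter=100, tol=0.05):
    """Extract one IMF by sifting: repeatedly subtract the mean of the
    upper/lower envelopes until that mean is negligible."""
    h = np.asarray(x, dtype=float).copy()
    t = np.arange(len(h))
    for _ in range(max_iter):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 4 or len(minima) < 4:
            break  # too few extrema to fit cubic-spline envelopes
        upper = CubicSpline(maxima, h[maxima])(t)
        lower = CubicSpline(minima, h[minima])(t)
        mean_env = 0.5 * (upper + lower)
        h = h - mean_env  # subtract the local mean from the candidate
        if np.sum(mean_env ** 2) / np.sum(h ** 2) < tol:
            break  # envelope mean is nearly zero: accept h as an IMF
    return h

def emd(x, n_imfs=8):
    """Decompose x into IMFs (fastest first) plus a final residue."""
    imfs, residue = [], np.asarray(x, dtype=float).copy()
    for _ in range(n_imfs):
        imf = sift_imf(residue)
        residue = residue - imf
        imfs.append(imf)
        if len(argrelextrema(residue, np.greater)[0]) < 4:
            break  # residue has (almost) no oscillation left
    return imfs, residue
```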

2.2.2 Hilbert Transform

The Hilbert transform of a signal returns a complex helical sequence, called the analytic form of the input signal: a complex signal whose real part xr is the input signal and whose imaginary part xi is constructed by the Hilbert transform. The imaginary part can be considered a version of the real part with a 90° phase shift. Therefore, the Hilbert-transformed series has the same amplitude and frequency content as the original input signal, and its phase information depends on the phase of the original data. After constructing the analytic form of the input signal, the instantaneous amplitude is determined as the modulus of the analytic signal, and the instantaneous frequency (IF) is calculated as the rate of change of the instantaneous phase angle (Boashash 1992). The Hilbert transform h(t) of the input signal f(t) can be directly determined by the following equation:

$$ h(t) = \frac{1}{\pi }\,P\!\int_{ - \infty }^{ + \infty } {\frac{{f(t^{\prime})}}{{t - t^{\prime}}}\,{\text{d}}t^{\prime}} , $$
(1)

where P indicates the Cauchy principal value (Wang et al. 2006, 2008).
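In practice, the analytic signal is rarely computed through Eq. (1) directly; FFT-based routines such as scipy.signal.hilbert are the usual choice. A small sketch of the quantities used later (the chirp test signal is only a stand-in, not part of the paper's data):

```python
import numpy as np
from scipy.signal import chirp, hilbert

fs = 1000.0                                   # PTB sampling rate (Hz)
t = np.arange(0, 1.0, 1.0 / fs)
x = chirp(t, f0=1.0, t1=1.0, f1=10.0)         # stand-in for one slow IMF

z = hilbert(x)                                # analytic signal x + j*H{x}
inst_amp = np.abs(z)                          # instantaneous amplitude
inst_phase = np.unwrap(np.angle(z))           # instantaneous phase (rad)
inst_freq = np.diff(inst_phase) * fs / (2 * np.pi)  # IF in Hz
```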

2.2.3 Eliciting the Proposed Features

The proposed scheme is structured in three stages, as depicted in Fig. 1. First, the data were segmented into successive intervals of 1 s (1000 samples) with no overlap. Then, each windowed signal was decomposed by EMD into its IMFs. Next, the lowest frequency IMF was selected and its analytic signal was constructed using the Hilbert transform. Afterward, the instantaneous frequency (IF), instantaneous phase (IP), amplitude, and entropy values of this IMF were taken as the proposed features. These four features were extracted from each ECG lead; therefore, the number of features extracted from the 12-lead ECG is 48. The feature vectors were then classified using the k-nearest neighbor (kNN) classifier to assign each window to a subject. A sketch of this pipeline is given after Fig. 1.

Fig. 1 Stages of eliciting the proposed features
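Putting the stages of Fig. 1 together, the sketch below forms the 48-dimensional feature vector of one 12-lead window. The paper does not state how the instantaneous series are reduced to one value per lead, nor the exact entropy estimator, so the per-lead means and a histogram-based Shannon entropy are assumptions here; `emd` refers to the sketch in Sect. 2.2.1:

```python
import numpy as np
from scipy.signal import hilbert

FS = 1000  # Hz; windows are 1 s (1000 samples) with no overlap

def slowest_imf_features(lead_window, fs=FS):
    """IF, IP, amplitude, and entropy of the slowest IMF of one lead."""
    imfs, _ = emd(lead_window)          # EMD sketch from Sect. 2.2.1
    z = hilbert(imfs[-1])               # analytic form of the slowest IMF
    amp = np.abs(z)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)
    counts, _ = np.histogram(amp, bins=32)
    p = counts[counts > 0] / counts.sum()
    entropy = -np.sum(p * np.log(p))    # Shannon entropy of the amplitude
    return [inst_freq.mean(), phase.mean(), amp.mean(), entropy]

def window_features(window_12lead):
    """window_12lead: (1000, 12) array -> 4 features x 12 leads = 48."""
    return np.hstack([slowest_imf_features(window_12lead[:, i])
                      for i in range(window_12lead.shape[1])])
```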

2.3 Explanation of the Compared State-of-the-Art Methods

2.3.1 Wavelet-Based Features

The wavelet transform (Daubechies 1992; Burrus 1998) is an efficient tool for representing a signal jointly in the time and frequency domains. Similar to the Fourier series, the wavelet transform is a decomposition method that models an input signal as a weighted combination of a scaling function and wavelet functions; the difference is that the Fourier series is a summation of pure tones, while the scaling and wavelet functions each cover a frequency band. Moreover, the wavelet transform can be applied to nonstationary signals and allows us to tune the trade-off between time and frequency resolution. In addition, the decomposed signals are orthogonal to each other (except for the "bior" mother wavelets), which produces independent features for the classification task. The wavelet coefficients c(j,k) belonging to each wavelet function \( \varPsi_{j,k} (n) \) are determined by the following relation:

$$ c(j,k) = \sum\limits_{{n \in {\mathbb{Z}}}} {f(n)\,\varPsi \left( {2^{j} (n - k)} \right)} , $$
(2)

where f(n) is the input discrete signal and \( \varPsi_{j,k} (n) \) is the discrete wavelet function at the jth scale with shift k. Changing the scale and translation parameters of a mother wavelet can model the low- and high-frequency parts of the original signal. To elicit the wavelet-based features, the ECG signals were first decomposed through dyadic band-pass filters, where the filter responses were the wavelet functions at different scales. Next, from each filtered signal (at each scale) within each windowed signal, five features were extracted: the energy of the signal, its maximum, minimum, mean, and standard deviation. Therefore, five features were extracted from each of five scales, and the ECG signals were collected over 12 leads. With five decomposition levels, the number of extracted features becomes 5 × 5 × 12 = 300; when the number of decomposition levels is increased to 10, it becomes 10 × 5 × 12 = 600 features per windowed signal. Given that the additive noise of ECG signals is colored, it can be argued that this noise does not invade all of the decomposed signals; therefore, the wavelet-based features are fairly robust to noise. A sketch of this feature extraction is given below.
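The extraction maps directly onto PyWavelets; taking the detail coefficients at each level as the "filtered signals" is an assumption about the compared implementation:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_features(lead_window, wavelet="db2", levels=5):
    """Energy, max, min, mean, and std at each decomposition level of one
    ECG lead window: 5 stats x `levels` scales per lead."""
    coeffs = pywt.wavedec(lead_window, wavelet, level=levels)
    feats = []
    for c in coeffs[1:]:  # detail coefficients, coarsest to finest
        feats += [np.sum(c ** 2), c.max(), c.min(), c.mean(), c.std()]
    return np.asarray(feats)

# 5 stats x 5 levels x 12 leads = 300 features per window
lead_window = np.random.randn(1000)          # stand-in for one 1 s lead
print(wavelet_features(lead_window).shape)   # (25,)
```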

2.3.2 Principal Component Analysis (PCA)

Principal component analysis (PCA) is an orthogonal linear transform that projects a high-dimensional input onto a lower dimensional subset of features, where the new features are orthogonal and independent of each other. The PCA transformation is determined from the covariance matrix of the data, whose eigenvalues and corresponding eigenvectors are estimated (Castells et al. 2007). A threshold on the eigenvalues determines the informative eigenvectors, i.e., the directions along which the data are most scattered; thus, only the features with high variance (entropy) are selected (Kouchaki et al. 2012). As mentioned before, the employed ECG signals were segmented into 1 s windows, each forming a 1000 × 12 matrix. Next, the covariance matrix of the ECG signals was estimated over all channels. By applying PCA to each window, 12 × 12 features were extracted; therefore, the number of features became 144. In the next stage, these feature vectors were applied to the k-nearest neighbor (kNN) classifier. A sketch is given below.
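The text reports 12 × 12 = 144 PCA features per window without specifying which matrix is flattened; one plausible reading, sketched below under that assumption, keeps the full eigenvector matrix of the 12 × 12 channel covariance:

```python
import numpy as np

def pca_features(window_12lead):
    """window_12lead: (1000, 12) one 1 s segment. Returns 144 features
    from the eigen-decomposition of the channel covariance (an assumed
    reading of the 12 x 12 feature construction)."""
    X = window_12lead - window_12lead.mean(axis=0)
    cov = np.cov(X, rowvar=False)             # (12, 12) channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
    return eigvecs[:, order].ravel()          # 144-dimensional vector
```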

2.3.3 Correlation-Based Method

Due to the existence of heart rate variability (HRV) in normal subjects, the characteristics of the ECG vary from one beat to the next (Taouli and Bereksi-Reguig 2013; Tadejko and Waldemar Rakowski 2007). In this paper, to capture this ECG variation, the Pan–Tompkins algorithm (Pan and Tompkins 1985) was used to precisely extract the QRS complex within each cycle. From each cycle, a 1 s segment of ECG centered at the R point was selected. The time lag \( \tau_{t} \) was determined by the Pan–Tompkins scheme and updated at each trial to find the maximum correlation of the ECG signal between two successive trials (Sabeti and Boostani 2017). The correlation between two different ECG leads is determined by

$$ R_{x} (\tau_{t} ) = E\{ x(t + \tau_{t} )x^{\text{T}} (t)\} , $$
(3)

where the correlation of the ECG samples over all channels at time instant \( t \) is measured against the samples at \( t + \tau_{t} \) in the next cycle. Since the ECG signal is an ergodic process, the same result is obtained whether the correlation at a certain shift is computed over different trials or through time. The correlation value is determined between each pair of ECG leads, and therefore, 12 × 12 = 144 features were obtained. In fact, these features contain the spatio-temporal information of the 12 leads, since the correlation is computed both through time and across different electrode locations on the chest. Figure 2 depicts the stages of the implemented correlation-based method, and a sketch follows the figure.

Fig. 2 Correlation-based method
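Under the reading that each feature is a sample estimate of Eq. (3) between a pair of leads, the computation can be sketched as follows (the R-centred windows are assumed to come from a QRS detector such as Pan–Tompkins):

```python
import numpy as np

def correlation_features(cycle_t, cycle_next):
    """cycle_t, cycle_next: (1000, 12) R-centred windows of two successive
    beats. Returns the flattened 12 x 12 lead-by-lead correlation matrix,
    i.e., a sample estimate of E{x(t + tau) x^T(t)} (144 features)."""
    A = cycle_t - cycle_t.mean(axis=0)
    B = cycle_next - cycle_next.mean(axis=0)
    R = (B.T @ A) / A.shape[0]
    return R.ravel()
```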

2.3.4 Fiducial-Based Method

As we explained in Fig. 1, a normal ECG cycle consists of a P wave, a QRS complex, and a T wave. Accordingly, the R point was first detected using the Pan–Tompkins algorithm. Second, within each cycle, the positions of Q, S, P, and T were localized by separately finding the local minima and maxima. Finally, the covariance matrix of these fiducial points (Yazdani and Vesin 2016; Kang et al. 2015; Khelil et al. 2007) was determined across cycles, and this matrix was considered as the feature set carrying the subject-dependent information. A covariance matrix of size 5 × 5 is thus constructed for each ECG lead, so for 12 leads, 300 features were generated; see the sketch below.
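A sketch of this feature set follows; the local search windows around the R point (in samples at 1000 Hz) are illustrative assumptions, not values from the paper:

```python
import numpy as np

def locate_fiducials(cycle, r_idx=500):
    """cycle: one 1 s single-lead window centred at the R point (index 500).
    Q/S: local minima adjacent to R; P/T: maxima before Q and after S."""
    q = r_idx - 50 + np.argmin(cycle[r_idx - 50:r_idx])
    s = r_idx + np.argmin(cycle[r_idx:r_idx + 50])
    p = (q - 200) + np.argmax(cycle[q - 200:q])
    t = s + np.argmax(cycle[s:s + 400])
    return np.array([p, q, r_idx, s, t])

def fiducial_features(cycles):
    """cycles: list of R-centred windows of one lead. Covariance of the
    five fiducial positions across cycles: 5 x 5 per lead, 300 for 12 leads."""
    pts = np.array([locate_fiducials(c) for c in cycles])
    return np.cov(pts, rowvar=False).ravel()
```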

2.3.5 Autoregressive (AR) Coefficients

One of the most powerful tools for signal modeling is the AR model. In this model, each sample is predicted as a weighted sum of its m preceding samples, where m is the order of the AR model:

$$ x(t) = \sum\limits_{i = 1}^{m} {\hat{a}_{i} x(t - i)} , $$
(4)

where \( \hat{a}_{i} \) denotes the AR model coefficients. In this paper, the Burg method was employed to estimate the AR coefficients by minimizing the sum of the forward and backward prediction errors. In addition, the finite sample criterion (Stoica and Moses 1997) was used to select the best AR model order by considering the residual variance and the prediction error. In this study, the best AR order was found to be 4. Since we analyzed 12-lead ECGs, the number of elicited features became 48, as illustrated below.
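Burg estimation is available off the shelf; the sketch below uses the statsmodels implementation (choosing that particular routine is our assumption, not the paper's):

```python
import numpy as np
from statsmodels.regression.linear_model import burg

def ar_features(window_12lead, order=4):
    """Burg AR(4) coefficients per lead: 4 x 12 = 48 features per window."""
    feats = []
    for lead in range(window_12lead.shape[1]):
        rho, _sigma2 = burg(window_12lead[:, lead], order=order)
        feats.extend(rho)
    return np.asarray(feats)

window = np.random.randn(1000, 12)   # stand-in 1 s, 12-lead segment
print(ar_features(window).shape)     # (48,)
```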

2.4 k-Nearest Neighbor (kNN) Classifier

kNN (Fix and Hodges 1989) is an old but well-known classifier that is still widely used in several applications due to its simplicity, interpretability, and good performance. kNN needs no explicit training phase; its decision making is based on the majority vote of the k-nearest neighbors. kNN is a local classifier and is suitable for data with multimodal distributions. For each test sample x0, its k-nearest neighbors are found and the majority label among them is assigned to x0. In this study, k was set to 5, obtained through the cross-validation phase; a sketch of the classification setup follows.
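The classification and validation protocol (kNN with k = 5, ten-times tenfold cross-validation, as used in Sect. 3) maps directly onto scikit-learn; the random feature matrix below is only a stand-in for the real features:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X = np.random.randn(340, 48)        # stand-in: 48 features per window
y = np.repeat(np.arange(34), 10)    # 34 subjects, 10 windows each

knn = KNeighborsClassifier(n_neighbors=5)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(knn, X, y, cv=cv)   # 100 runs in total
print(scores.mean(), scores.std())
```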

3 Experimental Results and Discussion

To evaluate the performance of the proposed method, the experiment was performed on the PTB database (Oeff et al. 2012) described in Sect. 2.1. Experimental results were generated by applying the described methods to successive 12-lead ECG windows. For the proposed method, the AR coefficients (order 4), and PCA, the window length was set to 1 s to preserve the stationarity assumption. For the correlation-based and fiducial-based features, each windowed signal was a 1 s segment of one cycle, centered at the R point. In contrast, for the wavelet features, the window length could be extended to longer windows (4 s).

To demonstrate the effectiveness of the employed EMD scheme, Fig. 3 exhibits the IMFs extracted from an ECG trial. As can be seen, the decomposition for two randomly selected subjects resulted in different IMFs. Observing the discriminant IMFs for different subjects, it can be claimed that each individual produces a set of unique IMF components that are suitable candidates for the human verification process. A careful look at the decomposed components shows that the slowest decomposed signal carries more discriminative information than the other IMFs. Therefore, the last IMF was used (Kouchaki et al. 2012); its analytic form was constructed by the Hilbert transform, and the instantaneous frequency (IF), instantaneous phase (IP), amplitude, and entropy of this IMF for each windowed ECG signal were used as the proposed features.

Fig. 3 ECG signals of two randomly selected subjects and their decomposed IMF components: a first person, b second person. Different people have different extracted components

To show the effectiveness of the Hilbert-based features from the slowest component, Fig. 4 illustrates the difference between the analytic ECG signals of two randomly selected subjects. As shown in Fig. 4, the results of the Hilbert transform in the complex plane, over the last IMF component, are not similar at all. Thus, it is expected that the instantaneous frequency, instantaneous phase, amplitude, and entropy estimated from the analytic forms of the signals generate a convincing verification rate.

Fig. 4 Hilbert transform of the last EMD component of the subjects shown in Fig. 3: a first person, b second person. The Hilbert transforms of different people have different shapes

This discriminability emerges from the differences between the dominant frequencies (IF) of their slowest IMFs, which contain the P and T waves. In fact, these two waves, along with the other low-frequency segments of the ECG, carry subject-dependent information; the fiducial-point features exploit a similar fact. To show the effectiveness of the proposed features, the instantaneous frequency, instantaneous phase, amplitude, and entropy of the test subjects were applied to the kNN (k = 5) classifier through ten-times tenfold cross-validation. The average over the 100 runs yielded 95% accuracy for the 34 selected subjects.

This part also presents the results achieved by the implemented state-of-the-art methods. The first rival study (Shen et al. 2002) decomposed the ECG signals using the 'db2' mother wavelet into five and ten scales, respectively. Then, the following features were determined separately from each decomposed signal: mean, maximum, minimum, standard deviation, and energy. These attributes were arranged in a feature vector within each windowed signal. In this way, a set of feature vectors was generated for each subject and applied to the kNN classifier in the same ten-times tenfold cross-validation manner. The average results over 100 runs provided 77% and 98% verification accuracy over the 34 subjects for five and ten scales, respectively (see Table 1).

Table 1 Results of applying the proposed method along with the compared ones to the PTB database

In the PCA method, each segmented ECG signal (1 s) was arranged in a 1000 × 12 matrix; arranging the samples of each ECG cycle in this way over 20 min generated 1200 matrices as the train set, and the covariance of these matrices was calculated separately for each subject. PCA was then applied to extract the discriminant features, which have 144 dimensions. Applying these feature vectors to the kNN classifier through 100 runs (ten-times tenfold cross-validation) produced 96% average accuracy (see Table 1).

Moreover, the correlation-based method, the fiducial points, and the AR model, under the same validation scheme over these 34 subjects, provided 92%, 97%, and 97% verification accuracy, respectively. The verification results of all competing methods in terms of mean ± standard deviation (Std), the number of extracted features, and time complexity are given in Table 1. As shown in Table 1, the mean result of the proposed method is not superior to all competitors, though it outperforms the AR model and the correlation-based method. In terms of the number of elicited features, the proposed method is on par with the AR model, and both are significantly superior to the other compared methods. The number of extracted features plays an important role in the computational complexity of the kNN classifier in the recall (test) phase. The computational complexities of the proposed and wavelet methods are \( O(d \cdot n \cdot \log n) \) and \( O(d \cdot n) \), where \( n \) and \( d \) denote the length of the ECG data and the number of ECG leads, respectively (Wang et al. 2014; Bilato et al. 2014). For the correlation- and fiducial-based features, the QRS detection algorithm must be run first, with computational complexity \( O(n) \) (Fonseca et al. 2014). Therefore, the time complexities of the correlation- and fiducial-based features are estimated as \( O(n + d^{2}) \) and \( O(n + 25d) \), respectively. Finally, the computational complexities of the PCA and AR methods are estimated as \( O(np^{2}) \) and \( O(nm + 5m^{2}) \), where p and m denote the number of components and the order of the AR model, respectively (Rujirakul et al. 2014; Roweis 1998; Vos 2013).

To show the significance of the achieved results, a one-way ANOVA was applied to determine whether any statistical difference exists between the proposed method and the others. The results support the statistical superiority of the wavelet features with ten scales over the other compared methods (P < 0.05, F = 8.08), whereas no significant difference was found among the proposed method, the AR, PCA, correlation-based, and fiducial-based methods (P > 0.05).

Although the obtained verification rate of our method is close to that of PCA, the proposed framework does not need any predefined parameter, unlike the threshold in PCA. EMD automatically decomposes a signal into its IMF sub-signals, so no predefined parameter is needed. Moreover, EMD benefits from the fact that it decomposes the input signal into components that are intrinsically matched to the structure of the signal. In other words, EMD does not decompose the ECG signal using fixed, predefined functions such as wavelet, Chebyshev, or Hermite expansions; instead, it derives the intrinsic sub-signals in an adaptive, signal-dependent manner. Nevertheless, there is no strong statistical support or constraint, such as independence or orthogonality, over the decomposed signals.

There are important points that must be taken into account:

  1. From another perspective, the implemented methods can be categorized into morphological and non-morphological methods. The morphological ones need to detect the R point of each cycle prior to further processing. Therefore, the verification accuracy of the correlation-based and fiducial-point methods (Akhbari et al. 2013; Plesnik et al. 2011) is highly dependent on the correct detection of R points, while the proposed, wavelet-based, PCA-based, and AR-based methods do not suffer from this problem.

  2. Correlation-based coefficients use the similarity between two successive ECG cycles through time over all channels; this spatio-temporal information is therefore more understandable for specialists. In addition, the AR coefficients are determined from the auto-correlation matrix, which uses the multiplication of signal amplitudes. Although the results of the correlation-based features and AR coefficients are comparable to the other methods, these two methods are very sensitive to additive noise, because they rely on the similarity of signal amplitudes.

  3. We initially used the ECG signals of all 52 healthy subjects in the PTB database, but the identification rate for 18 subjects was significantly lower than for the others. This was because their ECGs were highly noisy, and designing an efficient preprocessing phase was not the main concern of this study; thus, we removed the noisy ECGs from the list of investigated subjects. Nevertheless, this study should be extended by applying the compared methods to a larger set of ECG signals, though the majority of previous works applied their methods to small populations.

4 Conclusion and Future Work

In this study, a new approach for an ECG-based verification system was proposed, in which the raw ECG signals were decomposed by EMD into their intrinsic sub-signals (modes). Then, by applying the Hilbert transform to the last EMD component, the analytic form of this low-frequency signal (containing the P and T waves) was generated. The instantaneous frequency, instantaneous phase, amplitude, and entropy values extracted from this analytic signal were considered as the proposed set of features. In addition, several state-of-the-art methods were implemented over the same subjects. The statistics and energy of the decomposed ECG signals (ten scales) provided the best results in terms of the mean verification rate, though this method suffers from a high-dimensional feature space and imposes a high computational burden when determining the distances between points. In contrast, the proposed method and the AR model generated the lowest number of features. The correlation-based method and AR model could not provide a considerable verification rate, since both use the auto-correlation and covariance of the data, which are highly sensitive to additive noise. On the other hand, the wavelet features, along with the fiducial points and PCA features, were successful in terms of providing a high verification rate, but they generate a high number of features and therefore prolong the recall phase, which makes them unsuitable for online applications.

As future work, the EMD algorithm can be enhanced by incorporating other time-varying features to produce more independent and orthogonal components, which would decrease the computational burden of the decomposition process. Since many clinical studies report systematic age-related changes in specific ECG characteristics such as heart rate variability, a larger number of participants should be investigated. Another suggestion is to study the effect of age-related changes on ECG verification systems.