1 Introduction

Biometric recognition is an important security tool that identifies an individual from physiological or behavioral characteristics [1]. The electrocardiogram (ECG), which records the electrical activity of the heart produced by the impulses traveling through it, can serve as such a biometric.

An ECG signal of a normal heartbeat consists of a P wave, a QRS complex and a T wave [2,3,4,5,6].

Many ECG-based biometric systems have been proposed. They can be categorized as fiducial [7, 8] or non-fiducial [9,10,11,12,13], according to the feature extraction approach used.

Some existing systems combine fiducial and non-fiducial features in a hierarchical manner to improve performance [2, 12, 13].

This study introduces three different feature extraction approaches. The first is non-fiducial, the second is fiducial, and the third fuses the fiducial and non-fiducial features, achieving higher accuracy than either approach alone. The paper is organized as follows. Sect. 2 describes the data sets, the preprocessing, the feature extraction techniques and the classification algorithms used for ECG identification. Sect. 3 presents the experimental results and discussion. Finally, Sect. 4 presents the conclusion.

2 Methodology

The methodology of a biometric system usually mimics that of a pattern recognition system. Thus, it can be divided into four main phases: (1) data acquisition, (2) preprocessing, (3) feature extraction and (4) classification (subject identification).

2.1 Data acquisition

Data are collected from two databases. The first is the ECG-ID database [14], containing 310 ECG recordings obtained from 90 persons.

The second is the MIT-BIH Arrhythmia database [15], containing 47 subjects, of which 30 are used for classification. These 30 subjects cover the different heartbeat types in this database. Six types of ECG heartbeats are selected: one normal (NORMAL) and five arrhythmias, namely premature ventricular contraction (PVC), paced beat (PACE), right bundle branch block beat (RBBB), left bundle branch block beat (LBBB) and atrial premature contraction (APC).

2.2 Preprocessing

Preprocessing de-noises and filters the signal by removing the most common noise sources so that features can be extracted from a clean signal. Three types of noise affect the ECG signal: power-line interference, high-frequency noise and baseline drift. As a result of a series of experiments, the following combination of methods was selected for the preprocessing phase. First, baseline drift correction is performed using wavelet decomposition. Donoho and Johnstone proposed the universal 'VisuShrink' threshold given by [16]

$$\begin{aligned} \mathrm{Thr} = \sigma \sqrt{2\log (N)} \end{aligned}$$
(1)

where N is the number of data points and \(\sigma \) is an estimate of the noise level. The wavelet-based de-noising process is summarized as follows: the discrete wavelet transform (DWT) detail coefficients are thresholded with a soft (shrinkage) strategy, and reconstructing the sequence from the thresholded detail coefficients removes the baseline drift. In our setup, the decomposition uses the db8 wavelet at level 9 with a soft threshold of 4.29. Secondly, an adaptive band-stop filter suppresses the power-line noise at \(W_\mathrm{s} = 50\) Hz, where \(W_\mathrm{s}\) is the stop-band corner frequency. A low-pass Butterworth filter is then applied with pass-band corner frequency \(W_\mathrm{p} = 40\) Hz, stop-band corner frequency \(W_\mathrm{s} = 60\) Hz, pass-band ripple \(R_\mathrm{p} = 0.1\) dB and stop-band attenuation \(R_\mathrm{s} = 30\) dB to remove the remaining noise components caused by possible high-frequency distortions. The last step smooths the signal with a window of 5 samples to produce the preprocessed signal. Figure 1 shows an original ECG signal (a) and its de-noised version (b).

Fig. 1 a ECG-ID person 1 signal 1 before preprocessing and b ECG-ID person 1 signal 1 after preprocessing
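As an illustration of this preprocessing chain, a minimal sketch in Python is given below (the original pipeline was implemented in MATLAB). The db8 wavelet, level-9 decomposition, universal threshold of Eq. (1), 50 Hz power-line frequency, 40 Hz low-pass cut-off and 5-sample smoothing follow the text; the sampling rate of 500 Hz, the notch quality factor, the filter order and the noise estimate are illustrative assumptions.

```python
import numpy as np
import pywt
from scipy import signal

def wavelet_baseline_denoise(ecg, wavelet="db8", level=9):
    """Soft-threshold the detail coefficients with the universal threshold of
    Eq. (1) and drop the coarsest approximation, which carries the slow drift.
    (One possible reading of the paper's wavelet step, not the exact code.)"""
    coeffs = pywt.wavedec(ecg, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # robust noise estimate (assumed)
    thr = sigma * np.sqrt(2.0 * np.log(len(ecg)))          # Eq. (1)
    new_coeffs = [np.zeros_like(coeffs[0])]                 # remove baseline drift
    new_coeffs += [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(new_coeffs, wavelet)[: len(ecg)]

def suppress_noise(ecg, fs=500.0):
    """Notch out 50 Hz power-line noise, low-pass at 40 Hz (Butterworth),
    then smooth with a 5-point moving average."""
    b_n, a_n = signal.iirnotch(w0=50.0, Q=30.0, fs=fs)
    ecg = signal.filtfilt(b_n, a_n, ecg)
    b_lp, a_lp = signal.butter(N=4, Wn=40.0, btype="low", fs=fs)
    ecg = signal.filtfilt(b_lp, a_lp, ecg)
    return np.convolve(ecg, np.ones(5) / 5.0, mode="same")
```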

2.3 Feature extraction

Feature extraction finds a transformation that maps the ECG signal into a relatively low-dimensional feature space. In the following subsections, we discuss the three feature extraction approaches. First, the non-fiducial features, which capture holistic patterns by examining the ECG data in the frequency domain, are discussed. Second, the fiducial features, which characterize each heartbeat by measuring distances, amplitudes and angles between detected fiducial points, are tackled. Finally, the fusion between the fiducial and non-fiducial approaches is discussed in detail.

2.3.1 Non-fiducial approach based on auto correlation/discrete cosine transform (AC/DCT)

The AC/DCT method operates on windows of the ECG signal with a predefined length of N samples, chosen to be longer than a complete heartbeat so that all ECG peaks are included in the window. In our case, a window of 10 s proved appropriate for feature extraction. The autocorrelation combines all the samples in the window into sums of products. The normalized AC used in our feature extraction technique is given by the following equation:

$$\begin{aligned} \hat{R}_{xx}[m] = \frac{\sum _{i=0}^{N-|m|-1} x[i]\, x[i+m]}{\hat{R}_{xx}[0]} \end{aligned}$$
(2)

Here x[i] is the windowed ECG and x[\(i + m\)] is its time-shifted version with time lag \(m = 0, 1, \ldots , (M-1)\), where M is a parameter to be chosen and is much smaller than N. \(\hat{R}_{xx}[m]\) is the autocorrelation sequence and \(\hat{R}_{xx}[0]\) is the average power, which must be greater than 0 [17]. The DCT is then used to reduce the dimensionality of the resulting features, keeping only the most important coefficients, as shown in Fig. 2a–d.

Fig. 2 a ECG-ID subject 1 preprocessed signal, b the normalized autocorrelation sequence, c the 400 AC coefficients following the maximum (zoomed in) and d the DCT of the 400 AC coefficients from 10 ECG windows, including the one at the top
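A minimal sketch of the AC/DCT feature computation (Eq. 2) is given below. The 10-s window, the 400-lag autocorrelation shown in Fig. 2 and the 30 retained DCT coefficients (Sect. 2.3.3) follow the text; the 500 Hz sampling rate and the function interface are illustrative assumptions.

```python
import numpy as np
from scipy.fftpack import dct

def ac_dct_features(window, m_lags=400, n_dct=30):
    """Normalized autocorrelation of an ECG window (Eq. 2), truncated to
    m_lags lags, followed by a DCT that keeps the first n_dct coefficients."""
    x = np.asarray(window, dtype=float) - np.mean(window)
    # Autocorrelation for lags 0..M-1, normalized by the zero-lag power.
    r = np.array([np.sum(x[: len(x) - m] * x[m:]) for m in range(m_lags)])
    r_norm = r / r[0]                                   # divide by R_xx[0]
    return dct(r_norm, type=2, norm="ortho")[:n_dct]

# Example: a 10-s window at an assumed 500 Hz gives N = 5000 samples, M = 400 << N.
# features = ac_dct_features(ecg_preprocessed[:5000])
```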

2.3.2 Fiducial approach based on a modified super set

Fiducial features represent durations, amplitude differences and angles between 11 fiducial points detected from each heartbeat. These points are three peaks (P, R and T), two valleys (Q and S) and six onsets and offsets.

Peak detection Our feature extraction starts with QRS detection using a chain of filters: cancelation of the DC shift and normalization, low-pass filtering, high-pass filtering, a derivative filter, squaring and moving-window integration. The output is a vector of zeros and ones with the same length as the preprocessed signal, in which the ones mark the QRS intervals.

This vector is split into two vectors, left and right, marking the start of each QRS complex (Q side) and its end (S side), respectively, over the whole signal. A window of 60 samples before each left boundary is then searched and its maximum gives the P peak; a window of 125 samples after each right boundary is searched and its maximum gives the T peak. The R peak is detected as the maximum value between the left and right boundaries, Q as the minimum between the left boundary and the R location, and S as the minimum between the R location and the right boundary. We thus obtain the P, Q, R, S and T locations and amplitudes, as well as the R interval (the samples from Q to S), as shown in Fig. 3a, b.

Fig. 3 a ECG-ID subject 1 with the detected peaks (P, Q, R, S and T) and b a zoomed view of the ECG-ID subject 1 preprocessed signal with the detected peaks
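The filter chain above follows the classic Pan–Tompkins structure; a hedged sketch is shown below. The stage order follows the text, while the band-pass cut-offs, 150 ms integration window and the 0.5 × max threshold are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy import signal

def qrs_mask(ecg, fs=500.0):
    """Return a 0/1 vector of the same length as ecg, with ones marking
    samples inside a detected QRS interval (Pan–Tompkins-style sketch)."""
    x = ecg - np.mean(ecg)                          # cancel DC shift
    x = x / np.max(np.abs(x))                       # normalize
    b, a = signal.butter(3, [5.0, 15.0], btype="band", fs=fs)
    x = signal.filtfilt(b, a, x)                    # low-pass + high-pass stages
    x = np.gradient(x)                              # derivative filter
    x = x ** 2                                      # squaring
    win = int(0.15 * fs)                            # ~150 ms moving-window integration
    x = np.convolve(x, np.ones(win) / win, mode="same")
    return (x > 0.5 * np.max(x)).astype(int)        # ones mark QRS intervals
```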

ECG peak selection After the peaks and their locations have been detected, we need to determine the most informative fragment of the signal, the one with the greatest effect on the final classification outcome.

The most important and effective component is the QRS complex, while the P and T waves are the most uncertain components: the P wave has a low amplitude and is easily affected by noise, while the T wave has a dynamic location that depends on the heart rate. Although the T and P waves can provide useful information for improving the system accuracy, they also introduce uncertainty into the extraction and processing steps. To assess the contribution of the T and P waves, four informative fragments are considered: QRS, P-QRS, QRS-T and P-QRS-T, as shown in Fig. 4. An experiment was conducted to determine which of them is the most informative; the P-QRS-T fragment was selected, as it produced better performance than the other fragments.

Fig. 4 Variants of cardiac cycle information fragments [25]

P-QRS-T fragment We want to select the most discriminant fragments, those that best describe the signal and can represent it in the identification process. Consequently, several checks are made on each P-QRS-T fragment in the signal to decide whether to keep or discard it.

Table 1 The labels for the 38 fiducial features of the modified super set

First, the R interval of each P-QRS-T fragment is checked: it must contain more than 30 and fewer than 70 samples; otherwise, the fragment is rejected. Next, for each P-QRS-T fragment we calculate several amplitudes, distances and means: the RQ and RS amplitudes; the PR, RT, QS, RQ and RS distances; the means of these amplitudes and distances; and the mean of the R interval. For each pair of successive P-QRS-T fragments, the RR distance is calculated, along with the mean and the median of the RR distances. The minimum of this mean and median is taken as the RR threshold, as it showed better performance experimentally. The last step selects the most similar P-QRS-T fragments by imposing restrictions, conditions and a weighted sum on the ECG fragments. For each pair of successive P-QRS-T fragments, if the RR distance is < 0.9 × the RR threshold, the first P-QRS-T fragment is chosen; otherwise, the first P-QRS-T fragment is rejected.

Fig. 5 a P-QRS-T fragments computed from ECG-ID, b the mean P-QRS-T fragment computed from ECG-ID subject 1 and c the mean P-QRS-T fragment of ECG-ID subject 1 with the detected points Pb, Pe, P, Q, R, S, Tb, T, Te

While the thresholding condition is satisfied, for each pair of successive P-QRS-T fragments, starting from the beginning of the ECG signal, we apply a sequence of conditions, beginning with: "if the difference between the R interval of the first P-QRS-T fragment and the mean R interval is smaller than the corresponding difference for the second fragment, increase the weight sum of the first P-QRS-T fragment by 0.3; otherwise, increase that of the second fragment."

The same comparison is made for the RT distance of each pair of successive fragments with weight 0.3, and for the PR distance, RQ amplitude, RS amplitude and QS distance with a higher weight of 0.75, as these amplitudes and distances proved more effective experimentally. In the end, the fragments with the highest weights are selected to represent the signal. The P-QRS-T fragment length is fixed at 281 samples per cardiac cycle, regardless of the actual lengths of the PR, QRS and QT intervals: 281 samples (110 to the left of the R peak and 170 to the right) are extracted and analyzed. Figure 5a shows the selected and extracted P-QRS-T fragments satisfying the conditions and restrictions imposed on the ECG signals.

P-QRS-T fragment mean For each ECG record, the P-QRS-T fragments are extracted. Since the P-QRS-T fragment samples are used as informative features, the extracted fragments, each of about 281 samples, are processed to enhance their similarity. Figure 5b shows the mean P-QRS-T fragment obtained from the selected P-QRS-T fragments.
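A minimal sketch of this extraction and averaging step follows. The 110/170-sample offsets around the R peak come from the text; the selection of R locations is assumed to have been done by the weighted procedure above, and names such as `r_locs` are illustrative.

```python
import numpy as np

def pqrst_fragments(ecg, r_locs, left=110, right=170):
    """Extract fixed-length P-QRS-T fragments: 110 samples to the left of each
    selected R peak and 170 to the right (281 samples in total)."""
    frags = []
    for r in r_locs:
        if r - left >= 0 and r + right + 1 <= len(ecg):
            frags.append(ecg[r - left : r + right + 1])   # 281 samples
    return np.asarray(frags)

def mean_fragment(ecg, r_locs):
    """Element-wise mean over the retained fragments (the 281-sample template)."""
    return pqrst_fragments(ecg, r_locs).mean(axis=0)
```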

After obtaining the mean fragment, the modified super-set features are computed by detecting the most important points in the mean fragment (Pb, Pe, Tb, Te, P, Q, R, S, T). Our modified super set of 38 features, which covers the majority of the features used in the literature, is extracted from the mean heartbeat, as shown in Table 1. These features comprise 18 temporal features (distances between fiducial points), 12 amplitude features and 3 angle features, to which we add 2 spectral features, entropy and energy, as shown in Fig. 5c.
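The paper names the two added spectral features without giving formulas; the sketch below assumes entropy is the Shannon entropy of the normalized power spectrum of the mean fragment and energy is the sum of its squared samples. Both definitions are assumptions for illustration only.

```python
import numpy as np

def spectral_features(mean_frag):
    """Two assumed spectral features of the mean P-QRS-T fragment."""
    energy = float(np.sum(np.asarray(mean_frag) ** 2))       # sum of squared samples
    psd = np.abs(np.fft.rfft(mean_frag)) ** 2
    p = psd / np.sum(psd)                                     # normalized spectrum
    entropy = float(-np.sum(p * np.log2(p + 1e-12)))          # Shannon entropy
    return entropy, energy
```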

2.3.3 Fusion between fiducial and non-fiducial approach

This feature extraction is based on the fusion of the fiducial and non-fiducial approaches. It uses the fiducial approach to obtain the P-QRS-T fragments and their 281-sample mean fragment, without the modified super-set features. The mean fragment is then passed through a discrete wavelet transform as the non-fiducial stage.

A wavelet can be described by two functions: the scaling function \(\emptyset (x)\), known as the 'father wavelet', and the wavelet function or 'mother wavelet' \({\varphi }(x)\), which undergoes translation and scaling operations to give self-similar wavelet families as follows.

$$\begin{aligned} \varphi _{a,\tau }(x) = a^{-\frac{1}{2}}\, \varphi \left( \frac{x-\tau }{a} \right) \end{aligned}$$
(3)

The continuous wavelet and scaling functions have practical limitations, so the DWT was developed. The DWT can be implemented as a set of filter banks comprising a high-pass filter and a low-pass filter.

The signal can be decomposed into many levels using different wavelet families. In our approach, a first-level discrete wavelet decomposition is applied to the mean P-QRS-T fragment using the discrete 'db5' wavelet, a mother wavelet of the Daubechies family [18]. The 281-sample mean P-QRS-T fragment is thus passed through a first-level DWT decomposition, and the resulting fused feature vector of 145 samples is used as the classifier input for each ECG signal. For comparison, the non-fiducial approach uses 30 features generated from the AC/DCT, the fiducial approach with the modified super set produces 38 features, and the fusion of the fiducial and non-fiducial approaches produces 145 features via the DWT.
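A short sketch of this fusion step is shown below. With PyWavelets' default symmetric padding, one level of 'db5' decomposition of a 281-sample input yields 145 coefficients per branch, matching the 145-dimensional fused vector quoted above; which branch (approximation, detail, or a combination) is retained is not stated in the text, so keeping the approximation branch here is an assumption.

```python
import numpy as np
import pywt

def fused_features(mean_frag):
    """First-level 'db5' DWT of the 281-sample mean P-QRS-T fragment."""
    mean_frag = np.asarray(mean_frag)
    assert len(mean_frag) == 281
    cA, cD = pywt.dwt(mean_frag, "db5")     # each branch has 145 coefficients
    return cA                               # assumed: keep the approximation branch
```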

2.4 Classification

Classification is the process in which the extracted features are compared against the stored templates to generate match scores. In the following subsections, we discuss three classifiers, SVM, ANN and KNN, and their identification results in detail.

2.4.1 Support vector machine (SVM)

The support vector machine (SVM) is a powerful pattern classification technique introduced by Vapnik [19].

The major advantage of the SVM is its ability to classify unknown data points with high accuracy. Classifier performance for small-sample learning problems has been improved by applying sequential minimal optimization (SMO) and a polynomial kernel function [20]. The SVM has shown good generalization performance in many practical applications. The SVM decision function is defined as follows:

$$\begin{aligned} F(y) = \sum \limits _{i=1}^{N} \alpha _{i}\, K\left( x_{i}, y \right) + b \end{aligned}$$
(4)

where y is the unclassified test vector, \(x_{i}\) are the support vectors with weights \(\alpha _{i}\), and b is a constant bias. \(K(x_{i}, y)\) is the kernel function (here a polynomial kernel), which performs an implicit mapping into a high-dimensional feature space.
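As an illustration, a polynomial-kernel SVM of this form can be trained on the fused feature vectors with a few lines of scikit-learn (the paper itself used Weka's SMO). The degree, C value and the variable names `X_train`, `y_train` are illustrative assumptions.

```python
from sklearn.svm import SVC

def train_svm(X_train, y_train):
    """Polynomial-kernel SVM classifier (cf. Eq. 4); degree and C are
    illustrative choices, not the paper's settings."""
    clf = SVC(kernel="poly", degree=3, C=1.0)
    return clf.fit(X_train, y_train)
```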

2.4.2 Artificial neural network (ANN)

The classification operation of a neural network begins with the sum of the weighted inputs plus a bias at each neuron [21]. Mathematically, the internal activity of this neuron model can be written as:

$$\begin{aligned} Y_{k} = f\left( \sum \limits _{j=1}^{n} w_{jk} Z_{j} + w_{k0} \right) \quad \text {for}\quad k = 1, 2, \ldots , L \end{aligned}$$
(5)

where \(Z_{j} = f\left( \sum \nolimits _{i=1}^{d} w_{ij} x_{i} + w_{j0} \right) \) for \(j = 1, 2, \ldots , n\); \(x_{i}\) are the inputs of the network, \(w_{ij}\) the weights between the input and hidden layers, \(w_{j0}\) the initial bias of the hidden nodes and f a transfer function. \(Z_{j}\) are the outputs of the hidden layer, \(Y_{k}\) is the output of the network, \(w_{jk}\) the weight between the hidden and output layers and \(w_{k0}\) the initial bias of the output layer.
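For illustration, Eq. (5) corresponds to a single-hidden-layer perceptron, which can be sketched with scikit-learn as below. The hidden-layer size, logistic activation and iteration limit are assumptions, not the network configuration used in the paper.

```python
from sklearn.neural_network import MLPClassifier

def train_ann(X_train, y_train, hidden_units=50):
    """One-hidden-layer feed-forward network (cf. Eq. 5); all hyperparameters
    here are illustrative."""
    clf = MLPClassifier(hidden_layer_sizes=(hidden_units,),
                        activation="logistic", max_iter=1000)
    return clf.fit(X_train, y_train)
```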

2.4.3 K-nearest neighbor (KNN)

The k-nearest neighbor (KNN) algorithm is a classification method based on the closest training samples [22]. We compare distances using the Euclidean distance. Let N be a test set described by parameters \([N_{1,1}, N_{1,2}, N_{1,3}, \ldots , N_{j,k}]\) and M a training set described by \([M_{1,1}, M_{1,2}, M_{1,3}, \ldots , M_{i,k}]\). The decision rule assigning the class of highest similarity in KNN can be written as follows:

$$\begin{aligned} C = \mathop {\mathrm {argmax}}\limits _{C_{u}}\; \mathrm {score}\left( N_{j}, C_{u} \right) = \sum \limits _{N_{j}\, \in \, \mathrm {KNN}\left( M_{i} \right) } \mathrm {Sim}\left( N_{j}, M_{i} \right) \, \delta \left( M_{i}, C_{u} \right) \end{aligned}$$
(6)

where C is the label assigned to the test feature \(N_{j}\); \(\mathrm {score}\left( N_{j}, C_{u} \right) \) is the score of class \(C_{u}\) with respect to \(N_{j}\); \(\mathrm {Sim}\left( N_{j}, M_{i} \right) \) is the similarity between \(N_{j}\) and the training feature \(M_{i}\); and \(\delta \left( M_{i}, C_{u} \right) \) indicates whether \(M_{i}\) belongs to class \(C_{u}\).
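A Euclidean-distance KNN classifier of this kind can be sketched as follows; the value of k and the variable names are illustrative assumptions (the paper used Weka's implementation).

```python
from sklearn.neighbors import KNeighborsClassifier

def train_knn(X_train, y_train, k=3):
    """KNN with Euclidean distance, as described above; k is illustrative."""
    clf = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    return clf.fit(X_train, y_train)
```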

3 Experiments and discussion

The experiments are carried out on a Core i7 platform with a 3 GHz main frequency and 6 GB of memory, running Windows 8 64-bit. The algorithms are developed with the discrete wavelet transform toolbox of MATLAB 2014b (The MathWorks), and the classification algorithms are taken from the Weka software. For the ECG-ID database, containing 310 recordings, we report results on 308 records: 200 records are used for training and 108 for testing; two recordings are excluded because their files are corrupted. For MIT-BIH Arrhythmia, we report results on 30 subjects. For each subject, 40,000 samples are used for training and 20,000 for testing, giving 120 training records and 60 test records, where each 10,000-sample segment represents a record. This section compares the identification performance of the SVM, KNN and ANN classifiers on these ECG data sets, using the three feature extraction approaches: non-fiducial, fiducial and their fusion.

Table 2 The classification results for ECG-ID and MIT-BIH with the three approaches in feature extraction and classification

We have tested our algorithms on two databases: ECG-ID and MIT-BIH Arrhythmia. Table 2 reports the TP (number of correctly classified subjects) and FP (number of incorrectly classified subjects) for each of the three classifiers on each database. Three measures are used to evaluate our approaches: accuracy, precision and error rate. Accuracy (A, in %) is the percentage of correctly classified records, precision (P) is the fraction of correctly classified records among all records, and the error rate (ER) is the fraction of incorrectly classified records among all records.

$$\begin{aligned} A = \frac{\mathrm {TP}}{\mathrm {TP}+\mathrm {FP}} \times 100, \quad P = \frac{\mathrm {TP}}{\mathrm {TP}+\mathrm {FP}}, \quad \mathrm {ER} = \frac{\mathrm {FP}}{\mathrm {TP}+\mathrm {FP}} \end{aligned}$$
(7)
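For clarity, the three measures of Eq. (7) can be computed directly from the TP/FP counts of Table 2, as in the small sketch below (the counts passed in are placeholders, not values from the paper).

```python
def evaluate(tp, fp):
    """Accuracy (%), precision and error rate as defined in Eq. (7)."""
    total = tp + fp
    return {"A(%)": 100.0 * tp / total, "P": tp / total, "ER": fp / total}
```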

Table 2 shows the identification performance of our approaches. On the MIT-BIH Arrhythmia database, neither the non-fiducial nor the fiducial feature extraction achieves the highest accuracy when used alone, although both remain above 90%; with the fusion approach, accuracy rises to 100% for each of the three classifiers. On the larger ECG-ID database, which contains 90 subjects, accuracy decreases for both the non-fiducial and fiducial approaches, whereas with the fusion approach it reaches 99% using SVM, 98% using KNN and about 95% using ANN.

Table 3 Classification performance comparisons of the proposed scheme with some existing schemes using ECG-ID and MIT-BIH

In comparison with previous works that used similar techniques, our approach achieves better results. For the ECG-ID database, as shown in Table 3, Beil et al. [23] used 20 subjects, with 85 records for training and 50 for testing, heartbeat wave amplitudes and interval durations as features and soft independent modeling of class analogy as the classifier, achieving an accuracy of 98%. Shen et al. [13] used 20 subjects, 20 heartbeats per record and 1 heartbeat per ECG, with QRS complex and T wave amplitudes and interval durations as features; classification with template matching and a decision-based neural network achieved accuracies of 95% and 80%, respectively, and 100% when both were combined. Yi et al. [24] worked on 9 subjects, using 9 records from one day (30 fragments per record) for training and 9 records from another day (all fragments per record) for testing; features were the wavelet decomposition coefficients of successive 10-s ECG fragments, with principal component analysis for feature extraction and reduction and a probabilistic neural network for classification, achieving 95% accuracy. Nemirko and Lugovaya [25] made great progress on this database, using all its subjects: 195 records (6 to 10 heartbeats each) for training and 115 records (6 to 10 heartbeats each) for testing, taking samples of the cardiac cycle fragment containing the QRS complex and the P and T waves, with principal component analysis or the wavelet transform for feature extraction and reduction, and linear discriminant analysis with a majority vote classifier, achieving an accuracy of 96%. Dar et al. [26] presented an identification approach for ECG-ID based on a guided filter (GF), Euclidean measures, dynamic time warping (DTW) and PCA for authentication; they worked on 89 of the 90 subjects with 2 records per subject and achieved an error rate of 2.4% using PCA combined with GF. Chun [27] addressed the challenging ECG-ID database with all 90 subjects, applying a fusion of DWT features and heart-rate-variability-based features, reduction using best-first search and a random forest classifier, achieving an accuracy of 83.33%. Our proposed method uses all the subjects of this database, with 200 records (all heartbeats in each record) for training and 108 records for testing, and achieves an accuracy of 99%, higher than the previous studies. For the MIT-BIH Arrhythmia database comparison, as shown in Table 3, Tang and Shu [28] used 10 subjects, with the wavelet transform and a rough set for feature extraction and reduction and a quantum neural network for classification, achieving an accuracy of 91%. Wang et al. [29] worked on 9,800 samples from eight different heartbeat types, using principal component analysis and linear discriminant analysis for feature extraction and a probabilistic neural network for classification, achieving about 99.71%. Ting and Salleh [11] worked on 13 subjects, using extended Kalman filtering and log-likelihood for classification, achieving an accuracy of 87.50%. Islam et al. [30] addressed 26 subjects, using HBS features and achieving an accuracy of 99.85%.
Most of the previous studies focused on a small number of subjects and a small number of samples, whereas our proposed method covers 30 of the 47 subjects, including different heartbeat types, and uses 40,000 samples for training and 20,000 samples for testing per subject, achieving an accuracy of 100%.

Our contribution in this paper is to demonstrate that the ECG can be used as a biometric. This goal is supported by several points. First, we worked on a large database of 90 subjects and achieved a high accuracy of about 99%. Second, we addressed another database containing cardiac diseases to show the strength of our fusion approach and achieved about 100% accuracy. Third, we used a large number of samples and ECG heartbeats for training and testing; other systems use only a small number of heartbeats and achieve high accuracy, but their performance starts to degrade as the number of heartbeats increases. Fourth, we demonstrated the non-fiducial and fiducial approaches and their fusion, and showed how the fusion increases performance. The times consumed are 1.2, 0.75 and 1.1 min for the non-fiducial, fiducial and fusion approaches, respectively, on the largest database used.

4 Conclusions

This paper proposes a hybrid ECG identification system. The proposed approaches comprise data acquisition, preprocessing, feature extraction and classification phases. ECG signals obtained from the MIT-BIH and ECG-ID databases are used for training and testing. We applied three feature extraction methods, based on non-fiducial, fiducial and fused features, and three classifiers: SVM, ANN and KNN. The results of the system are compared with other methods, and the comparison shows that the proposed method provides robust ECG signal classification. The results show an accuracy of 100% on MIT-BIH using SVM, ANN and KNN, and 99% on ECG-ID using SVM and KNN. Our main strength is the fusion approach between fiducial and non-fiducial features.

With the help of the above approaches, one can develop software for a biometric system based on the ECG signals of different individuals. We worked on a database containing a large set of subjects and achieved a high accuracy, supporting the use of the ECG in security system applications. Further studies are ongoing to improve the classification accuracy and to work on larger datasets, in order to create a generalized system for ECG identification.