1 Introduction

Myocardial infarction (MI) occurs when the blood supply to part of the heart decreases or stops entirely, damaging the heart muscle and potentially causing death. Globally, about 15.9 million people suffered an MI in 2015 [1]. As MI develops rapidly and initially has no obvious symptoms, detection and treatment are time-critical [2]. The sooner MI is detected, the more means are available to control its effect on left-ventricular contractility and function, leading to better therapeutic outcomes and prognosis [3]. The development of techniques for early and rapid MI detection is therefore a worldwide research goal.

Electrocardiography is a diagnostic tool in which electrodes are placed on the skin to record the heart’s electrical activity over time [4]. If the electrical or contractile function of the heart is disrupted by myocardial ischemia, the electrical signal flow through the whole myocardium is affected as well. In the case of MI, for example, electrocardiogram (ECG) recordings often show ST-segment elevation and pathological Q waves. Twelve-lead electrocardiography is the traditional clinical means of monitoring changes in cardiac electrical activity to assess the risk of MI. Before this technology became widespread, clinicians used the Minnesota code [5] to extract time-domain features of ECG signals and identify heart diseases such as MI. However, this time-consuming technique required considerable clinical experience for reliable results, as naked-eye inspection of an ECG may lead to misdiagnosis of MI. Over time, with technological development, Fourier and wavelet analyses have been applied to ECG signal processing. Researchers have accordingly begun to study the frequency-domain and time–frequency-domain features of ECG signals for MI diagnosis, which has thus entered the era of computerized automatic recognition.

Table 1 Characteristics of the ECG data obtained from the PTB database
Fig. 1 ECG signal after removal of baseline wander and R-peak detection

The analysis of ECG signals includes preprocessing, peak recognition, fragment segmentation, feature extraction, feature selection, dimensionality reduction, classification and model evaluation. The most important step is appropriate feature selection. An ECG contains diagnostic features in the time, frequency and time–frequency domains.

This study focused on identifying features that reflect the correlation between the 12 leads, on the assumption that MI affects the relationships between these leads. In addition, we extracted features related to the energy and information domains. We used MATLAB R2017b for the data analysis.

2 Materials

We drew our data from the publicly available PTB Diagnostic ECG Database [6, 7], which contains records from 148 MI patients and 52 healthy control subjects; details are given in Table 1. The data set contains 12-lead ECG signals (I, II, III, aVR, aVL, aVF, \(V_1\), \(V_2\), \(V_3\), \(V_4\), \(V_5\), \(V_6\)) for each person, digitized at 1000 samples per second. Each record in the database was acquired over a different time period, mostly around 2 min.

3 Methods

3.1 Preprocessing

Using the Daubechies 6 (“db6”) [8, 9] wavelet basis function, we de-noised the ECG signals and eliminated their baseline wander. We then applied a 0.081–20.833 Hz bandpass filter, which corresponds to the \(3\mathrm{rd}\)–\(11\mathrm{th}\) level wavelet sub-bands; the most useful information in ECG signals is concentrated in this frequency band.
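As a concrete illustration, the sub-band selection above can be sketched with MATLAB’s Wavelet Toolbox as follows. The synthetic input and the use of modwt/imodwt here are our assumptions for the sketch, not necessarily the authors’ exact implementation.

```matlab
% Sketch of the wavelet preprocessing described above (MATLAB Wavelet
% Toolbox). The synthetic input stands in for a real 1000-Hz ECG lead;
% the authors' exact denoising scheme may differ in detail.
fs = 1000;                                       % sampling rate (Hz)
t  = (0:20*fs-1)/fs;
ecgRaw = randn(size(t)) + 0.5*sin(2*pi*0.2*t);   % noise plus slow baseline drift

w = modwt(ecgRaw, 'db6', 11);    % rows 1-11: details D1-D11; row 12: A11
w(1:2, :) = 0;                   % drop D1-D2 (high-frequency noise)
w(12, :)  = 0;                   % drop A11 (baseline wander)
ecgClean  = imodwt(w, 'db6');    % reconstruct from the 3rd-11th level sub-bands
```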

We segmented the ECG signals into numerous small sections (one beat per section). To detect the R-peaks, we used the Pan–Tompkins algorithm [10], a robust R-peak identifier that is computationally simple and easy to implement.
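The Pan–Tompkins pipeline (bandpass filtering, differentiation, squaring, moving-window integration and thresholding) can be sketched as follows, continuing from the preprocessing sketch above. The fixed threshold and window lengths are illustrative simplifications of the original adaptive algorithm.

```matlab
% Simplified R-peak detection in the spirit of Pan-Tompkins; the original
% algorithm uses adaptive thresholds, replaced here by a crude fixed
% threshold for brevity.
[b, a] = butter(2, [5 15]/(fs/2), 'bandpass');   % QRS energy band, 5-15 Hz
bp  = filtfilt(b, a, ecgClean);                  % zero-phase bandpass filter
d   = [0 diff(bp)];                              % derivative emphasizes slopes
mwi = movmean(d.^2, round(0.15*fs));             % squaring + 150-ms integration
[~, rLocs] = findpeaks(mwi, ...
    'MinPeakHeight',  0.3*max(mwi), ...          % illustrative fixed threshold
    'MinPeakDistance', round(0.25*fs));          % ~250-ms refractory period
```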

Figure 1 shows an ECG signal after de-noising, removal of baseline wander and R-peak detection. The upper panel is the overall ECG signal, and the lower panel is a magnified local section. As the upper panel shows, each ECG cycle along the whole time axis has almost the same height after baseline drift removal, and the signal becomes smoother after de-noising. Using the Pan–Tompkins algorithm, we detected all QRS waves, which are circled in Fig. 1.

After detecting the QRS complexes, each set of 255 samples before a QRS peak, 256 samples after the peak and the QRS peak itself were grouped into a 512-sample segment and treated as a single beat for the subsequent analysis. To ensure that all segments had 512 sample points, we removed the first and last beats of each record.

In total, we obtained 28,700 beats: 21,569 segments in the MI group and 7131 in the control group. Each 512-point segment was then normalized to unit variance.
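Continuing the sketch, the segmentation and normalization steps might look as follows; the bounds check is ours, added so that every window stays inside the record.

```matlab
% 512-sample beats: 255 samples before each R-peak, the peak itself and
% 256 samples after it; the first and last beats are dropped, and each
% beat is scaled to unit variance.
rUse  = rLocs(2:end-1);                                    % drop first/last beats
rUse  = rUse(rUse > 255 & rUse + 256 <= numel(ecgClean));  % keep windows in range
beats = zeros(numel(rUse), 512);
for k = 1:numel(rUse)
    seg = ecgClean(rUse(k)-255 : rUse(k)+256);             % one 512-sample beat
    beats(k, :) = seg / std(seg);                          % unit-variance scaling
end
```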

3.2 Wavelet analysis

Given heart beats of 512 samples each, we used the maximal overlap discrete wavelet transform (MODWT) [11] to decompose the signal. MODWT, a modification of the discrete wavelet transform (DWT), omits the down-sampling step, so the wavelet coefficients at all scales are equal in length to the original time series. Each beat was decomposed into seven levels using the finite-impulse-response approximation of Meyer’s wavelet (“dmey”) [12]. We thus obtained five detail coefficient series (\(3\mathrm{rd}\)–\(7\mathrm{th}\) levels) and one approximation coefficient series. These six sub-bands covered the frequencies between 0.326 and 20.833 Hz, capturing almost all of the information in the original signal. Each wavelet coefficient series had 512 values.
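In MATLAB, the decomposition of a single beat might look as follows; modwt is the Wavelet Toolbox routine, and the row indexing reflects its convention of returning the details D1–D7 followed by the approximation A7.

```matlab
% Seven-level MODWT of one 512-sample beat with the discrete Meyer wavelet;
% the level-3 to level-7 details plus the level-7 approximation are kept,
% giving six coefficient series of 512 values each.
x  = beats(1, :);                 % one beat from the segmentation sketch
wc = modwt(x, 'dmey', 7);         % rows 1-7: D1-D7; row 8: A7
subBands = wc(3:8, :);            % D3-D7 and A7
```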

3.3 Feature extraction

Using the six MODWT-decomposed wavelet coefficient series, we calculated five types of features drawn from three domains: energy, time series similarity and information. The five features were variance, inter-quartile range (IQR), Pearson correlation coefficient, Hoeffding’s D correlation coefficient (henceforth Hoeffding’s D) and Shannon entropy.

Variance The variance of the MODWT coefficients of an ECG beat, which is linearly related to the signal energy in the corresponding frequency band. It is calculated as

$$\begin{aligned} V_{X,j}=\frac{1}{M_{j}}\sum _{t=1}^{M_{j}}\left( W_{X,j,t}-{\overline{W}} _{X,j}\right) ^{2} \end{aligned}$$
(1)

where \(j = 1,\dots ,6\) denotes the MODWT decomposition level; \(W_{X,j,t}\) is the value of coefficient t at decomposition level j; \(M_{j}\) is the number of coefficients at decomposition level j; and \({\overline{W}}_{X,j}\) is the mean at decomposition level j.

Fig. 2 Pearson correlation coefficient

Fig. 3 Hoeffding’s D correlation coefficient

IQR The difference between the \(75\mathrm{th}\) and \(25\mathrm{th}\) percentiles of each ECG beat’s MODWT coefficients, reflecting the fluctuation of the data in a nonparametric form. It is calculated as

$$\begin{aligned} \mathrm{IQR}_{X,j}=P_{75}(W_{X,j,t} )-P_{25}(W_{X,j,t} ) \end{aligned}$$
(2)

where \(P_{\cdot }(\cdot )\) denotes the corresponding percentile.
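Both energy features follow directly from the MODWT coefficients; a minimal sketch for the six retained sub-bands of one beat is given below (iqr requires the Statistics and Machine Learning Toolbox).

```matlab
% Eqs. (1) and (2) applied row-wise to the six retained sub-bands.
V      = var(subBands, 1, 2);     % Eq. (1): variance with 1/M normalization
IQRval = iqr(subBands, 2);        % Eq. (2): P75 - P25 of each sub-band
```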

Pearson correlation coefficient Measures the similarity between two time series assuming that the data follow a normal distribution. It is calculated as

$$\begin{aligned} \rho _{XY,j}=\frac{\frac{1}{M_{j}}\sum _{t=1}^{M_{j}}\left( W_{X,j,t}- {\overline{W}}_{X,j} \right) \left( W_{Y,j,t}- {\overline{W}} _{Y,j} \right) }{\sqrt{V_{X,j}}\sqrt{V_{Y,j}}} \end{aligned}$$
(3)

where \(\sqrt{V_{X,j}}\) is the standard deviation of the wavelet coefficients \(W_{X,j,t}\); \({\overline{W}} _{X,j}\) is the mean of the wavelet coefficients \(W_{X,j,t}\); \(M_{j}\) is the number of the coefficients at decomposition level j.
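For two leads, this reduces to a single call per sub-band; wcLeadI and wcLeadII below are assumed placeholders holding the MODWT coefficient matrices of the same beat recorded on two different leads.

```matlab
% Eq. (3): Pearson correlation between the level-3 details of two leads
% (corr is in the Statistics and Machine Learning Toolbox).
rho = corr(wcLeadI(3, :).', wcLeadII(3, :).');
```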

Hoeffding’s D [13] A nonparametric measure of association that detects more general departures from independence. The statistic approximates a weighted sum over observations of chi-square statistics for two-by-two classification tables. It is calculated as

$$\begin{aligned} D_{XY,j}= & {} \frac{30\left[ (n-2)(n-3) D_1+D_2-2(n-2)D_3\right] }{n(n-1)(n-2)(n-3)(n-4)} \end{aligned}$$
(4)
$$\begin{aligned} D_{1}= & {} \sum _{i=1}^{n}(Q_{i}-1)\left( Q_{i}-2 \right) \end{aligned}$$
(5)
$$\begin{aligned} D_{2}= & {} \sum _{i=1}^{n}(R_{i}-1)(R_{i}-2)(S_{i}-1)(S_{i}-2) \end{aligned}$$
(6)
$$\begin{aligned} D_{3}= & {} \sum _{i=1}^{n}(R_{i}-2)(S_{i}-2)(Q_{i}-1) \end{aligned}$$
(7)

where, given decomposition level j, \(R_i\) is the rank of \(W_{X,j,i}\); \(S_i\) is the rank of \(W_{Y,j,i}\); \(Q_i\) is 1 plus the number of points whose \(W_{X,j}\) and \(W_{Y,j}\) values are both less than those of the \(i\mathrm{th}\) point; and n is the number of coefficients.
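A direct transcription of Eqs. (4)–(7) into MATLAB is given below (saved as hoeffdingsD.m); it assumes no ties among the coefficients, whereas production use would need the usual midrank tie corrections.

```matlab
function D = hoeffdingsD(x, y)
% Hoeffding's D of two coefficient series (Eqs. 4-7), assuming no ties.
x = x(:);  y = y(:);
n = numel(x);
R = tiedrank(x);                             % ranks of x
S = tiedrank(y);                             % ranks of y
Q = zeros(n, 1);                             % bivariate ranks
for i = 1:n
    Q(i) = 1 + sum(x < x(i) & y < y(i));     % 1 + points below-left of point i
end
D1 = sum((Q - 1) .* (Q - 2));                % Eq. (5)
D2 = sum((R - 1) .* (R - 2) .* (S - 1) .* (S - 2));   % Eq. (6)
D3 = sum((R - 2) .* (S - 2) .* (Q - 1));     % Eq. (7)
D  = 30 * ((n-2)*(n-3)*D1 + D2 - 2*(n-2)*D3) / ...
     (n*(n-1)*(n-2)*(n-3)*(n-4));            % Eq. (4)
end
```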

Shannon entropy [14] One of the spectral entropies, used to quantify the spectra of the ECG signals. It is calculated as

$$\begin{aligned} H(U) = E[-\mathrm{log}{p_{i}}]=-\sum _{i=1}^{n}p_{i}\mathrm{log}{p_{i}} \end{aligned}$$
(8)

where \(p_{i} = n_{i}/N\); N is the number of samples; \(n_{i}\) is the number of samples in the \(i\mathrm{th}\) bin; and n is the number of bins.
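Estimated from a histogram of the coefficients, the entropy of one sub-band might be computed as follows; the bin count of 16 is an illustrative choice, since the paper does not state it.

```matlab
% Eq. (8): histogram-based Shannon entropy of one coefficient series.
x = subBands(1, :);
counts = histcounts(x, 16);       % n_i: samples per bin (16 bins assumed)
p = counts / numel(x);            % p_i = n_i / N
p = p(p > 0);                     % drop empty bins to avoid log(0)
H = -sum(p .* log(p));
```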

Given ECG time series, Maharaj and Alonso [15] used the wavelet variance and Pearson correlation coefficient as features and achieved good classification performance. Considering that the MODWT coefficients may not follow a normal distribution, and to enhance robustness, we also calculated the IQR and Hoeffding’s D. Finally, besides the energy and the similarity between pairs of time series, we needed to consider the information content, or complexity, of each time series, for which we used Shannon entropy. In total, we therefore calculated 636 features: 48 variances, 48 IQRs, 264 Pearson correlation coefficients, 264 Hoeffding’s D values and 12 Shannon entropies.

3.4 Feature selection

Due to the large number of features, it is necessary to select those that offer the most significant information for MI detection. In this paper, we performed Student’s t test and obtained a t value for each feature. A higher t value indicates a more discriminative feature, so we ranked the features by their t values, set a threshold and selected the features whose t values exceeded it. The threshold itself was chosen by computing the classification accuracy for each candidate value and selecting the most accurate one.
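A sketch of this ranking step is shown below; features (a beats-by-features matrix) and isMI (a logical label vector) are placeholders, and ranking by the absolute t value is our reading of “higher t value”.

```matlab
% Rank features by two-sample t statistics and keep those above a threshold
% (ttest2 is in the Statistics and Machine Learning Toolbox).
nFeat = size(features, 2);
tVals = zeros(1, nFeat);
for f = 1:nFeat
    [~, ~, ~, stats] = ttest2(features(isMI, f), features(~isMI, f));
    tVals(f) = abs(stats.tstat);
end
selected = find(tVals > 80);      % threshold of 80 chosen in Sect. 4
```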

In each of the ten cross-validation iterations, we obtained a group of suitable features. We then sorted all of the features by the number of iterations in which they appeared; features appearing more than five times were included in the final feature group.

3.5 Model training and validation

The k-nearest neighbor (kNN) classifier [16] was selected to discriminate between MI patients and healthy people. kNN is an instance-based classifier that labels an unknown sample according to its nearest known samples. In this study, we set k equal to 5, meaning that the five nearest neighbors of the unknown sample were used for discrimination.
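With MATLAB’s Statistics and Machine Learning Toolbox, the classifier amounts to one call; features, selected and isMI are the placeholders introduced in the earlier sketches.

```matlab
% kNN with k = 5 on the selected features.
mdl  = fitcknn(features(:, selected), isMI, 'NumNeighbors', 5);
pred = predict(mdl, features(:, selected));   % resubstitution predictions
```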

We used tenfold cross-validation (10-CV) and leave-one-person-out cross-validation (LOPOCV) to validate our method, calculating the accuracy, sensitivity and specificity to evaluate the model.

A lack of data independence is likely to occur when using 10-CV, as the data in the training and test sets may come from the same person. To avoid this problem, we also chose one person at a time, trained the model on the remaining people and predicted all of the leads of the selected person. After repeating this for every person in the data set, the average accuracy was calculated to evaluate the model. This method is conceptually similar to leave-one-out cross-validation, hence the name LOPOCV.
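LOPOCV can be sketched as a loop over subjects; subjID, an assumed vector mapping each beat to its subject, is the only ingredient beyond the earlier sketches.

```matlab
% Leave-one-person-out cross-validation: hold out all beats of one person,
% train on the rest and average the per-person accuracies.
subjects = unique(subjID);
acc = zeros(numel(subjects), 1);
for s = 1:numel(subjects)
    test   = (subjID == subjects(s));
    mdl    = fitcknn(features(~test, selected), isMI(~test), 'NumNeighbors', 5);
    pred   = predict(mdl, features(test, selected));
    acc(s) = mean(pred == isMI(test));
end
meanAcc = mean(acc);              % reported LOPOCV accuracy
```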

4 Results

4.1 Feature description

A total of 28,700 beats were segmented from 200 subjects (52 healthy and 148 with MI). After extracting the 636 features, we used heatmaps to visualize the mean values of the Pearson correlation coefficients and of Hoeffding’s D.

Figures 2 and 3 visualize the correlations between the leads. Each figure contains two rows and six columns, giving a total of twelve sub-graphs. The first row shows the healthy controls, and the second row shows the patients with MI. From left to right, the six columns correspond to the detail wavelet coefficients of levels 3 to 7 and the approximation coefficients of level 7. Each sub-graph visualizes the upper triangle of a 12×12 matrix whose (i, j) element represents the correlation between lead i and lead j, colored according to its value. Note that the upper and lower limits of Hoeffding’s D change with the number of samples: when the sample size is small, D ranges from \(-0.5\) to 1, but its minimum rapidly increases with sample size [17].

From the heatmaps, some differences between the features of the MI group and the healthy group can be seen. For example, the Pearson correlation coefficients in a certain region are positive in healthy people but negative in MI patients. Those features may facilitate automatic detection of the risk of MI in healthy people.

Subsequently, because the model was validated in two ways, we discuss the outcome of kNN in each case individually. In the detailed LOPOCV results in Fig. 4, each iteration is represented by a green curve, the red curve is the median accuracy at each threshold, the pink curves are the \(P_{25}\) and \(P_{75}\) accuracies, and the blue curve is the average accuracy. For each threshold, we used cross-validation to calculate the corresponding accuracy; note that for the same threshold, the features obtained in different iterations may differ slightly. From these outcomes, we concluded that a threshold t value of 80 achieved the highest accuracy in general. On average, 22 features were selected at each iteration. The details of the selected features are shown in Table 2, where SD denotes the standard deviation, Hr denotes Hoeffding’s D and Pr denotes the Pearson correlation coefficient. The symbol \(\mathrm{Hr}^{1,4}_3\) denotes the Hoeffding’s D of the \(3\mathrm{rd}\)-level detail wavelet coefficients between the \(1\mathrm{st}\) and \(4\mathrm{th}\) leads; the other features follow the same pattern.

Table 2 Details of the 22 selected features
Fig. 4 The outcome of each iteration in LOPOCV

4.2 Model performance

As shown above, the optimal threshold of the t value is 80. Given this threshold, we achieved an accuracy of 99.57%, a sensitivity of 99.82% and a specificity of 98.79% for 10-CV, and an accuracy of 87.96%, a sensitivity of 93.24% and a specificity of 72.01% for LOPOCV. The accuracy of LOPOCV is given as the average. The outcomes for the other thresholds are shown in Table 3, where Acc, Sen and Spc denote accuracy, sensitivity and specificity, respectively. Note that the accuracy of LOPOCV is not symmetrically distributed: when the threshold is 80, the median accuracy of LOPOCV is 98.88%, and the lower and upper quartile accuracies are 90.26% and 100%.

Table 3 Outcomes for the 12 threshold values (%)

5 Discussion

A novel methodology for detecting MI using robust features from 12-lead ECG signals is proposed in this paper. We used three types of features (categorized as energy, sequence similarity and information features) to predict the risk of MI. In the energy and sequence similarity categories, we calculated both the normal-distribution-based features and their nonparametric counterparts.

5.1 Advantages and disadvantages

In this study, we focused on how the correlations between the 12 leads differ between MI patients and healthy people. Selecting meaningful correlation coefficients, which capture the similarity between leads, has practical value for exploring the mechanism of the disease: changes in the similarity between the leads reflect the structural changes of the heart during MI, which can be explained from a clinical point of view. Similarly, variance reflects the energy of the leads, while entropy reflects the information they contain.

Another innovation of this study is its focus on robust features. When describing the energy of a frequency band in a lead, we extracted not only the traditional feature, variance, but also the IQR, which reflects the lead fluctuation more robustly. Similarly, when describing the similarity between leads, in addition to the traditional Pearson correlation coefficient, we also extracted Hoeffding’s D, which is less susceptible to the influence of outlying data.

Although the features used in this study are theoretically simpler than those in similar studies, they have better explainability. In addition, through repeated experiments, we finally chose 22 features, a relatively large number compared with other studies. However, these features are simple to calculate, so the total time spent on feature calculation was no longer than in other studies.

In addition, when repeating the above simulation 200 times, we found that although the set of selected features differed slightly from iteration to iteration, it always maintained high accuracy on the test set. This shows that for diverse groups of people, our method can find a corresponding group of features. The discriminatory ability was high when predicting the MI risk of new people whose characteristics were close to the population average, demonstrating the generalizability of our method in practical application.

One disadvantage of this method is that a large number of features were included in the feature extraction stage, and there may have been collinearity in the process of feature selection. Another problem is that the same variable selection method may not always be able to find a consistent feature group for different populations. However, this problem is common to all methods of automatic detection of MI and needs further research.

5.2 Comparison with previous studies

Table 4 summarizes 17 studies that used the PTB database to identify MI patients among healthy controls. The table lists the year of publication, the type of features extracted, the classifier used, the model validation method, and the accuracy, sensitivity and specificity. The comparison shows that the accuracy, sensitivity and specificity of this study are better than those of previous studies. A brief summary of the published research is given below.

Table 4 Summary of automated diagnosis of MI using the PTB database

Half of the cited studies calculated specific features based on the wavelet coefficients of the signals. Acharya et al. [25] calculated the entropy of the DWT coefficients of ECG signals. Similarly, Sharma and Sunkaria [31] calculated the entropy and energy of stationary wavelet transform (SWT) coefficients and the median slope of the original signal. Mohit et al. [28] used sample entropy as a feature after processing the ECG data with the rational-dilation wavelet transform. Sharma et al. [32] used three entropy-based features derived from an optimal biorthogonal wavelet filter bank. Sharma et al. [23] generated multiscale energies and eigenvalues from specific wavelet coefficient bands. Banerjee and Mitra [20] analyzed ECG data using the cross-wavelet transform and explored the resulting spectral differences.

Other studies extracted features directly in the time or frequency domain. Sun et al. [18] approximated the segmented ECG signal by a fifth-order polynomial, using the polynomial coefficients as features. Similarly, Bin et al. [22] fitted the signals with polynomials of different orders and took the coefficients as features. Correa et al. [19] analyzed morphological features reflecting the depolarization and repolarization of ECG signals. Dionisije et al. [34] calculated time- and frequency-domain features of the signal and focused on the complexity of the algorithm. Sadhukhan et al. [33] took the phase values after discrete Fourier transformation as features.

Some studies have not extracted specific features from wavelet coefficients but treated the coefficients as a whole. Remya et al. [24] used DWT coefficients to train artificial neural networks. Acharya et al. [26] obtained three kinds of coefficients using the discrete cosine transform, DWT and empirical mode decomposition and then used locality-preserving projections for dimensionality reduction, taking the projection factors as features. Bhaskar [21] processed wavelet coefficients using principal component analysis and took the principal components as features. Some researchers have processed raw ECG signals with artificial neural networks to differentiate MI patients from healthy people; among them, Acharya et al. [27], Liu et al. [30] and Reasat and Shahnaz [29] used different types of convolutional neural network (CNN) for this task.

Most of the existing studies have used 10-CV for model evaluation, while a few used test sets. The disadvantage of both approaches is that the data used for model evaluation and for modeling may come from the same person, that is, the samples between training and test sets may not be independent. Reasat and Shahnaz [29] paid attention to this point. That study chose one person at a time, modeled the rest of the people and predicted all of the leads of the selected person. After repeating this for all of the people in the data set, the average accuracy was calculated to evaluate the model. This avoids the situation in which the test and training sets are not independent and improves the generality of the results.

5.3 Applications and future plans

We envision that hospitals could establish ECG databases of MI patients and healthy people according to their specific needs and then calculate the corresponding features using the methods proposed in this paper. For clinical purposes, a patient’s ECG data can then be input into the constructed model to obtain a predictive result.

Many aspects of this study can be extended. We will further analyze the specific clinical implications of the extracted features, which need to be discussed with clinicians and pathologists. We will also consider the application of this method to ECG data collected in local hospitals and further study the method’s practical value.

6 Conclusion

In this work, we computed five kinds of feature to analyze healthy and MI ECG segments. These features covered the energy, time series similarity and information domains, while taking robustness into account. For feature selection, we performed Student’s t test and used specific thresholds to find the optimal feature set. The kNN classifier was applied for classification. To alleviate over-fitting and improve the generalizability of the model, tenfold CV and LOPOCV were used for model validation. We have achieved impressive classification performance compared with similar studies, with 99.57% accuracy, 99.82% sensitivity and 98.79% specificity. Using LOPOCV, the accuracy, sensitivity and specificity of this study were, respectively, 87.96%, 93.24% and 72.01%. Due to the simplicity of feature calculation, our method promises to improve the speed of diagnosis and provide cost-effective medical services for patients and hospitals.

The proposed method is robust, accurate and cost-effective and can be applied to real-time monitoring, diagnosis and treatment of MI to reduce time, cost and medical resources.