1 Introduction

Cardiovascular disease (CVD) has been identified as the principal cause of death, claiming about 17 million lives every year worldwide [1]. CVD mainly refers to diseases related to the heart [2]. The heart is a cone-shaped organ that requires a constant supply of oxygen and nutrients [3,4,5]. It is responsible for supplying blood to the different organs of the body and contracts at regular intervals [6]. Any obstruction in the supply of blood leads to heart attacks (or heart diseases), which cause a large number of casualties every year [7, 8]. Unfortunately, the analysis of CVD is not a simple task, as it involves many conditions such as hypertensive, pulmonary, valvular, and inflammatory cardiomyopathy [9,10,11]. A good-quality electrocardiogram (ECG) signal can provide a correct assessment of these conditions [12,13,14]. This signal is generated by the active tissues of the heart, which produce electrical currents. The ECG signal is not random in nature; it follows a regular pattern in time period, shape, and heart rate [15]. It is one of the earliest and cheapest non-invasive diagnostic tools in the medical field for detecting CVDs, based on examination of its P-QRS-T waves [16,17,18,19]. These waves arise from chemical, electrical, and mechanical processes within the heart [20, 21]. Any persistent alteration in these waves indicates possible cardiac arrhythmia [22,23,24] and requires a different kind of clinical diagnostic observation [25,26,27,28,29,30]. Early detection of such alterations is essential for assessing a patient's health condition in a timely manner and for reducing the overall mortality rate [31]. Heart rate variability (HRV) is another factor that plays an important role in assessing the correct status of cardiac health as a preliminary diagnostic method [14, 24, 32,33,34,35,36,37].

The early-stage detection of cardiac arrhythmia is of prime importance [1]. However, during the acquisition of ECG data, different types of noise get involved; these hide important characteristics of the signal, mislead its analysis, and introduce non-linearity [38, 39]. Analysis of such a nonlinear signal requires the automated analysis provided by computer-aided diagnosis (CAD). This, in turn, requires efficient techniques to handle the current incidence of CVDs occurring worldwide [40,41,42]. Moreover, the concept of personal ECG monitoring is presently gaining ground among patients of all age groups worldwide [43]. Therefore, there is a need to develop a framework involving efficient techniques from all domains of biomedical signal processing, viz. preprocessing, feature extraction, and classification [44, 45]. In the existing literature, some authors have used time-domain techniques, which show good performance for clean ECG signals, i.e., those without noise/artifacts. Later, frequency-domain techniques were reported by some authors, but these have limited application due to spectral leakage. Techniques from both these domains are unable to effectively analyze the nonlinear behavior of ECG signals [46, 47]. For instance, Christov, I.I. [48] and Hamilton, P.S. and Tompkins, W.J. [49] proposed heuristic methods for classifying an ECG signal, but the outcome was found to be highly dependent on proper selection of the band-pass filter (3 dB frequencies). In [30], Kumar, M. et al. proposed a decision support system for atrial fibrillation using a flexible analytic wavelet transform; they tested the performance of their technique on the basis of ACC, SEN, and SPE with a random forest classifier. In [50], Rao, K.D. proposed R-peak detection based on the discrete wavelet transform (DWT); that work presented R–R interval measurement and data compression steps for ECG signals along with R-peak detection. The main obstacle in the application of DWT is its frequency resolution, which is reduced during resampling [51]. Traditionally, DWT-based computationally efficient techniques have been reported in the literature for inverting system transfer function models, computing piecewise-constant system responses to arbitrary excitations, and fractional system analysis [52,53,54]. In [45], Asadur Rahman, Md. proposed preprocessing techniques for the ECG signal using a simple statistical approach, with the set-up developed in MATLAB. In [55], Hanumantha Rao, G. and Rekha, S. presented a transconductor-capacitor filter for biomedical signal applications; they proposed a low-voltage, low-power transconductor device with 5.85 nS transconductance and a 0.8 V supply, validated using a second-order Butterworth low-pass filter with a cutoff frequency of 100 Hz. In [56], Kora, P. proposed detection of myocardial infarction using hybrid Firefly and Particle Swarm Optimization (FFPSO) to optimize the raw ECG signal; however, the main drawback of PSO is its low convergence rate [57]. In [58], Pachori, R.B. et al. proposed a classification-based technique for analyzing datasets of diabetic and normal subjects based on the RR-interval, utilizing empirical mode decomposition (EMD), a least-squares support vector machine (LS-SVM) classifier, a radial basis function (RBF), a Morlet wavelet, and a Mexican hat wavelet kernel. In [59], Jain, S. et al. used
adaptive filters for QRS-complex detection, but an appropriate reference signal was needed for their operation. In [9, 47, 60], Gupta, V. et al. proposed chaos theory as a feature extraction tool for the ECG signal and used both real-time and standard datasets for demonstration. The main limitation of this approach is the requirement for proper selection of the time-delay (embedding) dimension, correlation dimension, Lyapunov exponent, and entropy, which is still a challenge. In [61], Hema Jothi, S. and Helen Prabha, K. proposed analysis of the fetal ECG on the basis of adaptive neuro-fuzzy inference systems and the undecimated wavelet transform; a comparison on the basis of MSE was carried out between the reported technique and the standard discrete wavelet transform technique to evaluate performance. In [62], automated identification of normal and diabetic heart rate signals was proposed using approximate entropy, but it required knowledge of the previous amplitude values. In [63], Das, M.K. and Ari, S. proposed a denoising technique based on the Stockwell transform; it was validated using various normal and abnormal files of MB Ar DB, with white Gaussian noise added to the selected records to investigate the technique's effectiveness, and various performance parameters, viz. SNR, RMSE, and PRD, were estimated for comparison.

In the existing literature, most techniques are not suitable for handling high-frequency components efficiently. They tend to trim the amplitudes of the QRS peaks, increasing false and duplicate detections in the peak-detection process [64]. This problem motivated the present authors to explore efficient techniques that provide better frequency information and result in more accurate detection of R-peaks by effectively filtering out the high-frequency noise components.

In this paper, the spectrogram (obtained using the short-time Fourier transform) has been used because it enables the effective measurement of time, frequency, and power-intensity information simultaneously through time–frequency analysis. The continuous wavelet transform (CWT) has also been used to enhance both time and frequency resolution beyond that provided by the spectrogram [65]. The benefit of using a spectrogram stems from the fact that the Fourier transform is known to be a good candidate for analyzing stationary signals.

The wavelet transform represents a nonlinear signal by translations and dilations of a window. It is of two types: the CWT and the discrete wavelet transform (DWT). DWT is not a good candidate for the present application because its frequency resolution is reduced during resampling at each decomposition level. The CWT, on the other hand, provides good and consistent frequency resolution. Moreover, a sufficient and dominant scale can be estimated for each component of the ECG signal in each dataset, which helps in estimating each component separately from the selected ECG dataset [66]. Furthermore, the proposed use of the AR technique compensates for the limitations of the spectrogram and the CWT, providing enhanced time and frequency resolution simultaneously. The proposed technique yields clearer frequency information, which is important for filtering out the high-frequency noise components.

The rest of the paper is structured as follows: Sect. 2 describes the materials and methods, Sect. 3 presents and analyzes the simulated results in detail, and Sect. 4 concludes the paper.

2 Materials and methods

The methodology proposed in this paper is shown in Figs. 1 and 2.

Fig. 1 Recording and storage/transmission set-up of the ECG signal [67,68,69]

Fig. 2 Generalized methodology for ECG signal analysis

2.1 ECG dataset (recording)

MIT-BIH Arrhythmia and Real-time databases have been used for validating the proposed methodology.

2.1.1 MIT-BIH arrhythmia database

The Massachusetts Institute of Technology-Beth Israel Hospital Arrhythmia database (MB Ar DB) [70, 71] has been considered in this study. It contains 48 recordings sampled at 360 Hz, with durations of 30–60 min, acquired using two-lead arrangements. In this paper, all 48 datasets of MB Ar DB were downloaded from the PhysioNet database and used directly.

In this paper, 12 real-time recordings (RT DB) were also used to establish the performance of the proposed methodology in a practical scenario. The use of two databases is in line with other studies in the existing literature that employed a variety of databases for validating their work [39, 72,73,74,75].

2.1.2 Real-time ECG database

In this paper, 27 real-time recordings (RT DB) were acquired at a sampling rate of 360 Hz, with durations of 10–30 min, using two-lead arrangements under the supervision of a well-skilled lab technician. The data acquisition was carried out after obtaining a permission letter from the research ethics committee of NIT Jalandhar, India, along with the consent of each volunteering subject before ECG acquisition. The 27 participating subjects were aged between 23 and 72 years and included research scholars, retired professors, and college students. Unfortunately, only 12 of these ECG recordings were appropriate for analysis. The ECG datasets were stored directly on a personal computer using Biopac MP35/36 equipment.

Figure 1 shows the recording set-up along with a recorded real-time ECG signal. The acquired data remain stored on a computer and may be used for data interpretation in the future.

2.2 Preprocessing

The existence of different types of noise/artifacts during ECG signal acquisition makes the analysis of the ECG signal more complex and difficult [76, 77]. In such situations, cardiologists and physicians face distinct problems in assessing the clinical datasets of patients with CVDs [63, 78, 79]. These noises/artifacts may be due to motion, respiration, poor electrode conditions, baseline wander (BLW), muscle noise, and power-line interference (PLI) [63, 80,81,82]. In this paper, Savitzky–Golay digital filtering (SGDF) has been used for preprocessing the MIT-BIH Arrhythmia datasets, as described in [83, 84]. It is a digital filter used for smoothing the raw ECG signal [60] that preserves all important clinical attributes after filtering [85].

SGDF is characterized by a matrix \([g]\) having \(D+1\) rows and \(2N+1\) columns. Mathematically, it is represented as

$$[g]_{dn} = \sum\limits_{k=0}^{D} \left[ W^{-1} \right]_{dk} \, t_{n}^{k}$$
(1)
$$\alpha = [g] \, x,$$
(2)

where \(\alpha\), \(x\), \(n\), \(W\), and \(d\) denote the filtering coefficients, the input signal, the column index (spanning the \(2N+1\) columns), the weighting matrix, and the row index (spanning the \(D+1\) rows), respectively, and \(k\) is the polynomial power index.
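To make the preprocessing step concrete, the following minimal Python sketch applies Savitzky–Golay smoothing with SciPy to a synthetic ECG-like trace; the window length (21 samples) and polynomial order (3) are illustrative assumptions, not the settings used in this work.

```python
# Minimal sketch: Savitzky-Golay smoothing of a noisy ECG-like trace.
# Window length and polynomial order are illustrative assumptions.
import numpy as np
from scipy.signal import savgol_filter

fs = 360                                    # sampling rate used in this study (Hz)
t = np.arange(0, 2, 1 / fs)
# Synthetic stand-in for an ECG trace: a spiky periodic wave plus noise
ecg = np.sin(2 * np.pi * 1.2 * t) ** 15 + 0.1 * np.random.randn(t.size)

# Fit a degree-3 polynomial in each sliding 21-sample window; this smooths
# high-frequency noise while preserving sharp peak shapes
smoothed = savgol_filter(ecg, window_length=21, polyorder=3)
```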

2.3 Feature extraction

The various methods used for feature extraction in this paper are presented in the next subsections.

2.3.1 Continuous wavelet transform (CWT)

CWT helps in the analysis of non-stationary signals at multiple scales by considering an analysis window to extract signal segments [86]. Mathematically, the CWT [16] of a signal \(y(t)\) using a family of wavelet functions \(\Psi_{\alpha,\beta}(t)\) is given by:

$$CWT(\alpha, \beta) = \frac{1}{\sqrt{\alpha}} \int\limits_{-\infty}^{\infty} y(t) \, \Psi^{*}\!\left( \frac{t - \beta}{\alpha} \right) dt,$$
(3)

where \(\beta\) is the translation factor, \(\alpha\) is the scale factor, and \(\Psi^{*}\) denotes the complex conjugate of the translated and scaled mother wavelet function.
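As an illustration of Eq. (3), the following minimal Python sketch computes the CWT of a synthetic ECG-like trace using PyWavelets; the Morlet mother wavelet and the scale range are illustrative assumptions rather than the exact configuration used in this work.

```python
# Minimal sketch of Eq. (3): CWT of an ECG-like trace with a Morlet wavelet.
import numpy as np
import pywt

fs = 360
t = np.arange(0, 2, 1 / fs)
ecg = np.sin(2 * np.pi * 1.2 * t) ** 15 + 0.05 * np.random.randn(t.size)

scales = np.arange(1, 64)                   # dilation factors (alpha in Eq. 3)
coefs, freqs = pywt.cwt(ecg, scales, 'morl', sampling_period=1 / fs)

# |coefs| gives the magnitude response; np.angle(coefs) gives the phase response
magnitude, phase = np.abs(coefs), np.angle(coefs)
```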

2.3.2 Autoregressive (AR) technique

Among existing time–frequency analysis (TFA) techniques, the autoregressive (AR) technique offers good time–frequency resolution [87]. Choosing an appropriate model order for the considered ECG dataset is important for obtaining good results [88], as the order determines the number of poles of the model [89]. AR analysis provides both a power spectral density (PSD) description and TFA [90]. For a signal \(y[k]\), with model order \(m\) and zero-mean white noise \(\delta[k]\), the AR process is written as [90]

$$y[k] = \sum\limits_{j=1}^{m} \alpha_{j} \, y[k-j] + \delta[k],$$
(4)

where \(\alpha_{j}\) is the \(j\)th coefficient of the AR process and \(m\) is the model order.

The power spectrum is given by [90]

$$P_{y}(z) = \sigma_{\delta}^{2} \left| \frac{1}{1 - \sum\nolimits_{j=1}^{m} \alpha_{j} z^{-j}} \right|^{2}.$$
(5)
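The following minimal Python sketch illustrates Eqs. (4) and (5): it fits an AR model to a synthetic ECG-like trace and evaluates the resulting power spectrum. Model order 6 follows the order used later in Sect. 3, but the Yule–Walker estimator is an assumption rather than the exact estimator used in this work.

```python
# Minimal sketch of Eqs. (4)-(5): fit an AR model and evaluate its power spectrum.
import numpy as np
from scipy.signal import freqz
from statsmodels.regression.linear_model import yule_walker

fs = 360
t = np.arange(0, 4, 1 / fs)
ecg = np.sin(2 * np.pi * 1.2 * t) ** 15 + 0.05 * np.random.randn(t.size)

order = 6
rho, sigma = yule_walker(ecg, order=order)  # rho -> alpha_j, sigma -> noise std

# Eq. (5): P_y(z) = sigma^2 * |1 / (1 - sum_j alpha_j z^-j)|^2
w, h = freqz(b=[1.0], a=np.r_[1.0, -rho], fs=fs)
psd = (sigma ** 2) * np.abs(h) ** 2         # power spectrum vs. frequency w (Hz)
```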

2.3.3 Spectrogram technique

A spectrogram provides a time-varying spectral density description of the ECG signal, i.e., it shows the signal in the time–frequency domain. Mathematically, it is given by the squared magnitude of the short-time Fourier transform (STFT) of the signal [91]:

$$\mathrm{Spectrogram}(t, \omega) = \left| \mathrm{STFT}(t, \omega) \right|^{2},$$
(6)

where \(t\) denotes time (in s) and \(\omega\) denotes angular frequency (in rad/s).

The STFT estimates the sinusoidal frequency and phase content of local segments of a signal as it changes over time. It divides a long signal into short segments and computes the Fourier transform of each [92]. The spectrogram therefore represents a time–frequency–intensity spectrum over short time durations [93, 94].
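As an illustration of Eq. (6), the following minimal Python sketch computes a spectrogram with SciPy; the 256-sample Hann window with 50% overlap is an illustrative assumption, not the configuration used in this paper.

```python
# Minimal sketch of Eq. (6): spectrogram as squared STFT magnitude.
import numpy as np
from scipy.signal import spectrogram

fs = 360
t = np.arange(0, 10, 1 / fs)
ecg = np.sin(2 * np.pi * 1.2 * t) ** 15 + 0.05 * np.random.randn(t.size)

# mode='magnitude' returns |STFT| per segment; squaring gives Eq. (6)
f, seg_t, Sx = spectrogram(ecg, fs=fs, window='hann',
                           nperseg=256, noverlap=128, mode='magnitude')
Sxx = Sx ** 2                               # time-frequency power intensity
```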

2.3.4 Classification

After the successful completion of preprocessing using SGDF and feature extraction using the CWT, spectrogram, and AR modeling techniques, the remaining task is classification, which is a crucial step for detecting the exact R-peaks in the ECG signal. The K-nearest neighbor (KNN) classifier is selected here because it yields sufficiently accurate classification results. It works on the assumption that similar things tend to lie near each other, which can be implemented easily using simple functional space equations. In the past, some authors avoided the KNN classifier because it is a lazy learner. But in a CAD-based system, it is the accuracy that matters for saving the life of a subject (patient) in an emergency. Moreover, KNN does not rely on building an explicit model and is thus able to yield more versatile responses [10]. This is the reason for the emergence of CAD in the field of health informatics.

In KNN, the Euclidean distance metric (EDM) is generally preferred [95] because it does not require assigning weights to individual features. The EDM between a test sample and a training sample is

$${\text{EDM}} = \sqrt {\mathop \sum \limits_{{{\text{j}} = 1}}^{{\text{m}}} ({\text{P}}_{{\text{j}}} - {\text{Q}}_{{\text{j}}} )^{2} } ,$$
(7)

where \(P\) and \(Q\) denote the test and training samples, respectively, in class \(L\). Each new test sample is classified on the basis of its K nearest training samples. An odd value of K is generally preferred to avoid voting ties [96]. An appropriate value of K gives low test error rates but may increase the number of iterations. Therefore, the same dataset is partitioned for both training and testing, and K-fold cross-validation is applied for validating the dataset, as in [97]. Figure 3 indicates the steps involved in the KNN classification algorithm.

Fig. 3 Steps involved in the KNN classification algorithm [10]
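For concreteness, the following minimal Python sketch shows KNN classification with the Euclidean metric of Eq. (7) and K-fold cross-validation; the feature matrix and labels here are hypothetical placeholders rather than features extracted by the proposed pipeline.

```python
# Minimal sketch: KNN with Euclidean distance (Eq. 7) and K-fold validation.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))               # placeholder per-beat feature vectors
y = rng.integers(0, 2, size=200)            # placeholder beat labels

# metric='euclidean' matches Eq. (7); an odd K avoids voting ties
knn = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
scores = cross_val_score(knn, X, y, cv=10)  # 10-fold cross-validation
print(scores.mean())
```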

2.3.5 Classification parameters

For evaluating the performance of the proposed methodology, two important parameters, viz. sensitivity (SE) and detection rate (D.R), are considered in this paper. Their definitions are given below [98]:

Sensitivity (SE): It is the ratio of true positives (TP) to all actual positives (TP + FN). It estimates the proportion of actual positives, TP/(TP + FN), that are accurately detected.

Detection rate (D.R): It is the ratio of the total number of true positives (TP) to the total number of actual peaks.

Mathematically, these classification parameters are defined as [66, 79, 81, 99,100,101,102,103]

$${\text{Sensitivity }}\left( {{\text{SE}}} \right) = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}$$
(8)
$${\text{Detection Rate}} = \frac{{{\text{Total}} \, {\text{True Positive}}\left( {TP} \right)}}{{{\text{Total Actual Peaks}}}},$$
(9)

where TP denotes true positives, FP denotes false positives, and FN denotes false negatives, which are defined below:

True positives (TP): These are events detected as positive when the system actually possesses such events. For example, if a patient is diagnosed with a disease from which he is actually suffering, it is called a TP.

False positives (FP): Also known as Type-I errors, these are wrongly detected events. For example, if a patient is diagnosed with a disease from which he is not suffering, it is called an FP.

False negatives (FN): Also known as Type-II errors, these are wrongly missed events. For example, if a patient is not diagnosed with a disease from which he is actually suffering, it is called an FN.

In the existing literature, various techniques have been used, as summarized in Table 1 along with their pros and cons.

Table 1 Pros and cons of proposed and existing research work
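As a quick check of Eqs. (8) and (9), the following minimal Python sketch reproduces the MB Ar DB figures reported later in Sect. 3 (TP = 109,833, FN = 109, and 110,043 actual peaks).

```python
# Minimal sketch of Eqs. (8)-(9) using the MB Ar DB totals from Sect. 3.
def sensitivity(tp, fn):
    return tp / (tp + fn)

def detection_rate(tp, total_actual_peaks):
    return tp / total_actual_peaks

print(f"SE  = {100 * sensitivity(109833, 109):.2f}%")        # 99.90%
print(f"D.R = {100 * detection_rate(109833, 110043):.2f}%")  # 99.81%
```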

3 Results and analysis

Any deviation in a signal makes the recognition of existing patterns a difficult task. The spectrogram can represent these deviations effectively only if the signal has a high SNR. Otherwise, both the time-domain and spectrogram approaches fail, and phase analysis using the CWT works better [94], as shown in Fig. 4. The figure shows two peaks in the left panel and three peaks in the right panel of the 103m dataset. The modulus (magnitude response) of the CWT clearly reveals all the actual peaks, and the angle (phase response) reveals its characteristics.

Fig. 4 CWT analysis of the MIT-BIH Arrhythmia database (record no. 103m)

The AR technique strengthens the CAD system by measuring the amount of peak power associated with each of the constituent frequency components; this description is known as the power spectrum. Any noise that remains after preprocessing is further investigated using the power spectrum, and the corresponding coefficients help to identify the type of heart disease, as done in [72, 95]. Figure 5a, b shows the power spectrum and AR coefficients for the MIT-BIH Arrhythmia database (record no. 103m) at model order 5, and Fig. 5c, d shows the same at model order 6.

Fig. 5 Power spectrum and AR coefficients for the MIT-BIH Arrhythmia database (record no. 103m): (a) power spectrum at model order 5, (b) corresponding coefficients, (c) power spectrum at model order 6, (d) corresponding coefficients

The contour plot describes an ECG signal in terms of its time–frequency content, differentiating (i) noisy and filtered ECG datasets and (ii) normal and abnormal ECG datasets. Figure 6 shows the contour plot of MB Ar DB (record no. 103m) for differentiating noisy and filtered ECG datasets; the vertical scale is frequency, the horizontal scale is time, and power is indicated by the color intensity [94]. Existing approaches based on the power spectrum produced wrong outcomes due to their limited time–frequency resolution: they estimated the same frequency output using windows of identical size for both normal and cardiac patients' ECG datasets. The spectrogram technique, however, provides an effective signal estimation in both the time and frequency segments of the ECG datasets.

Fig. 6 Contour plot of the MIT-BIH Arrhythmia database (record no. 103m) using the spectrogram technique: (a) without baseline wander removal and filtering, (b) after baseline wander removal and filtering

The AR technique has a multiresolution capability that can identify all the actual peaks as well as the noise present in the recorded ECG signal. Figure 7 shows the detected R-peaks in the 103m dataset at model order 6 using the AR technique. Here, the R-peaks in the three-dimensional view are obtained with a time interval of 1 s and a frequency resolution of 20.09 Hz/point. All amplitudes are expressed in decibels (dB) during R-peak detection using the AR modeling technique.
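For orientation, the following minimal Python sketch shows a generic R-peak picker applied to a synthetic trace; it is not the AR-based detector proposed here, and the amplitude threshold and 200 ms refractory spacing are illustrative assumptions.

```python
# Generic R-peak picking baseline (NOT the proposed AR-based detector).
import numpy as np
from scipy.signal import find_peaks

fs = 360
t = np.arange(0, 10, 1 / fs)
ecg = np.sin(2 * np.pi * 1.2 * t) ** 15 + 0.05 * np.random.randn(t.size)

# Enforce a physiological refractory period (~200 ms) between R-peaks
peaks, _ = find_peaks(ecg, height=0.5, distance=int(0.2 * fs))
rr_intervals = np.diff(peaks) / fs          # R-R intervals in seconds
```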

Fig. 7 Detected R-peaks in the 103m dataset at model order 6 using the AR technique

In this paper, the proposed technique obtained an SE of 99.90% with a D.R of 99.81% for MB Ar DB, and an SE of 99.77% with a D.R of 99.87% for RT DB. Tables 2 and 3 show the analysis results for RT DB and MB Ar DB, respectively. Table 4 clearly reveals that the proposed technique outperforms the existing techniques. These results should help extend the applications of the proposed methodology in expert systems.

Table 2 Performance evaluation of the proposed technique for real-time database
Table 3 Performance evaluation of the proposed technique for MB Ar database
Table 4 Comparison of current and existing techniques on the basis of SE

Table 2 shows that, out of 25,426 actual R-peaks, the proposed technique detects 25,406 R-peaks, with a TP of 25,394, FN of 58, and FP of 35.

Table 3 shows that, out of 110,043 actual R-peaks, the proposed technique detects 109,875 R-peaks, with a TP of 109,833, FN of 109, and FP of 83. For most of the datasets of MB Ar DB, the proposed technique yields FN and FP of 0 (MB-124, MB-207, MB-217, MB-219, MB-220, and MB-222 for FN = 0; MB-201, MB-202, MB-205, MB-207, MB-208, MB-210, MB-215, MB-217, MB-223, and MB-231 for FP = 0).

The existing techniques achieve SE values of 99.69%, 99.68%, 82.75%, 99.8%, 98.01%, 99.81%, 98.32%, 99.84%, 99.65%, 95.8%, 98.84%, 99.29%, 99.75%, and 99.66% in He et al. [57], Jain et al. [59], Narina et al. [112], Rai et al. [113], Rajesh and Dhuli [70], Saini et al. [97], Van et al. [114], Lin et al. [64], Das and Ari [63], Kumar et al. [30], Kaya et al. [115], Kaya and Pehlivan [116], Pan and Tompkins [117], and Liu et al. [118], respectively. Table 4 shows that the proposed technique (with an SE of 99.90%) outperforms all these existing techniques.

The existing techniques achieve D.R values of 99.81%, 99.89%, 99.45%, 99.20%, 99.92%, 99.57%, 99.66%, and 99.87%, with FN + FP of 412, 249, 784, 1598, 219, 957, 758, and 245, in Phukpattaranont et al. [119], Sharma and Sharma [120], Pan and Tompkins [117], Dohare et al. [121], Manikandan and Soman [122], Nallathambi and Príncipe [123], Pandit et al. [124], and Yazdani and Vesin [125], respectively. Several techniques, such as [120, 122] and [125], have a slightly higher D.R than the proposed methodology. However, the proposed technique outperforms all the other existing techniques on the basis of a low false detection rate (FN + FP) with a comparable D.R, as shown in Table 5.

Table 5 Comparison between the current and previous research work on the basis of total beats, TP, and false detection rate (FN + FP)

The existing techniques have false detection rates (FN + FP) of 586, 479, 372, 594, and 459 in Zidelmal et al. [126], Christov [127], Bouaziz et al. [128], Choi et al. [129], and Sahoo et al. [51], respectively. The proposed technique outperforms all these existing techniques on the basis of a low false detection rate (FN + FP), as shown in Table 6, where all the datasets of MB Ar DB are considered. It is further concluded that the proposed technique outperforms all the other existing techniques for most of the datasets, viz. MB-104, MB-105, MB-108, MB-116, MB-200, MB-201, MB-202, MB-203, MB-205, MB-207, MB-208, MB-210, MB-215, MB-217, MB-222, MB-228, and MB-233.

Table 6 Comparison of current and existing techniques on the basis of false detection rate for each dataset

4 Conclusion

This paper successfully analyzed the RT and MB Ar databases using the CWT, spectrogram, and autoregressive techniques together, and demonstrated that the proposed technique outperforms the existing state-of-the-art techniques. The performance of the proposed methodology, i.e., an SE of 99.90% and a D.R of 99.81% for MB Ar DB, along with an SE of 99.77% and a D.R of 99.87% for RT DB, reveals its applicability in the emerging medical informatics field in practical emergency situations. It will help in properly classifying different kinds of arrhythmias promptly at an early stage. It has also been shown that the spectrogram provides important frequency analysis that can detect existing arrhythmias, while the AR technique yields good time and frequency resolution simultaneously on the basis of the selected features, such as PSD, time–frequency analysis (TFA), and model order.

The proposed technique identifies the frequency information quite clearly, which was shown to be important for filtering out the high-frequency noise components. Due to its robustness, the proposed approach promises a ready-to-use methodology for any critical surgery or cardiology lab.