1 Introduction

According to the World Health Organization (WHO), cardiovascular disease (CVD) causes approximately 17.9 million deaths every year, close to 31% of all fatalities [1, 2]. CVD covers a range of underlying conditions, such as raised blood pressure (hypertension), coronary heart disease (heart attack), peripheral artery disease, rheumatic heart disease, cerebrovascular disease (stroke), deep vein thrombosis, heart failure, pulmonary embolism and congenital heart disease [3]. Approximately 85% of these deaths are caused by stroke and heart attack. WHO reports project that by 2030 about 23.6 million individuals will die from CVDs, primarily from stroke and heart disease [4, 5]. Thus, there is a pressing need for continuous, real-time monitoring of the critical physiological parameters of the human body.

The enormous growth of the Internet of Things (IoT) has taken Information Technology (IT) to new heights [6,7,8,9,10]. Rapid development across IoT-based application areas has made IoT a rising technology. Today the IoT is involved in almost every application domain and actively contributes to the journey towards a smarter world [11,12,13]. In healthcare, patients used to follow traditional procedures, but with the emergence of IoT the concept of e-health or smart health has come into the picture [14,15,16,17,18]. As a result, a variety of smart devices are being developed to enable services such as remote patient monitoring, keeping patients healthy and safe, and empowering doctors to deliver superior care [19,20,21]. This technological advancement will not only reduce the medical overhead but also enable timely support for patients at remote locations [14,15,16,17,18]. It also plays a major role in decreasing total expenditure by shortening hospital stays while improving treatment outcomes.

In classification problems, imbalanced data is one of the biggest issues, and in the healthcare domain it is even more crucial because medication decisions depend directly on the classification outcome [22,23,24,25,26]. Therefore, the classification of imbalanced datasets is an emerging area of healthcare research. Over time, researchers have not only proposed algorithms and theoretical approaches [27, 28] but also developed various class-balancing solutions in the form of hybrid paradigms [29, 30]. Two types of data balancing techniques are widely used: under-sampling and over-sampling [31, 32]. Under-sampling balances the classes by eliminating samples from the majority class, whereas over-sampling balances them by adding artificial samples to the minority class.
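The two balancing strategies can be illustrated with a minimal sketch. It uses plain random selection and duplication, which is the simplest form of each strategy and not the synthetic-sample method adopted later in this paper; the function names are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_undersample(samples, labels, seed=0):
    """Drop samples at random until every class has the minority-class count."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    return [(x, y) for y, xs in by_class.items() for x in rng.sample(xs, target)]


def random_oversample(samples, labels, seed=0):
    """Duplicate samples at random until every class has the majority-class count."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        balanced.extend((x, y) for x in xs + extra)
    return balanced
```

Under-sampling discards information from the majority class, while duplication-based over-sampling repeats existing minority samples; synthetic over-sampling aims to avoid both drawbacks.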

Safeguarding individuals' well-being is a crucial task, and it becomes more complex when dealing in real time with one of the deadliest diseases, i.e. CVD. Consequently, there is a need for algorithmic approaches that can help reduce the overall risk of CVD through efficient classification. Keeping these constraints in mind, we began the experimental examination with basic classification models, which are less accurate and not capable of dealing with class imbalance. After several trials, we found that the proposed intelligent hybrid classification model is well suited for classifying imbalanced electrocardiogram (ECG) datasets.

The main contributions of the paper are:

  • To establish an IoT-enabled ECG monitoring system for data generation with the help of Node MCU ESP32 and heart rate sensor AD8232.

  • To propose an intelligent hybrid classification model having the capability of handling the complexities of class imbalance with more accurate results.

The remainder of this paper is organized as follows. Section two presents a short description of the current literature on algorithmic approaches for the classification of ECG datasets. Section three briefly discusses the materials and methods, such as dataset generation and description, the proposed methodology, and the statistical measures. The classification results based on these statistical measures are shown in section four. Deeper insights into the classification results are presented in section five. Section six gives the closing remarks along with future directions of the work.

2 Related work

Massive growth in the field of information technology encourages researchers to explore the dimensions of recent technologies. It also motivates researchers and groups to build technological solutions for human well-being. Over the past few years, various developments have been seen not only from the algorithmic perspective but also in system design [33,34,35,36,37]. In the healthcare domain, many possibilities remain open that will catalyze the idea of a smart world. Cardiovascular disease (CVD) is one of the most critical life-threatening diseases across the globe, and it has attracted researchers to contribute to social well-being. Various algorithmic solutions for ECG datasets have been suggested over time, but there is still plenty of scope for improvement [38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]. A quick overview of the current research on ECG datasets is given in Table 1.

Table 1 Insights of the Contemporary Research

3 Materials and methods

This section introduces the materials and methodology used to carry out the experimental evaluation. It is divided into five subsections. The first describes the hardware setup for ECG data generation. The second presents the dataset description. The third discusses the model setup for the classification task. The fourth introduces the recommended model. The fifth presents the statistical measures used to validate the classification model.

3.1 Hardware setup for ECG data generation

To generate the ECG data, we built a setup that mainly consists of a Node MCU (ESP32) and a heart rate sensor (AD8232). Fig. 1a shows the hardware setup, whereas Fig. 1b shows the placement of the nine electrodes on the human body: E1, fourth intercostal space (at the right sternal border); E2, fourth intercostal space (at the left sternal border); E3, midway between E2 and E4; E4, fifth intercostal space (at the midclavicular line); E5, left anterior axillary line (in the same horizontal plane as E4); E6, left mid-axillary line (in the same horizontal plane as E4 and E5); E7, right arm (inner wrist); E8, left arm (inner wrist); and E9, right side of the stomach.

Fig. 1 Hardware setup for ECG data generation: (a) hardware setup, (b) electrode placement

Table 2 shows the pin connections between the Node MCU (ESP32) and the heart rate sensor (AD8232) used for ECG data generation.

Table 2 Pin connection

The data is generated in real time and stored in cloud storage (Ubidots) over a TCP connection using HTTP POST requests. The generated data is transferred to the cloud storage in real time over a Wi-Fi connection. The working steps of the hardware setup are shown in Fig. 2. The hardware setup functions as follows:

  • First, the connection between the heart rate sensor (AD8232) and the Node MCU (ESP32) is established.

  • In the second step, the electrodes are placed on the human body.

  • In the third step, the generated data is visualized on the serial monitor.

  • In the fourth step, the generated data is transferred to cloud storage with the help of the ESP32 Wi-Fi module.

  • In the last step, the ECG data is extracted from the cloud to the local machine for further investigation.

Fig. 2 Working steps of the hardware setup
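The upload step can be sketched as follows. This is a host-side illustration in Python, not the ESP32 firmware used in the setup, and it only builds the request without sending it. The base URL, device label, token, header name, and the variable labels E1 to E9 in the payload are all assumptions made for illustration; the actual Ubidots endpoint and credentials are not given here.

```python
import json
import urllib.request

# Hypothetical placeholder endpoint; the real cloud URL is not stated in the text.
BASE_URL = "https://cloud.example.invalid/api/devices"


def build_ecg_post(readings, device_label, token, base_url=BASE_URL):
    """Build (but do not send) an HTTP POST carrying one tuple of nine readings."""
    if len(readings) != 9:
        raise ValueError("expected nine readings, one per electrode E1..E9")
    # One JSON field per electrode; the field names are illustrative.
    payload = {f"E{i}": float(v) for i, v in enumerate(readings, start=1)}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{device_label}",
        data=body,
        method="POST",
        # The token header name is an assumption, not a documented requirement.
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
    )
```

Passing the returned object to `urllib.request.urlopen` would transmit the tuple over a TCP connection; with one tuple per second, this corresponds to the real-time transfer described in the steps above.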

3.2 Dataset description

The experimental analysis uses ECG data generated through the Node MCU (ESP32) and the heart rate sensor (AD8232). Nine electrodes (E1–E9) are placed at different body locations and their corresponding readings are observed. This exercise was performed on 50 volunteer participants over a time span of 150 s. Every second, the system generates a tuple of nine attributes and uploads it to the server (Ubidots) over a TCP connection using an HTTP POST request. The generated data stream is transferred to the cloud storage in real time over a Wi-Fi connection. The ECG data is then extracted from the cloud to a local machine for evaluation. Based on the current health of each volunteer, the dataset is divided into two classes, where class 1 denotes healthy patients and class 2 denotes cardiac-ill patients. The dataset consists of 1700 instances of 10 attributes. Fig. 3a, b present the visualization of the ECG dataset (nine channels with the class label) and the correlation between the attributes, respectively.

Fig. 3 Dataset description: (a) visualization of the ECG dataset, (b) correlation coefficient matrix
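A correlation coefficient matrix of the kind shown in Fig. 3b can be computed as sketched below. The sketch assumes the Pearson coefficient, which the text does not state explicitly, and the column names in the usage note are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has zero spread and no defined correlation.
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return a nested dict m[a][b] with the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Called with a mapping such as `{"E1": [...], "E2": [...]}`, the function returns a symmetric matrix whose diagonal entries are 1 up to floating-point rounding.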

Table 3 shows the class-wise distribution of the ECG dataset over the nine attributes, describing each attribute by its range (min and max), mean, and standard deviation.

Table 3 Class-based distribution of the ECG dataset

3.3 Model setup

Fig. 4 shows the classification model setup for the experimental analysis of the ECG dataset. The setup comprises five steps. In step one, the ECG data is given as input to the model. In step two, the data is preprocessed to exclude unusual objects and missing values. In step three, the processed data is passed to the classification algorithms, i.e. K-Nearest Neighbor (KNN), support vector machine (SVM), random forest (RF), AdaBoost (ADB), and Bagging (BAG). In step four, the performance of each classification algorithm is measured, and in step five the best classification model is identified from these results. All experiments were run under 2-, 3-, 5-, and 10-fold cross-validation on a Dell workstation with a 64-bit Intel Xeon processor running at 3.60 GHz and 32 GB of RAM. All algorithms used in the simulation were implemented in Python.

Fig. 4 Classification model setup
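The k-fold evaluation protocol can be sketched as follows. The sketch assumes stratified folds, so that each fold keeps roughly the overall class ratio; whether the experiments used stratification is not stated, and the function name is hypothetical.

```python
import random


def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k folds with balanced class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

Each instance appears in exactly one test fold, so averaging a measure over the k folds uses every instance once for testing. Running the generator with k set to 2, 3, 5, and 10 gives the four evaluation settings.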

3.4 Proposed hybrid classification model

The workflow of the recommended hybrid model is presented in Fig. 5. The model is composed of the following steps:

  • Step I The raw data is given as input to the recommended model.

  • Step II The raw ECG dataset is pre-processed to eliminate missing values and unusual objects.

  • Step III Class balancing is performed using SMOTE (Synthetic Minority Oversampling Technique), which gives a new balanced dataset as output.

  • Step IV The balanced dataset is given as input to the hyper-tuned random forest algorithm under the various evaluation criteria, i.e. 2-, 3-, 5-, and 10-fold cross-validation.

  • Step V The performance of the hybrid classification model is evaluated using the statistical parameters (i.e., accuracy, recall, precision, and f1-score).

Fig. 5 Workflow of the proposed hybrid model
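The pre-processing of Step II can be sketched as below. The rule used to flag unusual objects is not specified in the text; the sketch assumes a z-score threshold, and both the threshold value and the function name are illustrative.

```python
import math
import statistics


def preprocess(rows, z_max=3.0):
    """Drop rows with missing readings, then rows with any extreme feature value."""
    # Missing values are assumed to be encoded as None or NaN.
    complete = [
        r for r in rows
        if all(v is not None and not math.isnan(v) for v in r)
    ]
    if len(complete) < 2:
        return complete
    columns = list(zip(*complete))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]

    def is_usual(row):
        # A constant column (std == 0) cannot contain an outlier.
        return all(
            s == 0 or abs(v - m) / s <= z_max
            for v, m, s in zip(row, means, stds)
        )

    return [r for r in complete if is_usual(r)]
```

The cleaned rows are what Step III would receive for class balancing.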

3.4.1 Class balancing using SMOTE

Class balancing is a critical matter that must be handled effectively during classification. Suppose we have a binary classification problem where one class holds the majority of the samples and the other has very few. A classifier trained on such imbalanced data may be biased toward the majority class, because the majority class contributes more to the model than the minority class. As a result, the correctness of the classification model suffers. To deal with the class imbalance problem, we therefore used the SMOTE algorithm, introduced by Chawla et al. in 2002 [64, 65]. The basic principle of this algorithm is to balance the classes by generating artificial samples in the minority class. It uses the k-nearest neighbors (k-NN) concept: each synthetic sample is placed at a random point between a minority sample and one of its nearest minority-class neighbors. Table 4 shows the SMOTE-based class balancing result, i.e. the class-wise distribution for the various SMOTE percentages (0, 50, 150, 250, 350, 450, 550, and 650).

Table 4 SMOTE based class balancing result

Algorithm 1 gives the pseudocode of the SMOTE algorithm used to address the class imbalance of the ECG dataset.

Algorithm 1 SMOTE-based class balancing of the ECG dataset
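The interpolation idea behind SMOTE can be sketched in a few lines. This is a simplified illustration and not a reproduction of Algorithm 1: base samples are drawn at random rather than visited in turn, and the `percentage` argument is assumed to mean the number of synthetic samples as a percentage of the minority-class size.

```python
import math
import random


def smote(minority, percentage=100, k=5, seed=0):
    """Return synthetic minority samples interpolated toward nearest neighbours."""
    rng = random.Random(seed)
    n_new = round(len(minority) * percentage / 100)
    k = min(k, len(minority) - 1)
    if n_new == 0 or k < 1:
        return []
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point lies on a segment between two existing minority samples, the new samples stay within the region already occupied by the minority class. Appending the returned samples to the original data yields the balanced dataset.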

3.4.2 Hyper-tuned random forest algorithm

The random forest (RF) algorithm is among the most extensively used classification algorithms [66, 67]. Owing to its versatility, it is applicable in nearly all application areas. We chose it for its broad applicability and well-established nature. The best parameters for this classifier are obtained through hyperparameter tuning, and the best hyperparameter setting is used in the recommended hybrid paradigm. Algorithm 2 gives the pseudocode of the hyper-tuned random forest model for the classification of the ECG dataset.

Algorithm 2 Hyper-tuned random forest model for the classification of the ECG dataset

Table 5 presents the classification hyperparameters (i.e., min_samples_split, n_estimators, max_features, min_samples_leaf, bootstrap, max_depth), the selection criteria used for tuning, and the best hyperparameter settings.

Table 5 Classification's hyperparameters
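One common way to tune such hyperparameters is an exhaustive grid search, sketched below. The text does not state which search strategy was used, so this is an assumption; the candidate values are hypothetical placeholders rather than the ranges of Table 5, and `score_fn` stands in for a routine that would train a random forest with the given parameters and return its mean cross-validated accuracy.

```python
import itertools

# Hypothetical candidate values, for illustration only.
PARAM_GRID = {
    "n_estimators": [100, 200],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
    "max_features": ["sqrt", "log2"],
    "bootstrap": [True, False],
}


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # ties keep the first combination found
            best_params, best_score = params, score
    return best_params, best_score
```

With six hyperparameters of two candidates each, the grid contains 64 combinations, so the cost grows quickly as more values are added.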

3.5 Statistical analysis

Four statistical measures, i.e. accuracy, f1-score, precision, and recall, are used to validate the classification results. These measures play an essential role in establishing the accuracy and suitability of the classification model [68]. Table 6 lists the statistical measures with their mathematical formulations.

Table 6 Statistical measures
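The four measures can be computed from the entries of a binary confusion matrix, as in the following sketch. It uses the standard definitions in terms of true positives (tp), false positives (fp), false negatives (fn), and true negatives (tn); the function name and the convention of returning 0 for an undefined ratio are choices made here for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and f1-score from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # f1-score is the harmonic mean of precision and recall
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For class-wise results, the class of interest is treated as the positive class and the other class as the negative one.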

4 Result

Identifying an accurate model in an IoT-enabled smart healthcare environment is an arduous but innovative task. This work primarily aims to create an intelligent hybrid classification model that handles the class imbalance issue with greater accuracy and that will play a key role in building robotics solutions for communal well-being. The results are obtained by comparing five state-of-the-art models, namely ADB, BAG, RF, KNN, and SVM, with the proposed model; the models are shown in Fig. 6.

Fig. 6 Classification models: a quick look

To assess the effectiveness of the recommended model, it was compared in depth with the five state-of-the-art models under the various evaluation criteria (2-, 3-, 5-, and 10-fold cross-validation). The classification results are validated using four performance measures (namely, accuracy, recall, f1-score, and precision).

The dataset has two classes, where class 1 represents healthy patients and class 2 represents cardiac-ill patients. The empirical evaluation shows that the recommended hybrid model obtained the highest accuracy throughout the experiment, under every validation setting, compared with the other well-established classification models. The experimental results based on the statistical measures are shown in Table 7.

Table 7 Statistical measures based evaluation result

5 Discussion

The experimental analysis uses ECG data generated through the heart rate sensor (AD8232) and the Node MCU (ESP32). For the evaluation, the ECG data is transferred from the cloud to the local machine. The dataset is divided into two classes, where class 1 denotes healthy patients and class 2 denotes cardiac-ill patients. The paper compares five state-of-the-art models, namely ADB, BAG, RF, KNN, and SVM, with the proposed model. The evaluation uses four statistical metrics, namely accuracy, precision, recall, and f1-score. The class-wise classification results under 2-, 3-, 5-, and 10-fold cross-validation are shown in Fig. 7a, b, c for class 1 and in Fig. 8a, b, c for class 2. Figure 9 presents the average accuracy of the models over the experimental period.

Fig. 7 Classification result of class 1: (a) f1-score, (b) recall, (c) precision

Fig. 8 Classification result of class 2: (a) f1-score, (b) recall, (c) precision

Fig. 9 Results of classification models

The empirical evaluation shows that the recommended hybrid model can handle the complexities of class imbalance in the ECG dataset with enhanced performance for both classes, which supports building smart and accurate IoT-enabled healthcare systems. The recommended hybrid model was compared with the state-of-the-art algorithms to establish its accuracy and suitability. Under the different validation criteria, the recommended model attains the highest accuracy, 99.7%, compared with the state-of-the-art algorithms, i.e. AdaBoost (91.88%), Bagging (92.40%), random forest (92.48%), K-Nearest Neighbor (92.38%), and support vector machine (91.98%).

The dataset was generated from 50 volunteer participants; it is suitable for binary classification but does not cover all types of heart disease (i.e. multiclass classification problems). Therefore, in the future this work will be expanded both from the data point of view (by adding more relevant attributes) and from the algorithmic point of view. We will also try to turn this into a multiclass classification problem by generating data related to different types of cardiovascular disease.

6 Conclusion

Cardiovascular diseases (CVD) are among the biggest hazards to human society across the globe. Hence, there is a pressing need for real-time observation and analysis of cardiac health. Identifying the correct model in IoT-enabled smart healthcare paradigms is an arduous but innovative task. IoT-enabled intelligent healthcare systems include numerous applications such as Blood Pressure (BP) checks, Heart Rate (HR) monitoring, and Electrocardiography (ECG) observation. This paper recommends an IoT-enabled ECG monitoring system for data generation (with the help of the Node MCU ESP32 and the heart rate sensor AD8232) and an intelligent hybrid classification model. The key intention of this study is to provide a smart hybrid classification model that deals with the class imbalance problem with greater accuracy and that will play a key role in building robotics solutions for communal well-being. The dataset has two classes, where class 1 represents healthy patients and class 2 represents cardiac-ill patients. A rigorous comparison between the state-of-the-art algorithms and the recommended hybrid model, based on various evaluation criteria (2-, 3-, 5-, and 10-fold cross-validation), was carried out to establish the accuracy and suitability of the recommended model. The recommended model attains the highest accuracy, 99.7%, throughout the experiment under the different validation criteria, compared with the state-of-the-art algorithms, i.e. AdaBoost (91.88%), Bagging (92.40%), random forest (92.48%), K-Nearest Neighbor (92.38%), and SVM (91.98%). The recommended hybrid model not only handles the complexities of class imbalance for electrocardiogram datasets but will also help in building intelligent and accurate IoT-enabled healthcare systems. Thus, accurate classification of cardiovascular health through the recommended model would be useful for improving the lifestyle of cardiac patients. It would not only allow patients to be treated from the comfort of their homes but also reduce the need for hospital visits and the associated expenditure. Furthermore, it would help strengthen the capability to respond effectively to medical emergencies.

In the future, this work will be expanded from the data and algorithmic points of view. We will also try to turn this into a multiclass classification problem by generating data related to different types of cardiovascular disease. We would then be able not only to detect different types of heart disease but also to classify them correctly. After this, we will try to build wearable devices in the form of a band, chest belt, or undergarment, integrated into a complete cloud-based framework.