Introduction

Diabetes is a common chronic metabolic disease characterized by prolonged hyperglycemia that eventually causes microvascular alterations, blindness, renal pathologies and diabetic neuropathies [1, 2]. It has also been suggested to affect the arteries that supply blood to the lower extremities, brain and heart, exposing the individual to a higher risk of limb amputation, stroke and myocardial infarction [2]. Diabetic neuropathy manifests as poly/mononeuropathy and/or autonomic neuropathy [3]. It also involves neurological degeneration affecting the autonomic nervous system (ANS) and indicates an increased risk of cardiovascular events [4, 5].

Diabetes has also been suggested to cause autonomic dysfunction, which may alter cardiac electrical activity and lead to a variety of cardiac disorders [3]. The QT interval of the electrocardiogram (ECG) has been identified as the most sensitive to these anomalies [6,7,8], but it fails to detect many of these events. Thus, ECG-derived heart rate variability (HRV) analysis has been used to extract more in-depth information on cardiac condition. HRV is now becoming a powerful tool for evaluating sympatho-vagal balance and is analyzed from the variation in the time intervals between adjacent R waves in long-term ECG records [9]. The literature reports the use of HRV in investigating diabetes-induced neuropathy, cardiovascular disorders and other disorders of varied origin [5, 9, 10]. Further, HRV analysis has been tested in the prediction of autonomic neuropathy, cancer staging, mental disorders and fetal heart rate monitoring [11,12,13,14,15].

Over the last decade, non-invasive ECG-based measures, including heart rate turbulence and HRV analyses, have been suggested as markers of ANS modulation [9]. ANS dysfunction has also been suggested to occur early in diabetes and is nowadays considered important for cardiac risk analysis [3, 4]. HRV parameters can therefore serve as important features for a non-invasive, ECG-based automated prognostic system that separates diabetic from normal subjects. Among various computational techniques, the artificial neural network (ANN) has achieved accuracies ranging from 77 to 98% using different pathophysiological and demographic features [16,17,18,19,20,21]. However, none of these approaches can be developed into an online diagnostic system because of the time delay in obtaining pathological findings. Using HRV features, the literature reports accuracies ranging from 78 to 92% with K-nearest neighbor (KNN), and from 80 to 90.5% with ANN, support vector machine (SVM), AdaBoost and probabilistic neural network classifiers [22, 23]. The SVM classifier has been demonstrated in pattern recognition research and successfully applied to separating normal from disease data in kidney disease, Alzheimer's disease, lung disease, gene expression, mammographic mass detection, ECG analysis and arrhythmia detection [24,25,26,27,28,29,30,31]. The SVM is favoured for its ability to use a kernel function to construct a hyperplane in a higher-dimensional space that divides the samples into two classes [29,30,31]. Further, the SVM is faster than other classifier techniques and yields good accuracy [26].

HRV parameters can be computed online from real-time ECG records and fed to the input layer of a trained ANN or SVM for automated evaluation. Given the successful application of HRV parameters in identifying different clinical disorders, and of ANN and SVM in clinical diagnosis, the present work proposes an automated diabetes prediction system based on time-domain HRV features.

Materials and methods

Subjects and induction of diabetes

Wistar rats (n = 10, age 10–12 weeks, weight 200 ± 20 g) are used for the experiment. Rats are housed individually in the animal house under controlled environmental conditions (temperature 24 ± 1 °C; 12 h light cycle, 6 AM to 6 PM) with fat-rich food and water ad libitum. The subjects are divided equally into control and diabetic groups. Diabetes is induced with streptozotocin (STZ; 35 mg/kg, prepared in citrate buffer at pH 4.5), while the control group receives normal saline. Diabetes is confirmed by analysis of body weight and blood glucose level before ECG recording. The experimental procedure was approved by the Institutional Animal Ethics Committee (IAEC) of Birla Institute of Technology, Mesra, Ranchi, India.

ECG recording and preprocessing

Ten minutes of lead-I ECG is recorded digitally at 200 samples/s from each subject. Signals are acquired on the seventh day of the experiment using an MP45 bioamplifier with Biopac Student Lab 4.0 software (Biopac Systems Inc., USA). Rats are anesthetized with ketamine (91 mg/kg body weight) and xylazine (9.1 mg/kg body weight). Stainless-steel differential electrodes are placed according to Einthoven's triangle for the lead-I ECG. Movement artefacts (if any) are removed with a 2 Hz high-pass filter before HRV analysis.

Heart rate variability analysis

The RR intervals are used to construct the tachogram in Biopac Student Lab 4.0, configured for rat tachogram calculation with a heart rate window of 100 to 500 bpm. The very low frequency (VLF), low frequency (LF) and high frequency (HF) bands are set to 0.0–0.15 Hz, 0.15–1.5 Hz and 1.5–5.0 Hz, respectively [32]. Similarly, Kubios 2.0 (University of Eastern Finland, Finland) is configured with settings for rat HRV calculation, and the calibrated software is used to extract the time-domain HRV measures [12]. The HRV parameters used are listed and described in Table 1.

Table 1 Description of HRV time domain parameters
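As an illustration, most of the time-domain parameters in Table 1 can be computed directly from a list of RR intervals. The sketch below is a minimal pure-Python version, not the Kubios implementation. It assumes RR intervals in milliseconds that are already free of ectopic beats, computes mHR as the mean of instantaneous heart rate, uses the conventional 50 ms threshold for NN50/pNN50 (exposed as the hypothetical parameter `nn_threshold_ms`, since rat studies sometimes use smaller thresholds) and the conventional 1/128 s histogram bin for the triangular index (TI). TINN, which requires fitting a triangle to the histogram, is omitted.

```python
import math
import statistics


def time_domain_hrv(rr_ms, nn_threshold_ms=50.0, bin_width_ms=1000.0 / 128.0):
    """Time-domain HRV features from RR (NN) intervals given in milliseconds.

    Hypothetical helper for illustration; TINN is not computed.
    """
    hr = [60000.0 / rr for rr in rr_ms]                  # instantaneous heart rate (bpm)
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]    # successive RR differences
    nn50 = sum(1 for d in diffs if abs(d) > nn_threshold_ms)

    # HRV triangular index: total number of intervals divided by the
    # height (largest bin count) of the RR histogram.
    bins = {}
    for rr in rr_ms:
        key = math.floor(rr / bin_width_ms)
        bins[key] = bins.get(key, 0) + 1

    return {
        "mRR": statistics.mean(rr_ms),
        "SDNN": statistics.stdev(rr_ms),                 # sample standard deviation
        "mHR": statistics.mean(hr),
        "SDHR": statistics.stdev(hr),
        "RMSSD": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "NN50": nn50,
        "pNN50": 100.0 * nn50 / len(diffs),              # percent of successive pairs
        "TI": len(rr_ms) / max(bins.values()),
    }
```

For example, `time_domain_hrv([800, 850, 780, 900])` gives mRR = 832.5 ms, NN50 = 2 (differences of 70 and 120 ms exceed 50 ms; 50 ms does not) and pNN50 ≈ 66.7%.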

Backpropagation ANN model for diabetes identification

The backpropagation neural network is developed and implemented in Python 3.6.8 (Anaconda Distribution 2019.03; Anaconda, Inc., USA). A three-layer network model, consisting of an input layer (IL) with 9 nodes, a hidden layer (HL) and an output layer (OL) with one node, is shown in Fig. 1. This model is used to distinguish diabetic from control subjects based on time-domain HRV features. The numerical values of the HRV features are obtained automatically using Kubios software and used as inputs to the IL. A single HL was chosen because such a network can act as a universal approximator and classifier [33]; however, reports conflict on the optimal number of hidden nodes [34].
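As a worked example of model size, a fully connected network with one hidden layer has (inputs × hidden weights + hidden biases) + (hidden × output weights + output bias) trainable parameters, so the two architectures reported in this work have:

$$P_{9:5:1} = (9 \times 5 + 5) + (5 \times 1 + 1) = 56, \qquad P_{7:25:1} = (7 \times 25 + 25) + (25 \times 1 + 1) = 226$$

The 7:25:1 network therefore has about four times as many parameters as the 9:5:1 network to fit from the same few hundred training epochs.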

Fig. 1

Flowchart for heart rate variability feature selection and testing of artificial neural network (9:5:1)

Training and testing of ANN

The model is trained with the numerical values of nine time-domain features. The rectified linear unit (ReLU) activation function is used at the HL nodes, and the output layer uses a sigmoid activation so that its output can be interpreted as a probability. During training, the network weights are updated to reduce the error between the output and the target values for up to 10,000 iterations, using the 'Adam' optimizer to update the weights by backpropagation [35, 36]. The number of epochs, batch size and learning rate (LR) are set as training hyperparameters. During testing, the numerical values of the time-domain HRV parameters are fed to the trained model, which predicts 'diabetic' if the output exceeds 0.5 and 'control' otherwise. The complete procedure for testing HRV parameters with the ANN is presented in Fig. 1.
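A minimal NumPy sketch of the 9:5:1 network described above is given below: ReLU hidden layer, sigmoid output, binary cross-entropy loss, full-batch backpropagation with Adam, and the 0.5 decision threshold. The class name `HRVNet`, the He-style weight initialisation, full-batch updates and the Adam constants (the usual defaults β1 = 0.9, β2 = 0.999, ε = 1e-8) are assumptions; the source does not state them, and the original implementation may have used a deep-learning library instead. HRV features with very different scales (e.g. NN50 counts vs. RR in ms) would normally be standardized before training, which is also an assumption here.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class HRVNet:
    """Hypothetical 9:5:1 network: ReLU hidden layer, sigmoid output unit."""

    def __init__(self, n_in=9, n_hidden=5, seed=0):
        rng = np.random.default_rng(seed)
        # He-style initialisation for the ReLU layer (an assumption)
        self.W1 = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=(n_hidden, 1))
        self.b2 = np.zeros(1)
        self.params = [self.W1, self.b1, self.W2, self.b2]

    def forward(self, X):
        self.X = X
        self.z1 = X @ self.W1 + self.b1
        self.h = relu(self.z1)
        self.p = sigmoid(self.h @ self.W2 + self.b2).ravel()  # P(diabetic)
        return self.p

    def gradients(self, y):
        # Mean binary cross-entropy: dL/d(output pre-activation) = (p - y) / n
        d_out = ((self.p - y) / len(y))[:, None]
        gW2 = self.h.T @ d_out
        gb2 = d_out.sum(axis=0)
        d_h = (d_out @ self.W2.T) * (self.z1 > 0)  # backpropagate through ReLU
        gW1 = self.X.T @ d_h
        gb1 = d_h.sum(axis=0)
        return [gW1, gb1, gW2, gb2]

    def fit(self, X, y, lr=0.02, iterations=10_000,
            beta1=0.9, beta2=0.999, eps=1e-8):
        m = [np.zeros_like(p) for p in self.params]  # Adam first moments
        v = [np.zeros_like(p) for p in self.params]  # Adam second moments
        for t in range(1, iterations + 1):
            self.forward(X)
            for i, g in enumerate(self.gradients(y)):
                m[i] = beta1 * m[i] + (1 - beta1) * g
                v[i] = beta2 * v[i] + (1 - beta2) * g ** 2
                m_hat = m[i] / (1 - beta1 ** t)
                v_hat = v[i] / (1 - beta2 ** t)
                # In-place update, so self.W1 etc. see the new values
                self.params[i] -= lr * m_hat / (np.sqrt(v_hat) + eps)
        return self

    def predict(self, X):
        # Output > 0.5 -> 'diabetic', otherwise 'control'
        return np.where(self.forward(X) > 0.5, "diabetic", "control")
```

Usage: `HRVNet().fit(X_train, y_train, lr=0.02)` with `X_train` an (n, 9) array of standardized features and `y_train` holding 1 for diabetic and 0 for control epochs, then `predict(X_test)` returns the class labels.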

The performance of the ANN model is evaluated with 526 datasets (epochs) of nine time-domain HRV parameters. Fivefold cross-validation is used to optimize the model. The testing protocol aims to separate diabetic from control subjects based on the nine time-domain HRV parameters. Further, 118 datasets, entirely separate from the 526 used in training, are used to test the model's performance on new data.
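Fivefold cross-validation can be sketched as follows: the 526 epochs are shuffled once and split into five folds, each serving once as the test set while the other four train the model. The function name `k_fold_indices` and the fixed seed are illustrative, and the source does not say whether folds were shuffled or stratified by class. With 526 epochs the first fold holds 106 test and 420 training epochs, the same split sizes reported for the SVM.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size
```

The reported cross-validated accuracy is then the mean (and range) of the five per-fold test accuracies.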

The classification accuracy is calculated as [28] (here an epoch denotes one 60 s ECG segment, not a training epoch):

$$\% classification\;accuracy = \frac{Number\;of\;epochs\;identified\;correctly}{{Total\;number\;of\;epochs\;tested}} \times 100$$
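The formula translates directly into code. The sketch below also reports per-class accuracy, which is how the control and diabetic identification rates in the Results are expressed. The function name and the label strings are hypothetical.

```python
def classification_accuracy(y_true, y_pred):
    """Percentage of ECG epochs classified correctly, overall and per class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    result = {"overall": 100.0 * correct / len(y_true)}
    for label in sorted(set(y_true)):
        # Restrict to epochs whose true class is `label`
        pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == label]
        result[label] = 100.0 * sum(t == p for t, p in pairs) / len(pairs)
    return result
```

For instance, four control and two diabetic epochs with one control epoch misclassified give 83.3% overall, 75% for control and 100% for diabetic.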

Support vector machine

The SVM algorithm is programmed in Python to classify the datasets as diabetic or control. The SVM acts as a linear separator: using a kernel function, it finds the optimal hyperplane (the one that maximizes the margin between the two classes) in a higher-dimensional space. The SVM is also an effective method for pattern recognition problems in bioinformatics [29,30,31]. The optimal hyperplane is selected with a penalty parameter C > 0 and a radial basis function (RBF) kernel with parameter γ > 0.
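For reference, the standard soft-margin SVM with an RBF kernel (the C-SVM formulation used by common libraries such as LIBSVM) is shown below; we assume, but the source does not state, that the implementation follows it. Here each training epoch has feature vector x_i and label y_i ∈ {-1, +1} (e.g. control = -1 and diabetic = +1, an assumed encoding), ξ_i are slack variables for margin violations, and φ is the feature map implied by the kernel.

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\; \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{N}\xi_i \quad \text{subject to} \quad y_i\big(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\big) \ge 1 - \xi_i,\;\; \xi_i \ge 0$$

$$K(\mathbf{x}_i,\mathbf{x}) = \phi(\mathbf{x}_i)^{\top}\phi(\mathbf{x}) = \exp\big(-\gamma\lVert\mathbf{x}_i-\mathbf{x}\rVert^{2}\big), \qquad f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i=1}^{N}\alpha_i y_i K(\mathbf{x}_i,\mathbf{x}) + b\Big)$$

The dual coefficients satisfy 0 ≤ α_i ≤ C, and only epochs with α_i > 0 (the support vectors) contribute to f(x). A larger C penalizes training errors more heavily, while a larger γ makes the kernel more local.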

Training and testing protocol of SVM

In total, 420 datasets for training and 106 for testing are presented as input to the SVM. The algorithm is designed to distinguish between control and diabetic subjects based on time-domain HRV parameters. The RBF kernel is used with C varying from 0.001 to 10 and γ from 0.001 to 1.
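The search over C and γ can be sketched with scikit-learn's `SVC` and `GridSearchCV`. This is an assumption about tooling, since the source only says the SVM was programmed in Python. The synthetic data from `make_classification` stand in for the HRV epochs (with the same 334/192 class proportion), the grid values inside the reported ranges are illustrative, and the `StandardScaler` step is an added assumption, because RBF kernels are sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 526 nine-feature HRV epochs (NOT the study's data)
X, y = make_classification(n_samples=526, n_features=9, n_informative=6,
                           weights=[334 / 526], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=106, stratify=y, random_state=0)   # 420 train / 106 test

# Illustrative grid within the reported ranges (C: 0.001-10, gamma: 0.001-1)
param_grid = {
    "svc__C": [0.001, 0.01, 0.1, 1, 10],
    "svc__gamma": [0.001, 0.01, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0],
}
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)   # accuracy on the held-out 106
print(search.best_params_, f"{100 * test_accuracy:.1f}%")
```

Tuning C and γ by cross-validation on the training portion only, and scoring the held-out test set once at the end, keeps the reported test accuracy independent of the hyperparameter choice.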

Statistical analysis

Changes in HRV parameters in diabetic compared with control subjects are analyzed using a one-tailed independent Student's t-test in MS Excel 2013, with significance set at P < 0.05 (95% confidence level). Collinearity among the time-domain parameters is also tested using regression analysis and the variance inflation factor (VIF). The VIF measures how much the variance of a predictor's regression coefficient is inflated by that predictor's correlation with the other predictors, and is therefore used to check collinearity between predictors.
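The VIF of predictor j is VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing predictor j on all the other predictors. A minimal NumPy sketch is shown below; the function name is hypothetical, and statsmodels also provides `variance_inflation_factor` for the same purpose.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (rows = epochs, columns = HRV parameters)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        # Regress column j on the remaining columns plus an intercept (OLS)
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        r2 = 1.0 - residual @ residual / np.sum((target - target.mean()) ** 2)
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(vifs)
```

A VIF near 1 indicates no collinearity, while values approaching 10 (such as the SDNN/RMSSD value reported in the Results) indicate strong collinearity.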

Results

Before ECG recording and analysis of HRV changes, the diabetic condition is confirmed by monitoring the body weight and blood glucose level of the rats. Blood is drawn from the tail on the first and seventh days of the experiment and tested with a clinical glucometer. The STZ-injected rats show a significant reduction in body weight (P < 0.05) and a marked increase in blood glucose level (P < 0.05), confirming diabetes in these rats (Fig. 2).

Fig. 2

Comparative evaluation of changes in body weight and blood glucose level due to streptozotocin (STZ) on days 1 and 7. Data are presented as mean ± S.E. and statistically evaluated for significance of changes with respect to control subjects (‘*’ represents P < 0.05)

Analysis of changes in HRV parameters

In total, 334 and 192 noise-free epochs (60 s ECG segments) were extracted from the control and diabetic groups, respectively, for plotting tachograms. To optimize R-wave detection, the R-wave threshold was varied from 0.5 to 0.8, and the maximum number of R-peaks was obtained at a threshold of 0.7. Hence, all HRV calculations in this experiment are performed on tachograms plotted from the ECG with a threshold of 0.7. Time-domain HRV parameters were selected because they are linear and easy to compute, and thus suitable for identifying significant differences between control and diabetic subjects. In the present work, the diabetic group shows a significant reduction in mHR (with a corresponding increase in mRR), SDHR, RMSSD, NN50, pNN50 and TI (P < 0.05 in all cases) (Table 2). The present dataset also revealed medium collinearity between mHR and TI (VIF = 2.20). TINN showed moderate collinearity with SDNN, RMSSD and NN50 (VIF = 3.07, 6.35 and 2.11, respectively), and NN50 showed moderate collinearity with SDNN and RMSSD (VIF = 2.63 and 3.64, respectively). Near-high collinearity was observed between SDNN and RMSSD (VIF = 8.22).

Table 2 Analyses of changes in HRV parameters for drug induced diabetic with respect to the control group of rats
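The R-wave threshold sweep described above (0.5 to 0.8) can be illustrated with a simple amplitude-threshold detector. This is a hypothetical sketch, not the Biopac Student Lab algorithm: it assumes the threshold is a fraction of the record's maximum amplitude and applies a 100 ms refractory period (also an assumption, chosen to stay below the 120 ms RR interval at 500 bpm, the upper limit of the heart rate window). Counting detections at each threshold allows them to be compared against visually verified beats, and the detected peaks are then converted into the RR intervals that make up the tachogram.

```python
def detect_r_peaks(ecg, fs, threshold, refractory_s=0.1):
    """Sample indices of local maxima above threshold * max(ecg).

    Candidates closer than refractory_s to the previous peak are merged,
    keeping the taller one. Hypothetical detector for illustration only.
    """
    level = threshold * max(ecg)
    min_gap = int(refractory_s * fs)
    peaks = []
    for i in range(1, len(ecg) - 1):
        if ecg[i] >= level and ecg[i] >= ecg[i - 1] and ecg[i] > ecg[i + 1]:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
            elif ecg[i] > ecg[peaks[-1]]:
                peaks[-1] = i  # keep the taller of two close candidates
    return peaks


def rr_intervals_ms(peaks, fs):
    """Convert R-peak sample indices into RR intervals in milliseconds."""
    return [1000.0 * (b - a) / fs for a, b in zip(peaks, peaks[1:])]
```

Usage: `for thr in (0.5, 0.6, 0.7, 0.8): print(thr, len(detect_r_peaks(ecg, 200, thr)))`. Too low a threshold also picks up tall T waves, inflating the beat count, whereas too high a threshold misses smaller R waves.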

The parameters obtained from the individual HRV calculations are used as input features for training and testing the neural network model. To reduce the complexity of preparing the training and testing datasets, the results of frequency-domain and nonlinear HRV analyses were not used in this work.

Diabetes classification using nine time-domain HRV parameters

To optimize the ANN architecture, the LR was varied from 0.001 to 0.9 with ten HL neurons (Table 3), and the number of HL neurons was varied from 2 to 30 at an LR of 0.02 (Table 4). The best architecture was 9:5:1 at an LR of 0.02, with an accuracy of 96.2% after 10,000 iterations. Under fivefold cross-validation, the accuracy varied from 86.8% to 96.2%, with an average of 90.7%. The best accuracy of 96.2% corresponds to correct identification of 94.3% of control and 100% of diabetic epochs. The model was further tested with time-domain HRV features unknown to the network (blind test data), recorded from a different set of subjects that were not part of the dataset of this experiment. On this blind test, the model achieved a classification accuracy of 74.5% in differentiating diabetic from control subjects (Table 5). The results were also compared with the SVM, which achieved a best accuracy of 95.2% with the time-domain HRV parameters at C = 10 and γ = 0.8.

Table 3 Effect of the learning rate (LR) on the training of the neural network at different iterations
Table 4 Effect of varying hidden layer neurons on the training of neural network at different learning rate parameter for varied iterations
Table 5 Percentage accuracy achieved with optimized neural network architecture (9:5:1) with five-fold cross validation and blind data set
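The two-stage architecture search (an LR sweep with ten hidden neurons, then a hidden-node sweep at the chosen LR, each scored by fivefold cross-validation) can be sketched with scikit-learn's `MLPClassifier`, which uses a logistic (sigmoid) output unit for a binary target. This is a swapped-in library, not the authors' code. The grids below are a small illustrative subset of the reported ranges, the data are synthetic, and `max_iter` in `MLPClassifier` counts passes over the training data (epochs) rather than individual weight updates.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the nine-feature HRV epochs (NOT the study's data)
X, y = make_classification(n_samples=526, n_features=9, n_informative=6,
                           random_state=0)


def cv_accuracy(n_hidden, lr):
    """Mean fivefold cross-validated accuracy (%) of a 9:n_hidden:1 ReLU network."""
    model = MLPClassifier(hidden_layer_sizes=(n_hidden,), activation="relu",
                          solver="adam", learning_rate_init=lr,
                          max_iter=300, random_state=0)
    return 100.0 * cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()


with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    # Stage 1: sweep the learning rate with ten hidden neurons (as for Table 3)
    lr_scores = {lr: cv_accuracy(10, lr) for lr in (0.001, 0.002, 0.02, 0.2)}
    best_lr = max(lr_scores, key=lr_scores.get)
    # Stage 2: sweep the hidden-layer size at the chosen LR (as for Table 4)
    hidden_scores = {h: cv_accuracy(h, best_lr) for h in (2, 5, 10, 25)}
    best_hidden = max(hidden_scores, key=hidden_scores.get)

print(f"architecture 9:{best_hidden}:1, LR = {best_lr}, "
      f"mean 5-fold accuracy {hidden_scores[best_hidden]:.1f}%")
```

A sequential sweep like this is cheaper than a full grid over both hyperparameters, at the cost of possibly missing interactions between the LR and the hidden-layer size.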

HRV classification for diabetes using significant parameters

With the seven significant parameters as input, the LR was varied from 0.001 to 0.9 with 2 HL nodes (Table 3). The LR of 0.002, which gave 97.1% accuracy, was then used to optimize the number of HL nodes (2 to 30) at 10,000 iterations, and the best result was observed with 25 HL nodes (Table 4). The best architecture was 7:25:1 at an LR of 0.005, with an accuracy of 98.1% after 1000 epochs. At 10,000 iterations, the highest accuracy was 97.1% with an LR of 0.002. An accuracy of 97.1% was also observed after 100,000 epochs with 15 hidden nodes at an LR of 0.001. Under fivefold cross-validation at 10,000 epochs, the accuracy varied from 91.4% to 97.1%, with an average of 94.6%. On the blind dataset, the model achieved an accuracy of 97.9% (Table 5). The SVM achieved an accuracy of 94.3% with C = 1 and γ = 0.6.

Discussion

In this work, the time-domain HRV parameters are analyzed before being fed to the input layer of the ANN to differentiate diabetic from non-diabetic subjects. Most of the analyzed HRV parameters (mHR, mRR, SDHR, RMSSD, NN50, pNN50 and TI) changed significantly on statistical testing; only two parameters (SDNN and TINN) showed no significant change. The statistical test confirms that drug-induced diabetes changes the ECG and cardiac function, and hence also justifies the input selection for the ANN. Overall, the ANN achieved an accuracy of 97.1% with the seven significant HRV parameters as input, slightly higher than the 96.2% obtained with all nine parameters, so the two feature sets yield almost similar classification accuracies.

The present work shows a reduction in body weight and a significant increase in blood glucose level, along with a lower heart rate, which confirms the STZ-induced diabetic condition. These results are in line with published findings of lower basal heart rate with higher blood glucose level [13, 37]. The HRV analysis indicates reduced HRV in diabetic subjects, with lower values of the time-domain parameters except mRR, in accordance with recently published results [38]. Reduced HRV has already been suggested as a predictor of cardiac morbidity and mortality [32]. Consistent with the present observations, the pNN50 parameter has been used to assess parasympathetic modulation in balancing cardiac rhythmicity. Heart rate is regulated by the ANS, which modifies the automatic sinus activity through the push–pull action of the sympathetic and parasympathetic nervous systems [32]. The decreased sympathetic and parasympathetic activity has been attributed to suppressed glucose metabolism along with enhanced fatty acid metabolism, which may explain the lower mHR values [37, 38]. The literature also identifies cardiovascular disorders and diabetic neuropathy, with an altered ECG waveform, as long-term consequences of diabetes [37, 38]. HRV analysis has been suggested to be important for detecting rhythm modifications induced by ANS dysfunction [32, 39].

Medical data are generally nonlinear, and many internal as well as external factors influence their variability. Researchers suggest that on random datasets the classification accuracies of ANN and SVM classifiers are almost similar. However, the accuracy of these classifiers also depends on the nature of the dataset. The attributes of the input vectors that matter most are the class balance of the training set and the degree of nonlinearity of the signals. The classification accuracy of an ANN has been observed to be better when the training sets are balanced and the variation in the datasets is small [40]; otherwise, the ANN may give erratic results, such as a high percentage of false positives as well as false negatives. Further, an ANN can learn and model nonlinear variations in the input dataset, which makes the model more robust. The SVM, on the other hand, gives better classification accuracy on imbalanced datasets: because the SVM model depends only on its support vectors, there is no need to resample the dataset. Further, the computational cost of the SVM model is much lower than that of other models. In the present diagnostic problem, where the uncertainties and nonlinearities in the data are unpredictable, both classifiers should be tested and compared to obtain the best output.

The reduced HRV inferred from the observed parameters has been suggested as a predictor of morbidity and mortality in heart disease [32]. Thus, the time-domain HRV parameters were chosen as inputs for designing an ANN-based computerized detection system for diabetes. Various ANN architectures were tested to achieve the optimal classification output, by varying the number of HL neurons and the LR. In most cases, the network saturated at about 10,000 iterations during training; however, it was further trained up to 100,000 iterations to minimize network errors. In the present work, the highest classification accuracies of 97.1% and 96.2% were attained with the 7:25:1 architecture at an LR of 0.002 and the 9:5:1 architecture at an LR of 0.02, respectively. Interestingly, the highest accuracy was usually achieved at about 10,000 iterations, which suggests that the network overfits when trained for more iterations, so that classification accuracy decreases. The obtained results lie toward the upper end of the range (77–98%) previously reported for automated diabetes detection systems.

Pathophysiological parameters have been used as input attributes in previously proposed computer-based diabetes diagnostic systems [16,17,18,19,20]. However, such systems are invasive in nature and involve a delay in obtaining the attributes. These attributes are entered manually, and hence such systems cannot be considered online diagnostic systems. In the current work, by contrast, time-domain HRV parameters are tested and optimized for predicting the diabetic condition. This work proposes a non-invasive method for the prognosis of diabetes that can also be developed into an online system. Although the inputs are not yet acquired automatically in this system, the time-domain HRV parameters could be calculated from real-time ECG records and fed automatically to the ANN for accurate prediction.

Conclusion

Nowadays, the scientific world is moving towards the development of automated systems with the help of artificial intelligence, and healthcare cannot remain untouched by this technological shift. Instead of manual diagnosis, clinicians need to consider more accurate and cost-effective machine-based online/real-time prediction systems. In line with these worldwide technological developments and their applications in healthcare, the present work demonstrates a successful implementation of HRV-based computational classifiers for predicting the diabetic condition. ANN and SVM classifiers were tested on HRV features calculated from diabetic as well as normal rats. The results obtained in this work are very promising (96.2% and 95.2% with ANN and SVM, respectively) and support the hypothesis that a non-invasive, ECG-based online prognostic system for diabetes can be developed. However, the crucial role of expert clinicians in healthcare cannot be neglected.

Future scope

HRV is derived from the RR intervals (tachogram), which are calculated with a common methodology for all mammals. The main factor influencing HRV calculation and analysis strategy is the subject's heart rate, which is significantly correlated with body size and weight: the larger the animal, the lower the heart rate, so the rat's heart rate is much higher than the human heart rate. However, with a few adjustments in the processing of HRV data, the classifier features can also be evaluated for human subjects, and hence the proposed system can be tested and used in humans in the future.