1 Introduction

The number of people with diabetes worldwide has doubled since 2010 [1]. The global prevalence of diabetes was estimated at 9.3 % (463 million people) in 2019 and is expected to rise to 10.2 % (578 million) by 2030 and to 10.9 % (700 million) by 2045 [2]. Diabetes is a chronic disease that occurs when the pancreas stops producing insulin or when the body cannot use the insulin that the pancreas produces [3]. Insulin is the hormone that helps body cells absorb glucose from the blood, so in diabetes the blood glucose level rises. Diabetes is mainly divided into three types: type 1, type 2 and gestational diabetes [4]. Neglecting the treatment of diabetes can cause serious health problems, including stroke, nerve damage, heart disease, kidney disease and blindness. Diabetes can be managed through physical exercise, a proper diet and an appropriate dosage of insulin.

Oral medication can also help control diabetes in its early stages. Among adults, type 1 diabetes accounts for roughly 5 % of all diagnosed cases, while type 2 accounts for 90-95 %. Hence, a device that measures blood glucose for quick and continual diagnosis of diabetes is needed [5]. Diabetes requires continuous monitoring of the glucose level. With the existing invasive methods, repeated finger pricking increases the risk of blood-related infection and trauma [6]. The development of non-invasive, real-time devices is therefore essential. Very few such devices are commercially available at present, and those that are available are very expensive [7, 8].

Coronavirus Disease 2019 (COVID-19) was declared a Public Health Emergency of International Concern by the World Health Organization (WHO) on January 30, 2020, and a pandemic on March 11, 2020. At the time of writing, 23 million people have been infected with the coronavirus globally and the global mortality rate is 3.4 % [9]. Past viral pandemics have also linked diabetes to higher morbidity and mortality. During the 2002-2003 outbreak of Severe Acute Respiratory Syndrome (SARS-CoV-1), diabetes was considered an independent risk factor for complications and death. Likewise, during the 2009 outbreak of Influenza A (H1N1), diabetes tripled the risk of hospitalization and quadrupled the risk of emergency admission [10]. Diabetes was prevalent in about 50 % of the population during the outbreak of Middle East Respiratory Syndrome Coronavirus (MERS-CoV), and, compared with the entire population, the odds ratio for severe or critical MERS-CoV in diabetic patients ranged from 7.2 to 15.7. The mortality rate in MERS patients with diabetes was 35 % [11]. Similarly, the risk of contracting COVID-19, or of developing a more severe form of the disease, is higher in several risk groups, in particular patients with hypertension, chronic respiratory disease, cardiovascular disease (CVD), Diabetes Mellitus (DM) or Metabolic Syndrome (MS). Hence, continuous blood glucose monitoring is essential for managing diabetes in COVID-19 patients.

As the population grows and resources remain limited, the need for smart cities is also increasing. Smart healthcare is a core component of a smart city [12]. It uses information technologies such as the Internet of Things (IoT), cloud computing, big data and artificial intelligence to make healthcare more efficient and more user-friendly [13]. A smart healthcare solution that detects diabetes non-invasively is needed for rural areas and remote locations, where immediate medical facilities are scarce [14]. Smart healthcare for diabetes is illustrated in Fig. 1.

Fig. 1 Smart healthcare for diabetes

The main difficulties in developing truly non-invasive glucose monitoring technologies relate to accuracy, ease of use and suitability for home use across a wide variety of people. Only a device that overcomes this combination of difficulties can deliver a significant breakthrough in this field and improve the quality of life of the large number of people with diabetes around the world [15]. Such a device also obviates frequent visits to the patient's bedside, which are unsafe during the coronavirus pandemic, particularly when the patient is seriously ill and receiving intravenous insulin. If the patient is not critically ill, an easy-to-operate non-invasive device lets him or her self-monitor blood glucose for diabetes management [16]. An mHealth application would also allow the glucose reading to be sent to a remotely located doctor. The idea, therefore, is to develop a continuous glucose monitoring device that is convenient, affordable and harmless and that provides real-time measurement. This can be done by using a non-invasive device, applying intelligent post-processing algorithms to the data it collects, and integrating it with a microcontroller programmed with the best-suited machine learning algorithm to give accurate and fast results.

Many approaches to glucose measurement have already been reported. They can be invasive, non-invasive or minimally invasive. Several attempts have been made at continuous glucose monitoring based on non-invasive techniques, which rely on either optical or non-optical methods. Optical techniques include Raman spectroscopy, near-infrared (NIR) spectroscopy and photoplethysmography (PPG). Fig. 2 gives an overview of measurement techniques for blood glucose measurement systems. Once data have been acquired from the sensors, many researchers have concentrated on developing an optimized computing model to predict the glucose level precisely. Sejdinović et al. developed an artificial neural network for the classification of prediabetic and type 2 diabetic patients [17]; the model was tested on a fixed proportion of the samples. Alić et al. presented an expert system for the classification of metabolic syndrome (MetS) based on a feed-forward artificial neural network (ANN) [18]. A different artificial neural network has been implemented for lactose intolerance prediction [19]. A physiological behaviour model has been elaborated for the glucose-insulin regulatory mechanism [20]. Neural networks built on other computing models have also been introduced for other medical purposes, such as separating cancer patients from healthy subjects [21]. Although many simple and sophisticated computing models have been introduced, an optimized model for fast and precise computation is still required.

Fig. 2 An overview of blood glucose measurement techniques

The rest of the paper is organized as follows. The remainder of this section reviews the related literature. Section 2 presents the research challenges and the novel contribution of this work. Section 3 presents the proposed methodology, which comprises data acquisition, data preprocessing and the machine learning algorithms applied as classifiers or regressors. Section 4 presents the performance evaluation, the simulation results and a comparative analysis of all the applied machine learning algorithms through tables, charts and graphs, together with a comparison with the state of the art in the literature. Section 5 concludes the paper.

PPG is one of the non-invasive measurement techniques. It uses a sensor whose working principle is similar to that of a pulse oximeter to record the PPG signal. Paul et al. [12] developed PPG-based blood glucose monitoring with the help of a pulse oximeter: light was sent through a transmitter, glucose was predicted from the light intensity at the receiver, and the voltage was observed to change with glucose concentration. Similarly, Monte-Moreno [22] designed a PPG-based sensor and estimated blood glucose from the extracted PPG information using machine learning models. Continuous glucose monitoring has been attempted with a wearable micro-system based on a minimally invasive approach [23]; it was the first wearable glucose measurement device that extracted glucose through the human skin. Optical Coherence Tomography (OCT) is a non-invasive approach that estimates glucose from the OCT slope [24]. A non-invasive approach based on Raman spectroscopy, which relies on chemical processes and interaction with molecules, has also been explored [25]. Glucose estimation from saliva has been attempted as a non-invasive method [26, 27]. Ramashyamam et al. [28] suggested NIR spectroscopy based glucose estimation using PPG; wavelengths of 935 nm, 950 nm and 1070 nm were used for blood glucose prediction with an Artificial Neural Network on an FPGA. Painless blood glucose measurement on a microcontroller has been investigated [29]. A diabetes management system with an insulin pump has been explored for better healthcare and to maintain the glycemic profile. Pulsed laser diodes have been introduced to collect photoacoustic signals, which are subsequently used for glucose estimation [30].

The intelligent glucometer iGLU has been developed with an optical method using machine learning models. The device was integrated with an Internet of Medical Things (IoMT) framework to store the data for remote monitoring [14]. It is based on NIR spectroscopy: three-channel data are collected and subsequently processed through regression models for glucose measurement. Several solutions have been developed to date, but most of them suffer from limited accuracy. The PPG approach measures the variation in light intensity with blood volume; because the light intensity varies with changes in blood volume in the body, it may not provide an accurate glucose value. The wearable microstrip solution is large and so may not be convenient for continuous glucose measurement. The OCT-based technique takes a considerable amount of time to estimate glucose concentration and also has low specificity and low sensitivity. The Raman spectroscopy based solution requires more space, so portability is always an issue. Saliva-based glucose detection may not be advisable because saliva samples vary from person to person. The laser-based solution may not be suitable for frequent glucose measurement. Therefore, short NIR spectroscopy is considered the best approach for continuous glucose measurement, as it mitigates all of the above drawbacks. The general flow of the model is represented in Fig. 3.

Fig. 3 General flow of the model

2 Research challenges and novel contribution

Non-invasive measurement of blood glucose enables continuous glucose measurement in a smart healthcare system. Predicting an accurate glucose value is the main challenge in non-invasive measurement. The solutions available at present are costly and require complex mathematical models to produce an instant diagnosis. The following research challenges are addressed in the present manuscript: (1) defining the best non-invasive solution for precise glucose measurement; (2) using efficient machine learning models for the measurement of blood glucose; (3) providing a smart healthcare solution with an IoMT framework for continuous glucose monitoring; and (4) developing a cost-effective solution for all types of patients (diabetic, pre-diabetic and healthy).

The main contributions of the paper are as follows:

  1. Non-invasive measurement of blood glucose is carried out on two diabetes datasets: PIDD (Pima Indian Diabetes Dataset) and a dataset collected from the intelligent glucometer device iGLU.

  2. Machine learning based regression approaches are implemented to obtain an accurate predicted blood glucose value.

  3. Clarke Error Grid analysis is conducted on the iGLU dataset to validate the feasibility of the proposed solution; 100 % of the values lie in zones A and B, which are clinically accepted.

  4. The proposed machine learning model is compared with related work on the basis of performance evaluation criteria such as MAD, mARD, RMSE and AvgE to find the most suitable and accurate algorithm for diabetes detection and prediction of glucose level.

3 Proposed methodology

The paper presents a machine learning model for the detection of diabetes and the prediction of glucose level. Optimised regression models are used for calibration to obtain accurate glucose predictions from the non-invasive iGLU device [31]. The proposed model is designed to support a non-invasive, wearable, painless, low-cost and highly accurate device [32, 33]. Figure 4 presents the process flow of the proposed model.

Fig. 4 Process flow of the proposed model

Various machine learning algorithms have been applied to the data acquired from the UCI repository (PIDD) [34] and from the iGLU device (iGLU dataset) [35]. Regression and classification algorithms were applied to each dataset individually, and the resulting models were compared on evaluation criteria such as MAE, RMSE and accuracy score. For classification, recall, F1-measure and ROC curves were also used for comparison. Figure 5 represents the conceptual framework of the proposed methodology for glucose measurement.

Fig. 5 Process of the conceptual framework

The following steps have been followed to get the desired results.

  1. Data Acquisition: This step involves the collection of data. We have used two datasets, one of which is an open-source dataset and the other of which was collected by the iGLU device.

  2. Data Preprocessing: This step prepares the data for the machine learning algorithms [36]. It includes arranging the data into proper attributes, making sure that the data are complete and removing unwanted data.

  3. Applying Machine Learning Algorithms: This step involves applying various machine learning algorithms to the datasets and training the models.

  4. Performance Evaluation: Here the results of the applied algorithms are evaluated and the performance of each algorithm in terms of accuracy is checked.

  5. Comparative Analysis: Here all the applied machine learning algorithms are compared to find which one is best for diabetes detection and prediction of glucose level.

3.1 Data acquisition

Two datasets are used in this work, as described below:

Pima Indian Diabetes Dataset (PIDD)

The proposed machine learning model is evaluated on an open-source dataset, the Pima Indian Diabetes Dataset, acquired from the UCI repository. The dataset contains medical details of 768 female patients. Table 1 gives the dataset description and Table 2 the attribute description for PIDD.

Table 1 Dataset description
Table 2 Attributes and Abbreviation Used for PIDD

iGLU dataset

The workflow of the proposed model with the iGLU device is presented in Fig. 6. Data are acquired from three fingers placed between the emitters and detectors, which are lined with pads. The pads are designed so that the emitters and detectors sit underneath their surface. The detectors, fitted with daylight-blocking filters, are packaged in such a way that they are not affected by sweat. Hence, the probability of a flawed measurement is reduced to a minimum. Table 3 gives the attribute description for the iGLU dataset.

Table 3 Attributes and Abbreviation Used for iGLU dataset
Fig. 6 Workflow of the proposed model with the iGLU device

3.2 Data preprocessing

Preprocessing involves two steps. First, missing values are filled (data cleaning) [37]. Then features and labels are selected and the data are scaled. For PIDD we used auto scaling, whereas for the iGLU dataset we wrote our own scaling function so that the data fit the model as well as possible.
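The two preprocessing steps can be illustrated with a minimal sketch. It assumes that missing values are filled with the column mean and that auto scaling means standardizing a column to zero mean and unit variance; both choices, the helper names and the sample readings are assumptions made for illustration. The custom scaling function used for the iGLU dataset is not specified in the text and is not reproduced here.

```python
from statistics import mean, pstdev


def fill_missing(column):
    """Replace None entries with the mean of the observed values (assumed imputation rule)."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Scale a column to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical glucose readings; None marks a missing entry.
glucose = [148, 85, None, 89, 137]
filled = fill_missing(glucose)
scaled = standardize(filled)
print(filled)
print(scaled)
```

Filling before scaling matters here: the mean and standard deviation used for scaling are computed on the completed column, so the imputed entry maps to zero after standardization.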

3.3 Applying machine learning algorithms

Machine learning algorithms have been implemented for blood glucose measurement. Logistic regression is applied to classify whether or not the patient has diabetes. Multiple polynomial regression (degree 2 to degree 7) is applied to 3 independent variables and the corresponding output (dependent) variable, and the degree that gives the best results is chosen. SVM is used both as a classifier (support vector classification) to detect the disease and as a regressor (support vector regression) to predict the glucose level. In addition, KNN (K Nearest Neighbors), Decision Tree, Random Forest, Gradient Boost, Gaussian Naive Bayes and a Deep Neural Network have been used for glucose measurement. The block diagram of the machine learning models applied to each dataset is shown in Fig. 7.
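To give a sense of what changing the polynomial degree means for 3 independent variables, the following sketch enumerates the monomial terms of a multivariate polynomial and counts them for degrees 2 to 7. It is only an illustration of the feature expansion behind polynomial regression; the function names and the sample input are hypothetical, and the fitting procedure used in this work is not shown.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_terms(n_features, degree):
    """Return index tuples for every monomial of total degree 0..degree."""
    terms = []
    for d in range(degree + 1):
        terms.extend(combinations_with_replacement(range(n_features), d))
    return terms


def expand(sample, degree):
    """Map one sample to its polynomial feature vector (the constant term comes first)."""
    return [prod(sample[i] for i in term)
            for term in polynomial_terms(len(sample), degree)]


# Number of regression coefficients for 3 input variables at each degree.
term_counts = {degree: len(polynomial_terms(3, degree)) for degree in range(2, 8)}
print(term_counts)

# Degree-2 expansion of a hypothetical three-channel sample (x1, x2, x3).
print(expand([1, 2, 3], 2))
```

The number of coefficients grows quickly with the degree, which is one reason a high-degree model can over-fit when the dataset is small.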

Fig. 7 Applied machine learning algorithms according to dataset

4 Performance evaluation and results discussion

The performance has been evaluated on both datasets by applying regression and classification algorithms. Logistic Regression, KNN and Gaussian Naive Bayes are applied for classification; Linear Regression and Multi-polynomial Regression are applied for regression; and Neural Network, XGBoost, Decision Tree, Random Forest and Support Vector Machine are applied as both classifiers and regressors.

4.1 Performance evaluation

The effectiveness of the model is measured using various quantified parameters. The performance of classification is measured with Accuracy, Confusion Matrix, Precision, Recall and ROC-AUC curve, whereas the performance of Regression is evaluated using Accuracy, MAE, RMSE. All these parameters used for performance evaluation are explained further in brief.

  (a) Confusion Matrix:

    The confusion matrix is a summary of the predictions made by the classifier [38]. It keeps the count of correctly and wrongly predicted values. For a two-class problem, there are four parameters:

    • True Positive (TP): An actual positive value is predicted positive.

    • False Negative (FN): An actual positive value is predicted negative.

    • False Positive (FP): An actual negative value is predicted positive.

    • True Negative (TN): An actual negative value is predicted negative.

  (b) Precision:

    Precision measures how many of the positive predictions made are actually positive. The formula is,

    $$P = \frac{TP}{TP+FP}$$

  (c) Recall/true positive rate/sensitivity:

    Recall measures how many of the actual positive values are correctly predicted as positive and is defined as,

    $$R = \frac{TP}{TP+FN}$$

  (d) Accuracy:

    Accuracy is the number of correct predictions out of all predictions made. It shows the overall effectiveness of the classifier. The formula is,

    $$AC = \frac{TP+TN}{TP+FN+FP+TN}$$

  (e) F-1 Score:

    The F-1 measure combines precision and recall and is defined as follows:

    $$F_{1} = \frac{2 \cdot P \cdot R}{P+R}$$

  (f) ROC-AUC curve:

    ROC-AUC measures the performance of a classifier at various threshold settings. The receiver operating characteristic (ROC) curve is a probability curve, and the area under the curve (AUC) represents the degree of separability: it tells how well a classifier can differentiate between two classes (e.g. whether the patient has diabetes or not). The greater the AUC, the better the performance of the classifier. The ROC curve plots the TPR (true positive rate, or sensitivity) on the y-axis against the FPR (false positive rate, or 1 - specificity) on the x-axis.

  (g) Mean absolute error (MAE):

    Mean absolute error (MAE) is the average, over the test data, of the absolute differences between the actual and predicted values. It measures the mean magnitude of the prediction errors without taking their direction into consideration.

    $$MAE = \frac{\sum ^{n}_{j=1}|y_{j}-y^{'}_{j}|}{n}$$

  (h) Root mean square error (RMSE):

    As the name suggests, RMSE is the square root of the mean of the squared errors, where the error is the difference between the actual value $y_{j}$ and the predicted value $y^{'}_{j}$ over the $n$ test samples. It is a quadratic scoring rule that measures the mean magnitude of the error; because the errors are squared before they are averaged, large errors receive relatively high weight.

    $$RMSE = \sqrt{ \frac{\sum ^{n}_{j=1}(y_{j}-y^{'}_{j})^2}{n} }$$
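The definitions above can be checked with a short sketch that computes precision, recall, accuracy and F-1 score from the four confusion matrix counts, and MAE and RMSE from paired actual and predicted values. The function names, the counts and the glucose values are hypothetical and are not results from this work.

```python
from math import sqrt


def classification_metrics(tp, fn, fp, tn):
    """Compute precision, recall, accuracy and F-1 score from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


def mae(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean of the squared differences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical confusion matrix counts for a two-class problem.
metrics = classification_metrics(tp=45, fn=5, fp=15, tn=35)
print(metrics)

# Hypothetical reference and predicted glucose values in mg/dl.
actual = [100, 120, 140]
predicted = [110, 115, 150]
print(mae(actual, predicted), rmse(actual, predicted))
```

In this example the RMSE is slightly larger than the MAE, which reflects the extra weight that squaring gives to the larger errors.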

4.2 Comparative analysis

The results of classification and regression on the two datasets have been compared in tables. Their evaluation on the parameters described above is shown in the form of charts, tables and confusion matrices. This gives a clear indication of the best algorithm for the detection of diabetes and the prediction of glucose values.

4.3 Analysis of Pima Indian diabetes dataset

Based on experimental studies, the dataset was split 80:20 (614 samples for training and 154 for testing). We applied the classification algorithms Logistic Regression, Gaussian Naive Bayes, Support Vector Machine (Linear), Gradient Boost, Neural Network, KNN, Random Forest and Decision Tree, and obtained the accuracy, ROC-AUC and confusion matrix for each. From the confusion matrices we calculated precision, recall and F1-score. The ROC curve was plotted for each algorithm; the classifier with the larger area under the curve is the better one. Table 4 shows the training accuracy, testing accuracy and training time of each algorithm. Among all the applied algorithms, Random Forest has the highest accuracy, 84 %, with a training time of 0.16 s. Logistic Regression, the simplest classifier, also performs well, with 82 % accuracy and a shorter training time of 0.05 s. However, these results could be improved by using a larger, updated dataset.
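The 80:20 split of the 768 PIDD records into 614 training and 154 testing samples can be reproduced with a minimal sketch such as the one below. The shuffling, the fixed seed and the rounding rule are assumptions made for illustration; the text does not state how the split was drawn.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the 768 PIDD records: row indices only.
rows = list(range(768))
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))
```

Holding out the test rows before any model is trained keeps the reported testing accuracy independent of the training data.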

Table 4 Comparison of Algorithm in terms of Accuracy and Time
Fig. 8 Comparison of Algorithms in terms of Accuracy and Time

Figure 8 compares the applied algorithms in terms of training and testing accuracy and training time. The bars show that SVM Linear and Neural Network take more than 2 s to train the model. All algorithms provide more than 70 % accuracy, and 4 of the 8 give more than 80 %, which is quite good. Accuracy alone, however, is not enough to choose the better classifier, so we also computed the confusion matrix and ROC curve for each algorithm.
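As a reminder of what the AUC values reported for each classifier measure, the sketch below computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name, labels and scores are hypothetical; this is not the code used to produce the figures.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = diabetic) and classifier scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to a classifier that ranks the two classes no better than chance, and an AUC of 1.0 to perfect separation.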

  (a) Logistic Regression Algorithm

    The preprocessed data are used to train the Logistic Regression algorithm. The resulting training and testing confusion matrices are shown in Figs. 9 and 10, and the ROC curve, with AUC = 0.87, is shown in Fig. 11.

  (b) SVM Linear Algorithm

    The preprocessed data are used to train the Support Vector Machine (Linear) algorithm. The resulting training and testing confusion matrices are shown in Figs. 12 and 13, and the ROC curve, with AUC = 0.86, is shown in Fig. 14.

  (c) XGBoost Algorithm

    The preprocessed data are used to train the XGBoost algorithm. The resulting training and testing confusion matrices are shown in Figs. 15 and 16, and the ROC curve, with AUC = 0.85, is shown in Fig. 17.

  (d) Gaussian Naive Bayes Algorithm

    The preprocessed data are used to train the Gaussian Naive Bayes algorithm. The resulting training and testing confusion matrices are shown in Figs. 18 and 19, and the ROC curve, with AUC = 0.84, is shown in Fig. 20.

  (e) KNN Algorithm

    The preprocessed data are used to train the KNN algorithm. The resulting training and testing confusion matrices are shown in Figs. 21 and 22, and the ROC curve, with AUC = 0.83, is shown in Fig. 23.

  (f) Decision Tree Algorithm

    The preprocessed data are used to train the Decision Tree algorithm. The resulting training and testing confusion matrices are shown in Figs. 24 and 25, and the ROC curve, with AUC = 0.81, is shown in Fig. 26.

  (g) Random Forest Algorithm

    The preprocessed data are used to train the Random Forest algorithm. The resulting training and testing confusion matrices are shown in Figs. 27 and 28, and the ROC curve, with AUC = 0.87, is shown in Fig. 29.

  (h) Neural Network Algorithm

    The preprocessed data are used to train the Neural Network algorithm. The resulting training and testing confusion matrices are shown in Figs. 30 and 31, and the ROC curve, with AUC = 0.66, is shown in Fig. 32.

From the confusion matrices, precision, recall and F-1 score are also calculated for both training and testing (Table 5). Figure 33 compares the algorithms in terms of AUC. Random Forest and Logistic Regression have the same AUC of 0.87 and are better classifiers than the others. These results are better than those of a previous work on the same dataset [38].

Fig. 9 Training confusion matrix (Logistic Regression)
Fig. 10 Testing confusion matrix (Logistic Regression)
Fig. 11 ROC for Logistic Regression
Fig. 12 Training confusion matrix (SVM Linear)
Fig. 13 Testing confusion matrix (SVM Linear)
Fig. 14 ROC for SVM Linear
Fig. 15 Training confusion matrix (XGBoost)
Fig. 16 Testing confusion matrix (XGBoost)
Fig. 17 ROC for XGBoost
Fig. 18 Training confusion matrix (GNB)
Fig. 19 Testing confusion matrix (GNB)
Fig. 20 ROC for GNB
Fig. 21 Training confusion matrix (KNN)
Fig. 22 Testing confusion matrix (KNN)
Fig. 23 ROC for KNN
Fig. 24 Training confusion matrix (Decision Tree)
Fig. 25 Testing confusion matrix (Decision Tree)
Fig. 26 ROC for Decision Tree
Fig. 27 Training confusion matrix (Random Forest)
Fig. 28 Testing confusion matrix (Random Forest)
Fig. 29 ROC for Random Forest
Fig. 30 Training confusion matrix (Neural Network)
Fig. 31 Testing confusion matrix (Neural Network)
Fig. 32 ROC for Neural Network

Table 5 Summary of prediction for different algorithms
Fig. 33 Comparison of Algorithms in terms of AUC

4.4 Analysis of iGLU dataset

The dataset was split 80:20 for training and testing, which gave the best results; with other split ratios the model becomes highly prone to over-fitting. Among all the applied algorithms, the best results are obtained by Decision Tree, with an accuracy of 70.64 %, a mean absolute error of 7.89 % and a root mean square error of 8.56 %, which is the lowest among all algorithms. Gradient boost regression with certain parameters gives the lowest relative absolute error, 6.82 %, which is quite good. The Spyder tool was used to obtain these measurements. Table 6 summarizes the predictions of the different algorithms. However, these results could be improved by using an updated, larger dataset.

Table 6 Summary of Prediction for Different Algorithms

Figure 34 compares the regression algorithms in terms of MAE, RMSE and accuracy. Decision Tree performs best, with the lowest root mean square error. Linear Regression and Neural Network do not give good results in our work. Table 7 compares the applied algorithms on training time and on training and testing score. Although the Neural Network regressor takes the longest time to run, Decision Tree gives the best testing score, 70.64 %, in much less time.

Fig. 34 Comparison of Algorithms in terms of Accuracy, MAE, RMSE

Table 7 Comparison of Algorithm in terms of Accuracy and Time

Figure 35 compares the algorithms in terms of training and testing score and training time. All algorithms take less than 1 s to train the model. XGBoost, Random Forest and Decision Tree give good accuracy, while the other algorithms (Linear Regression, Polynomial Regression, Support Vector Regression, Neural Network) do not exceed 30 % accuracy in either training or testing, because our dataset is small and the chance of over-fitting increases for these algorithms.
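The text does not define how the training and testing score of a regressor is computed. One common choice, and the default score of many regression libraries, is the coefficient of determination R². The sketch below shows that computation purely as an assumption about what such a score could be; the function name and the values are hypothetical.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted glucose values in mg/dl.
actual = [100, 120, 140, 160]
predicted = [110, 115, 150, 155]
print(r2_score(actual, predicted))
```

Under this reading, a score of 1 means perfect prediction and a score near 0 means the model does no better than always predicting the mean reference value.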

Fig. 35 Comparison of Algorithms in terms of Score and Training Time

4.5 Clarke error grid analysis

The Clarke Error Grid is used to quantify the clinical accuracy of the values predicted by the device or method under test in comparison with reference glucose values (values obtained from a clinically approved method) [39]. Predicted values are plotted on the y-axis and reference values on the x-axis. The grid has five zones: A, B, C, D and E. Zone 'A' (accepted) contains predicted values that deviate by no more than 20 % from the reference value, or for which both values lie in the hypoglycemic range (<70 mg/dl). Zone 'B' (benign errors) contains values that deviate from the reference by more than 20 % but are clinically uncritical; it lies above and below zone 'A' in the grid. Zones 'A' and 'B' are clinically accepted. Values in zones 'C', 'D' and 'E' would lead to wrong decisions and are potentially dangerous.
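A minimal sketch of the zone 'A' test is given below. It takes zone 'A' to mean a prediction within 20 % of the reference value, or a pair in which both values are below 70 mg/dl. The boundaries of zones 'B' to 'E' are not reproduced here, and the function names and glucose values are hypothetical.

```python
def in_zone_a(reference, predicted):
    """Return True if the (reference, predicted) pair falls in Clarke zone A."""
    if reference < 70 and predicted < 70:
        # Both values in the hypoglycemic range.
        return True
    return abs(predicted - reference) <= 0.2 * reference


def zone_a_percentage(references, predictions):
    """Percentage of pairs that fall in zone A."""
    hits = sum(in_zone_a(r, p) for r, p in zip(references, predictions))
    return 100.0 * hits / len(references)


# Hypothetical reference and predicted glucose values in mg/dl.
references = [60, 100, 150, 200, 250]
predictions = [65, 115, 185, 190, 240]
print(zone_a_percentage(references, predictions))
```

In this example the third pair deviates by more than 20 % from its reference, so it falls outside zone 'A' and would need the full grid to be assigned to one of the remaining zones.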

Figure 36 shows the Clarke Error Grid analysis of the iGLU dataset used in our work. 86 % of the values lie in zone 'A' and 14 % in zone 'B', both of which are clinically accepted. No values fall in zones 'C', 'D' or 'E'.

Fig. 36 Clarke Error Grid of iGLU dataset

4.6 Comparison with previous work

Root Mean Square Error (RMSE) and Mean Absolute Error, also called Mean Absolute Deviation (MAD), were computed as described above. In addition, we computed the average error (AvgE) and the mean absolute relative difference (mARD). The average error of 9.03 % and the mean absolute deviation of 6.82 % represent the accuracy of our model. Table 8 compares our work with previous work.
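For reference, MAD and mARD can be computed as in the sketch below, which assumes the usual definitions: MAD is the mean absolute difference between predicted and reference values, and mARD is the mean of the absolute differences relative to the reference, expressed as a percentage. AvgE is omitted because its exact definition is not given in the text. The function names and values are hypothetical.

```python
def mad(reference, predicted):
    """Mean absolute deviation between predicted and reference values."""
    return sum(abs(p - r) for r, p in zip(reference, predicted)) / len(reference)


def mard(reference, predicted):
    """Mean absolute relative difference, in percent of the reference value."""
    relative = [abs(p - r) / r for r, p in zip(reference, predicted)]
    return 100.0 * sum(relative) / len(relative)


# Hypothetical reference and predicted glucose values in mg/dl.
reference = [100, 200]
predicted = [110, 180]
print(mad(reference, predicted), mard(reference, predicted))
```

Because mARD divides each error by its reference value, the same absolute error counts for more at low glucose levels than at high ones.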

Table 8 Comparison with previous work

5 Conclusion and future scope

The paper presents machine learning models for non-invasive blood glucose measurement on two different datasets. Logistic Regression, Decision Tree, K Nearest Neighbors (KNN), Random Forest, SVC (linear), Gradient Boost, Gaussian Naïve Bayes and Neural Network were compared for detecting diabetic samples in the Pima Indian Diabetes Dataset (PIDD), and XGB Regression, Linear Regression, Multi-Polynomial Regression, SVR, Random Forest, Decision Tree and Neural Network were compared for predicting the glucose level from the dataset collected by the iGLU device. The performance of these algorithms is compared on the basis of MAE, RMSE, accuracy, ROC, precision, F-1 measure and recall. Random Forest and Logistic Regression both achieve an AUC of 0.87, which suggests that the model is good at diabetes detection. Decision Tree, with 70 % accuracy and an RMSE of 8.56 %, gives the most accurate prediction of glucose level. However, these results could be improved by increasing the size of the dataset. Clarke error grid analysis was also performed; all values lie in zones 'A' and 'B', so the predictions of the model are clinically acceptable. Future work will address the security and privacy issues of continuous glucose measurement. Efforts will also be made to integrate a robust mechanism of insulin delivery for type 1 diabetes patients.