Introduction

Diabetes mellitus (DM), caused by impaired glucose metabolism, is a worldwide epidemic; it is estimated that by the year 2035, approximately 592 million people will be affected by this disease [1, 2]. DM results from defects in insulin secretion and/or insulin action and manifests as abnormal blood glucose levels; it is therefore diagnosed and monitored by measuring glucose in the blood. It is important to monitor blood glucose levels frequently and maintain them within the prescribed range to avoid secondary complications including strokes, heart attacks, blindness, and coma [3, 4]. In addition, DM patients with an established diagnosis of insulin-dependent diabetes (all type I, and many type II) require frequent glucose measurements for monitoring and adjustment of insulin doses. Currently, DM patients use hand-held devices to monitor blood glucose (BG) levels. These devices require a small blood sample (< 1 μL) obtained by “finger pricking,” followed by electrochemical sensing with a portable “glucometer.” Unfortunately, the procedure is inconvenient and painful, resulting in poor patient compliance [5, 6]. Other clinical tests require either blood or interstitial fluid for measurement of glucose levels. In this regard, it is important to develop a device that enables non-invasive monitoring of blood glucose levels.

Significant research has been undertaken in the quest to identify novel diagnostic tools for managing DM [7,8,9]. This research can be broadly categorized into electrochemical and optical sensing approaches. Electrochemical sensing of blood glucose levels utilizes enzymatic or non-enzymatic methods, while optical methods can be further subdivided into fluorophore-based (e.g., fluorescein) and non-fluorophore-based approaches [10]. Recent advancements in instrumentation and analysis methods have given great momentum to non-fluorophore-based approaches such as near-infrared (NIR) absorption spectroscopy, optical coherence tomography (OCT), photoacoustic spectroscopy, and polarization spectroscopy for glucose sensing [11, 12, 7, 13]. Our laboratory has pioneered the application of NIR Raman spectroscopy (RS) for measuring blood glucose, urea, lactic acid, and cholesterol levels [14,15,16,17,18,19]. RS is based on inelastic scattering of light, in which changes in molecular polarizability produce wavelength shifts associated with specific chemical bonds; these shifts constitute a molecular fingerprint carrying both qualitative and quantitative information. Other benefits of RS include its non-destructive and label-free nature, the absence of sample preparation, minimal water interference, and real-time evaluation. The overall methodology of Raman spectroscopic glucose sensing involves transcutaneous spectral measurement followed by feature extraction using multivariate chemometric modeling approaches such as partial least squares regression (PLSR), principal component regression (PCR), or support vector regression (SVR), followed by cross-validation. Typically, a calibration model is developed by pairing blood glucose values with the corresponding Raman spectra, and leave-one-out or K-fold cross-validation is employed to establish the accuracy of the spectral models against the reference values. Our long-term efforts towards non-invasive Raman glucose sensing have been evaluated through clamping studies in animals and oral glucose tolerance tests (OGTT) in humans [17, 20, 21]. Recently, Shao et al. showed that by focusing the laser directly into the bloodstream of mouse models, the problem of strong background can be mitigated to some extent [22]. Shih et al. demonstrated the feasibility of non-invasive transcutaneous measurement of blood glucose in dogs [20]. Scholtes-Timmerman et al. reported a human trial involving 186 subjects using a system with a fairly large spot size (8 mm); PLS modeling coupled with cross-validation yielded encouraging results [23]. Weber et al. developed a tabletop confocal Raman system to collect signals from interstitial fluid and tested this set-up on a group of 35 patients [24]. Even though the results of these studies provide substantial evidence in support of RS for monitoring glucose levels, one of the major hindrances to successful translation of spectroscopic glucose sensors into routine clinical use is the lack of an appropriate exit criterion for analyzing unknown samples [21]. Efforts have been made towards developing a universal calibration model; however, such models are limited by the non-linearity of the spectral response to glucose concentration, measurement-site differences, and inter-personal variance in skin color and thickness, basal metabolic rate, and hydration status [25]. To overcome these issues, “patient-specific” calibration strategies requiring a minimum of blood reference values are preferred.
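As an illustration of this calibration and cross-validation workflow, the sketch below pairs a PLSR model with leave-one-out cross-validation. It is written in Python with scikit-learn rather than the software used in the studies cited above, and the array names, placeholder data, and number of latent variables are illustrative assumptions only.

    # Illustrative PLSR calibration with leave-one-out cross-validation
    # (Python/scikit-learn sketch; data and parameter choices are placeholders).
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import LeaveOneOut

    spectra = np.random.rand(17, 1000)        # 17 spectra x 1000 Raman shift bins (placeholder)
    glucose = np.random.uniform(80, 180, 17)  # paired reference glucose values, mg/dL (placeholder)

    predictions = np.zeros_like(glucose)
    for train_idx, test_idx in LeaveOneOut().split(spectra):
        pls = PLSRegression(n_components=4)   # number of latent variables; illustrative choice
        pls.fit(spectra[train_idx], glucose[train_idx])
        predictions[test_idx] = pls.predict(spectra[test_idx]).ravel()

    rmsecv = np.sqrt(np.mean((predictions - glucose) ** 2))  # cross-validated error
    print(f"RMSECV: {rmsecv:.1f} mg/dL")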
Our recent efforts have focused on minimizing the number of blood reference values required for calibration [26]. Here, we present a systematic investigation of how the predictive accuracy of spectroscopic models depends on the ratio of calibration to validation points. Three different calibration strategies, using 50, 30, and 18% of the total data as reference points, were employed, and the prediction accuracies of the resulting spectroscopic models were evaluated. In contrast to earlier studies, in which the accuracy of spectroscopic models was mostly assessed by “leave-one-out cross-validation,” we employed an “independent test prediction” approach to obtain a realistic prospective read-out [17, 21, 20]. In addition, an accurate reference method based on serum glucose was used for better correlation. The findings of this study will help not only to reduce the number of unnecessary finger pricks and the associated discomfort but also to provide an appropriate validation of RS methods with respect to the clinical OGTT.

Materials and methods

Oral glucose tolerance testing

This pilot study was conducted at the Clinical Research Center of the MU-Institute of Clinical and Translational Sciences, University of Missouri–Columbia. The study protocol was approved by the Health Sciences Institutional Review Board (protocol number 2002948) at the University of Missouri–Columbia. Written informed consent was obtained from each participant prior to the study. A total of 23 healthy, non-diabetic, non-pregnant volunteers aged 18 years or older (median age 33 years) were recruited for the study. Of the 20 volunteers who qualified, 45% were male and 55% were female. A fasting blood glucose (FBG) value of 125 mg/dL or less was set as the inclusion criterion. The blood glucose levels of the subjects were checked prior to the study, and those who qualified were given a standard glucose drink used in clinics for OGTT (75 g glucose-rich beverage; Dextrose, Azer Scientific Inc.). Spectral acquisition was initiated immediately after the drink and repeated every 10 min for 160 min. Serum glucose measurements were performed concurrently every 10 min using a YSI glucose analyzer. Finger-prick measurements were also performed every 30 min using Accucheck™ blood glucose meters to corroborate the YSI measurements.

Raman set-up and spectral acquisition

In contrast to our previous free-space glucose Raman instruments [17, 20], a fiber-optic probe-coupled glucose Raman unit was used in the present study (Fig. 1A). A wrist support with a small hole, fabricated to hold the probe at the same tissue spot over the course of the experiment, is shown in Fig. 1B. This wrist support minimizes unwanted light interference and allows Raman measurements under ambient room light. An 830-nm diode laser (Process Instruments) is used as the excitation source. The excitation light is launched into the central fiber of the probe, and fiber background signals are removed by a short-pass filter at the other end of the fiber. The filtered excitation beam is delivered into the tissue through a sapphire ball lens, and the backscattered signal is collected by the same lens. Rayleigh-scattered light is eliminated by a long-pass filter in front of the six collection fibers surrounding the central excitation fiber, and the filtered Raman signal is delivered to the imaging spectrograph. The collection fibers are aligned in a line at the spectrograph entrance, and the signal is dispersed by an imaging spectrograph (LS785, Princeton Instruments). The dispersed spectrum is detected by a back-illuminated deep-depletion CCD (Spec-10:400BR-XTE, Princeton Instruments). The Raman instrument for glucose monitoring is built into a portable cart (84 cm × 48 cm × 100 cm) for easy transfer between the laboratory and the clinical research center. Spectral preprocessing consisted of standard normal variate (SNV) correction to remove scaling differences between spectra, followed by subtraction of a third-order polynomial function for objective baseline correction.
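A rough illustration of this preprocessing is given below, assuming SNV correction followed by subtraction of a single third-order polynomial fit as the baseline estimate. The sketch is written in Python/NumPy with placeholder data and is not the processing code actually used in the study; published baseline-removal methods often use iterative or constrained polynomial fitting rather than a single fit.

    # Illustrative preprocessing sketch: SNV correction followed by third-order
    # polynomial baseline subtraction (assumed implementation with placeholder data).
    import numpy as np

    def snv(spectrum):
        """Standard normal variate: remove per-spectrum offset and scaling differences."""
        return (spectrum - spectrum.mean()) / spectrum.std()

    def subtract_poly_baseline(spectrum, wavenumbers, order=3):
        """Fit a polynomial of the given order and subtract it as the baseline estimate."""
        coeffs = np.polyfit(wavenumbers, spectrum, order)
        return spectrum - np.polyval(coeffs, wavenumbers)

    wavenumbers = np.linspace(800, 1800, 1000)         # fingerprint region, cm-1 (placeholder grid)
    raw = np.random.rand(1000) + 0.001 * wavenumbers   # placeholder spectrum with a sloping background
    processed = subtract_poly_baseline(snv(raw), wavenumbers)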

Fig. 1

(A) Schematic of portable Raman system employed in the study. (B) Fiber-optic probe-holding adapter

Data analysis

Partial least squares regression (PLSR) and Clarke error grid analysis were performed using MATLAB-based in-house codes. Blood glucose values obtained from serum measurements and the corresponding Raman spectra at 17 time points during the course of the OGTT of 20 patients were recorded; data from two patients were not included due to poor spectral quality. Three different calibration models were developed using 50, 30, and 18% of the data, corresponding roughly to 9/17, 5/17, and 3/17 spectra of each individual OGTT. The remaining 50, 70, and 82% of the data were used as independent test data. Appropriate PLS ranks, ranging from 3 to 5, were chosen in accordance with the general rule of thumb that the number of calibration samples should be at least three times the rank of the PLS calibration [27].
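The sketch below illustrates one way of implementing this random-sampling partitioning, drawing roughly 50, 30, or 18% of each subject's 17 OGTT time points into a pooled calibration set and predicting the remaining points as independent test data. It is an illustrative Python example with placeholder data rather than the in-house MATLAB code; the specific ranks and the pooling of calibration spectra across subjects are assumptions within the constraints stated above.

    # Illustrative calibration/test partitioning (placeholder data; assumed pooling
    # of calibration spectra across subjects, PLS ranks chosen within the 3-5 range).
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    n_subjects, n_points, n_bins = 20, 17, 1000
    spectra = np.random.rand(n_subjects, n_points, n_bins)        # placeholder OGTT spectra
    glucose = np.random.uniform(80, 180, (n_subjects, n_points))  # placeholder serum glucose, mg/dL

    for n_cal, rank in [(9, 5), (5, 4), (3, 3)]:   # roughly 50, 30, and 18% of 17 time points
        cal_mask = np.zeros((n_subjects, n_points), dtype=bool)
        for s in range(n_subjects):                # random sampling within each subject's OGTT
            cal_mask[s, rng.choice(n_points, size=n_cal, replace=False)] = True
        pls = PLSRegression(n_components=rank)
        pls.fit(spectra[cal_mask], glucose[cal_mask])
        pred = pls.predict(spectra[~cal_mask]).ravel()
        rmsep = np.sqrt(np.mean((pred - glucose[~cal_mask]) ** 2))
        print(f"{n_cal}/17 calibration points per subject, rank {rank}: RMSEP = {rmsep:.1f} mg/dL")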

Results and discussion

In the present study, a calibration model is developed by pairing serum glucose values with the corresponding Raman spectra. Typical transcutaneous Raman spectra acquired from one of the subjects during the course of OGTT are shown in Fig. 2A. Corroborating the spectral profiles noted in our previous studies, the major spectral features are suggestive of collagen, lipids, and structural proteins [26, 28]. The strongest peak, at 1445 cm−1, is assigned to CH2 stretching; other features at 859 cm−1 and 938 cm−1 (collagen), 1004 cm−1 (phenylalanine), 1273 cm−1 and 1302 cm−1 (amide III), and 1655 cm−1 (amide I) were also observed. Even though near-infrared (NIR) excitation is used, a large fluorescence envelope is observed because the spectra were acquired transcutaneously (Fig. 2A). Expectedly, the small glucose-related Raman bands are overwhelmed by this background, necessitating multivariate analysis to identify their variations with respect to time and serum glucose concentration. The mean OGTT curve, along with the standard deviation at each 10-min read-out, is shown in Fig. 2B. OGTT is a standard diabetes screening procedure in which a subject is given a glucose-rich drink to induce a substantive rise in the blood glucose level. Blood samples are then withdrawn through an intravenous (IV) line at specific time intervals (here, 10 min) to follow the rate of clearance of glucose from the blood and thereby infer the effectiveness of the subject’s insulin-based glucose regulation. The standard bell-shaped OGTT curve shown in Fig. 2B reflects the healthy status of the participants in the study.

Fig. 2

(A) Typical transcutaneous Raman spectra acquired over the course of OGTT. Spectra at time points 0, 100, and 160 min are marked as T0, T100, and T160, respectively. (B) Mean OGTT curve along with standard deviation obtained from all the participants of the study

Choosing an appropriate “local calibration model” is a prerequisite for employing regression analysis in spectroscopic quantitative measurements. Raman spectra were acquired at 17 time points (0 to 160 min) from 20 patients during OGTT. A random sampling approach was employed; in the first step, 50% of the total data was used to develop a calibration model, and the remaining data were used as “independent test data.” Clarke error grid analysis (EGA), approved by the Food and Drug Administration (FDA), is one of the most widely utilized methods for evaluating the clinical accuracy of new methods/devices for estimating blood glucose against a reference method [29, 30]. The Clarke error grid plots of the independent test data predictions are shown in Fig. 3. As per the FDA guideline, zone A corresponds to predictions within ± 20% of the reference value and zone B to larger deviations that would not lead to inappropriate treatment; both are considered clinically acceptable. The remaining zones C, D, and E represent clinically unacceptable errors and are considered potentially dangerous. As shown in Fig. 3A, most of the independent test data fall into the clinically acceptable zone A (71%) and zone B (26%) regions of the error grid. We then reduced the number of calibration points to 30 and 18% of the total data using the same random sampling approach. As can be seen from Fig. 3B, C, a total of 97 and 96% of the test data for these two calibration strategies, respectively, still fall within the clinically acceptable zones of the Clarke error grid. The overall EGA findings indicate the high accuracy of the spectroscopic models in predicting analogous glucose values. As shown in Table 1, the clinically valid predictions obtained with all three calibration strategies further support the applicability of transcutaneous spectroscopic measurements for glucose monitoring.
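In the standard definition of the grid, a point falls in zone A if the prediction is within 20% of the reference value or if both values lie in the hypoglycemic range below 70 mg/dL. The sketch below expresses this single criterion as a Python check with assumed array names; it is a simplified illustration, not the full five-zone logic of the in-house EGA code.

    # Simplified illustration of the Clarke zone A criterion: a prediction is
    # "clinically accurate" if it is within 20% of the reference value, or if both
    # reference and prediction are below 70 mg/dL. Zones B-E are not distinguished here.
    import numpy as np

    def fraction_in_zone_a(reference, predicted):
        reference = np.asarray(reference, dtype=float)
        predicted = np.asarray(predicted, dtype=float)
        within_20pct = np.abs(predicted - reference) <= 0.2 * reference
        both_hypo = (reference < 70) & (predicted < 70)
        return np.mean(within_20pct | both_hypo)

    # Hypothetical usage: fraction_in_zone_a(ysi_glucose, raman_predictions) would
    # return ~0.71 for the 50% calibration model described above.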

Fig. 3

Clarke error grid plots showing independent test prediction accuracy of spectroscopic models with different numbers of calibration points. (A) 50%. (B) 30%. (C) 18%

Table 1 Summary of predictive accuracy of spectroscopic models with different calibration strategies

Measurements of a variable with two different methods will always disagree to some degree, as the two can never provide exactly the same values. An accurate estimation of the degree of agreement can therefore provide valuable information about the performance of both methods. The correlation coefficient (r) and coefficient of determination (r2) are two widely utilized parameters to assess the linear relationship between two measurements [31]. As shown in Table 1, an average correlation value of ~ 63% was observed. However, this value alone is not a reliable indicator, as it measures only the strength of the relationship (the percentage of common variance) between the two variables, not the agreement between them. Bland-Altman (B&A) plots are therefore generally employed to perform “the analysis of difference” between two quantitative measurements [32]. The limits of agreement (LOA) are quantified from the mean and standard deviation (s) of the differences between the two methods: the difference between the measurements (A − B) is plotted against their mean, (A + B)/2, and the two methods are considered to be in good agreement if ~ 95% of the data points lie within ± 1.96 standard deviations of the mean difference. The analysis-of-difference plots, using the YSI analyzer values as the reference method against the Raman spectroscopic predictions, are shown in Fig. 4. The parameter “bias” indicates the average difference between the predicted and reference glucose values. As shown in Table 1, the spectral models with 50, 30, and 18% calibration points have biases of − 1.02, 1.23, and − 4.47 mg/dL, respectively. Most of the points (> 95%) lie within the range of agreement, suggesting that the blood glucose values predicted by the Raman measurements are analogous to the reference values. Overall, the results indicate the high accuracy of the spectroscopic models in predicting independent test samples, even with the minimum number of calibration points.
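The bias and limits of agreement referred to above reduce to simple summary statistics of the paired differences; the sketch below (illustrative Python, with assumed array names) computes them.

    # Bland-Altman summary statistics: bias (mean difference) and 95% limits of
    # agreement (bias +/- 1.96 standard deviations of the differences).
    import numpy as np

    def bland_altman_stats(reference, predicted):
        reference = np.asarray(reference, dtype=float)
        predicted = np.asarray(predicted, dtype=float)
        diff = predicted - reference
        bias = diff.mean()
        sd = diff.std(ddof=1)
        return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

    # In the B&A plot, diff is plotted against the pairwise mean (reference + predicted) / 2;
    # good agreement requires ~95% of points to fall within the limits of agreement.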

Fig. 4

Bland-Altman plots demonstrating the degree of agreement between reference and Raman-predicted values with different numbers of calibration points. (A) 50%. (B) 30%. (C) 18%. (LOA, limits of agreement)

As glucose constitutes less than 0.1% of human tissue by weight, the spectral differences induced by changes in its concentration are masked by skin variations during transcutaneous measurements [3, 25]. The use of multivariate algorithms to identify these subtle changes in a temporal manner therefore becomes a necessity. Regression methods coupled with different cross-validation strategies have largely been employed to link spectral changes with glucose concentration. However, even though cross-validation is widely used, its validity for larger data sets and for prospective predictions is debatable [33]. As the probability of having an equivalent pair of spectrum and reference blood glucose value increases in larger data sets, the root mean square error calculation can become overly optimistic. Therefore, partitioning of samples into independent calibration and validation sets is the preferred approach. Although this has not been studied in great detail with respect to glucose sensing, the available reports suggest it to be a heuristic task [33]. A random sampling approach for selecting representative subsets of calibration and validation samples from the pool of data is commonly employed because of its simplicity and because a group of data randomly extracted from a larger set follows the statistical distribution of the entire set [34]. The main objective of spectroscopic glucose sensing is to replace or reduce the finger pricks or blood withdrawals required for glucose monitoring. However, all spectroscopic devices require a calibration framework to predict unknown concentrations; minimizing the number of calibration steps needed for continuous monitoring with non-invasive devices can therefore help realize this ultimate goal. In the present study, we evaluated the prediction accuracy of Raman spectroscopic sensing with different numbers of calibration points during OGTT. Spectra were randomly divided into three different calibration sets comprising 50, 30, and 18% of the total data set. In all three cases, ~ 97% of the independent test data fall into the clinically acceptable area of the Clarke error grid. The rise in the number of predictions in zone B of the error grid as the number of calibration points is reduced is expected, owing to the non-linear nature of the spectroscopic signatures and the decreasing variance covered by the calibration set. The fact that the random sampling approach employed here does not always guarantee the inclusion of borderline samples in the calibration models could be another contributing factor. The bias between the predicted and reference values is a good indicator of the predictive power of the calibration models. As per the ISO 15197 guidelines, a difference of 15 mg/dL (for values < 100 mg/dL) or 15% (for values ≥ 100 mg/dL) is considered acceptable accuracy for new devices [35]. As shown in the Bland-Altman plots and Table 1, the bias between the predicted and reference blood glucose values was lowest for the spectral model with the highest number of calibration points (50%), followed by the 30 and 18% models. All of these values were within the acceptable range, suggesting that a small number of calibration points can also yield good predictive accuracy, provided that accurate blood referencing methods are used.
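For reference, the ISO 15197 criterion quoted above can be written as a simple per-point check; the sketch below is an illustrative Python implementation of the thresholds as stated in the text, with assumed array names.

    # ISO 15197-style accuracy check (thresholds as quoted in the text): a predicted
    # value is acceptable if it is within 15 mg/dL of the reference when the reference
    # is below 100 mg/dL, or within 15% of the reference otherwise.
    import numpy as np

    def iso15197_acceptable(reference, predicted):
        reference = np.asarray(reference, dtype=float)
        predicted = np.asarray(predicted, dtype=float)
        error = np.abs(predicted - reference)
        return np.where(reference < 100, error <= 15, error <= 0.15 * reference)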

The overall findings of the present study provide further evidence in support of the prospective application of Raman spectroscopic methods for non-invasive glucose monitoring. In contrast to earlier studies, by utilizing an accurate reference method we have demonstrated strong predictive accuracy of spectroscopic models built with minimal calibration information. Further efforts are underway to examine the predictive accuracy in DM patients. Our future studies will also focus on replacing the random sampling method with algorithms for more effective partitioning of the calibration and validation sets, to improve the predictive ability and robustness of the resulting models.