Introduction

With the rapid growth of the world’s population and of the proportion of elderly people, type 2 diabetes mellitus (T2DM), a well-known metabolic disease, has become a major global public health problem [1,2,3]. Diabetic nephropathy (DN), also known as diabetic glomerulosclerosis or diabetic kidney disease, is one of the most devastating complications of T2DM [4, 5]. DN is characterized by pathophysiological changes in the kidney, including albuminuria, a progressive decrease in glomerular filtration rate, glomerulosclerosis and tubulointerstitial fibrosis due to increased extracellular matrix expression, thickening of the glomerular and tubular basement membranes, and vascular dysfunction [6, 7]. DN is the most common cause of end-stage renal disease (ESRD) worldwide and is associated with increased morbidity and mortality in patients with T2DM [8, 9]. Therefore, accurate identification of DN is of great clinical significance.

Renal pathology is the gold standard for the diagnosis of DN [10]. However, renal biopsy is an invasive and expensive procedure that may be accompanied by complications [11]. To reduce operative risk in patients who would otherwise require renal biopsy, it is imperative to identify new clinical biomarkers with diagnostic power for the differential diagnosis of DN. Albuminuria is a biomarker commonly used to screen renal function; it primarily reflects glomerular injury and increased glomerular permeability to macromolecules. Nevertheless, albuminuria has several limitations: its sensitivity is relatively low, it does not predict renal outcomes, and it is not specific to DN. In recent years, new biomarkers of renal injury have been proposed [12, 13]. A study in China indicated that blood cell parameters were significantly associated with an increased risk of newly diagnosed, biopsy-proven DN among people with T2DM [14]. A higher prevalence of DN has been found to be associated with lower hemoglobin (HB) concentrations [15]. The Saudi diabetic kidney disease study reported that subjects with microalbuminuria had significantly higher mean white blood cell (WBC) counts and mean corpuscular hemoglobin concentration (MCHC) than those without DN [16]. A meta-analysis suggested that platelet distribution width (PDW) could be recommended as a diagnostic biomarker for DN [17]. Early biomarkers may support the early diagnosis and treatment of DN, reduce its prevalence, and delay its progression. However, few studies have established biomarker-based models for diagnosing DN in patients with T2DM. Such a model may help estimate the probability of DN at an early stage, allowing high-risk patients to receive timely interventions that reduce the morbidity and mortality of diabetes-related complications.

The purpose of the present study was to establish a diagnostic model for identifying DN in patients with T2DM by incorporating biomarker data, thereby providing a reference for the early diagnosis of DN.

Methods

Study design and populations

This cross-sectional study was conducted on 359 participants at the Second Hospital of Shijiazhuang from August 2018 to March 2021. Inclusion criteria were: (1) T2DM diagnosed according to the 2010 American Diabetes Association criteria [18]; (2) age ≥ 18 years; and (3) complete baseline data, medical history, clinical characteristics, and laboratory examination data. Exclusion criteria were: (1) kidney stones or urinary tract infection; (2) malignant tumors; (3) systemic diseases such as systemic lupus erythematosus and vasculitis; (4) chronic kidney disease (CKD) diagnosed before T2DM; (5) lupus nephritis or allergic purpura nephritis; (6) severe liver or thyroid disease; (7) pregnancy or breastfeeding; (8) severe hematological disease; and (9) use of other experimental drugs or participation in other clinical studies within 1 month before enrolment. The study was approved by the ethics committee of the Second Hospital of Shijiazhuang (No. Sey2021005). The requirement for written informed consent was waived because of the retrospective nature of the study.

Data collection

The following clinical variables were collected from electronic case report forms: (1) baseline information: gender, age (years), smoking history, drinking history, systolic blood pressure (SBP), and diastolic blood pressure (DBP); (2) diabetes history: duration of diabetes (months), fasting blood glucose (mmol/L), diabetic retinopathy (DR), diabetic peripheral vascular disease, and drug treatment; (3) laboratory data: red blood cell count (RBC, × 10¹²/L), WBC (× 10⁹/L), platelet count (PLT, × 10⁹/L), HB (g/L), neutrophil percentage (NEUT, %), lymphocyte percentage (LY, %), monocyte percentage (MONO, %), hematocrit (HCT, %), plateletcrit (PCT, %), MCHC (g/L), mean corpuscular volume (MCV, fL), mean platelet volume (MPV, fL), PDW (%), red blood cell distribution width (RDW, %), albumin (ALB, g/L), blood urea nitrogen (BUN, mmol/L), uric acid (UA, μmol/L), serum creatinine (SCR, μmol/L), prothrombin time (PT), activated partial thromboplastin time (APTT), fibrinogen (FIB, g/L), international normalized ratio (INR), D-dimer (μg/mL), serum total cholesterol (CHO, mmol/L), triglycerides (TG, mmol/L), high-density lipoprotein cholesterol (HDL-C, mmol/L), and low-density lipoprotein cholesterol (LDL-C, mmol/L).

Outcome

The outcome of the study was DN. DN was diagnosed on the basis of an estimated glomerular filtration rate (eGFR) < 60 mL/min/1.73 m²; the urine albumin-to-creatinine ratio (UACR) was re-examined within 3 to 6 months, and urinary albumin excretion was increased (UACR ≥ 30 mg/g) in two of the three measurements.
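The thresholds in this outcome definition can be expressed as a few small rules. The sketch below is illustrative only: the unit convention (urine albumin in mg/L, urine creatinine in mg/dL) and the function names are assumptions, and the eGFR and albuminuria flags are deliberately returned separately because the text does not spell out how the two criteria were combined or which eGFR equation was used.

```python
from typing import Sequence

# Thresholds taken from the outcome definition in the text.
EGFR_THRESHOLD = 60.0           # mL/min/1.73 m²
UACR_THRESHOLD_MG_PER_G = 30.0  # mg/g


def uacr_mg_per_g(albumin_mg_per_l: float, creatinine_mg_per_dl: float) -> float:
    """Urine albumin-to-creatinine ratio in mg/g (assumed inputs: albumin mg/L, creatinine mg/dL)."""
    creatinine_g_per_l = creatinine_mg_per_dl / 100.0  # 1 mg/dL = 10 mg/L = 0.01 g/L
    return albumin_mg_per_l / creatinine_g_per_l


def reduced_egfr(egfr: float) -> bool:
    """eGFR criterion from the text: eGFR < 60 mL/min/1.73 m²."""
    return egfr < EGFR_THRESHOLD


def persistent_albuminuria(uacr_values: Sequence[float], required: int = 2) -> bool:
    """True if at least `required` repeated UACR values reach >= 30 mg/g
    (the text describes two of three measurements within 3 to 6 months)."""
    return sum(v >= UACR_THRESHOLD_MG_PER_G for v in uacr_values) >= required
```

For example, a urine albumin of 15 mg/L with a urine creatinine of 50 mg/dL gives a UACR of 30 mg/g, which just meets the threshold; three measurements of 12, 45, and 38 mg/g satisfy the two-of-three rule.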

Statistical analysis

Normally distributed continuous data were described as mean ± standard deviation (mean ± SD) and compared between groups using the unpaired t-test. Non-normally distributed continuous data were described as median and interquartile range [M (Q1, Q3)] and compared between groups using the Wilcoxon rank-sum test. Categorical data were described as number of cases and percentage [N (%)] and compared between groups using the chi-square test. Missing values were imputed using the random forest imputation method.
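The group comparisons described above correspond to standard SciPy functions. This is a sketch under stated assumptions: the normality decision is passed in as a flag because the text does not say how normality was assessed, SciPy's default Yates continuity correction for 2 × 2 tables is left on although the text does not mention it, and the values in the example are toy numbers, not study data. The random forest imputation step is not shown.

```python
from typing import Sequence

from scipy import stats


def compare_continuous(dn: Sequence[float], non_dn: Sequence[float], normal: bool) -> float:
    """Two-sided p-value comparing a continuous variable between DN and non-DN groups."""
    if normal:
        # Unpaired (Student's) t-test; SciPy assumes equal variances by default.
        return float(stats.ttest_ind(dn, non_dn).pvalue)
    # Wilcoxon rank-sum test (normal approximation) for non-normal data.
    return float(stats.ranksums(dn, non_dn).pvalue)


def compare_categorical(table: Sequence[Sequence[int]]) -> float:
    """Chi-square test on a contingency table of counts (rows: groups, columns: categories)."""
    _, p, _, _ = stats.chi2_contingency(table)
    return float(p)


if __name__ == "__main__":
    # Toy values for illustration only (not study data).
    dn_values = [5.1, 5.3, 5.5, 5.8, 6.0, 6.2]
    non_dn_values = [4.0, 4.2, 4.1, 4.3, 4.5, 4.4]
    print(compare_continuous(dn_values, non_dn_values, normal=True))
    print(compare_continuous(dn_values, non_dn_values, normal=False))
    print(compare_categorical([[30, 10], [10, 30]]))
```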

LASSO regression was used to screen for factors associated with DN in T2DM, and the selected factors were entered into the decision tree model with fivefold cross-validation. For variable screening with LASSO regression, the penalty parameter α yielding the smallest mean-squared error (MSE) was selected. An eXtreme Gradient Boosting (XGBoost) model was then established based on the selected factors. The diagnostic performance of the model was quantified by the area under the receiver operating characteristic curve (AUC) with 95% confidence intervals (CIs), together with sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy. The diagnostic performance of the model was also validated in subgroups stratified by the presence or absence of DR and of diabetic peripheral vascular disease.
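Fivefold cross-validation partitions the cohort into five folds and uses each fold once as the held-out set while the model is fitted on the other four. The sketch below shows one way to build stratified folds so that each fold keeps roughly the DN/non-DN ratio; stratification, the round-robin assignment, and the fixed seed are assumptions, since the text does not say how folds were formed. In practice a library splitter such as scikit-learn's `StratifiedKFold` would normally be used.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds, spreading each class evenly across folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # round-robin keeps class counts balanced
    return folds


def train_test_splits(
    labels: Sequence[int], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs, one per fold."""
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        yield train, sorted(held_out)
```

With 199 DN and 160 non-DN labels, this assignment gives folds of 72, 72, 72, 72, and 71 patients, each containing 39 or 40 DN cases.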

p < 0.05 was considered statistically significant. All analyses were performed with SAS v. 9.4 (SAS Institute, Cary, North Carolina) and Python v. 3.8.3 (Python Software Foundation, DE, USA).

Results

Characteristics of the study populations

In total, 359 patients were included in this study, of whom 199 (55.43%) were diagnosed with DN. The mean age was 59.02 ± 10.81 years; 135 patients (37.60%) were female and 224 (62.40%) were male. The median duration of diabetes was 120 months. Patients in the DN group were younger than those in the non-DN group (p < 0.05). Patients with DN were more likely to be male and to have DR and diabetic peripheral vascular disease. DBP, fasting blood glucose, HB, WBC, MCHC, UA, SCR, PT, and INR were higher in the DN group (p < 0.05). Compared with patients with DN, patients without DN had higher LY, RDW, D-dimer, and LDL-C. The characteristics of the study participants are shown in Table 1.

Table 1 Basic characteristics of the study populations

Diagnostic factors selection for DN in T2DM

Diagnostic factors were selected using LASSO logistic regression. Eleven variables were retained as diagnostic factors for DN in T2DM: age, DBP, fasting blood glucose, insulin treatment, MCHC, PDW, UA, SCR, FIB, INR, and LDL-C. Figure 1a shows the selection of the optimal penalty parameter (alpha) in the LASSO model by fivefold cross-validation using the minimum criterion. The AUC was plotted against alpha, and dotted vertical lines were drawn at the optimal values given by the minimum criterion and by the 1 standard error (SE) criterion. LASSO coefficient profiles of the 42 candidate variables are depicted in Fig. 1b, where the coefficients are plotted against the alpha sequence. A vertical line marks the value selected by fivefold cross-validation, at which the optimal alpha (0.002) yielded 11 variables with nonzero coefficients. Variable importance is shown in Fig. 2.
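Choosing alpha by the minimum criterion and by the 1-SE criterion reduces to simple arithmetic on the per-fold cross-validation errors. This is a hedged sketch: the per-fold error values in the example are invented for illustration, and a lower-is-better error such as the MSE named in the Statistical analysis section is assumed. For a higher-is-better metric such as the AUC, the comparisons would be reversed (maximize, then accept values within one SE below the maximum).

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List, Tuple


def select_alpha(cv_errors: Dict[float, List[float]]) -> Tuple[float, float]:
    """Return (alpha_min, alpha_1se) from per-fold cross-validation errors.

    alpha_min: alpha with the smallest mean error.
    alpha_1se: largest (most strongly penalized) alpha whose mean error lies
    within one standard error of that minimum.
    """
    summary = {
        alpha: (mean(errors), stdev(errors) / sqrt(len(errors)))
        for alpha, errors in cv_errors.items()
    }
    alpha_min = min(summary, key=lambda a: summary[a][0])
    best_mean, best_se = summary[alpha_min]
    alpha_1se = max(a for a, (m, _) in summary.items() if m <= best_mean + best_se)
    return alpha_min, alpha_1se


if __name__ == "__main__":
    # Invented fivefold errors for four candidate alphas (illustration only).
    errors = {
        0.0005: [0.20, 0.22, 0.21, 0.19, 0.23],
        0.002: [0.15, 0.16, 0.14, 0.17, 0.15],
        0.01: [0.15, 0.16, 0.15, 0.17, 0.16],
        0.05: [0.25, 0.26, 0.24, 0.27, 0.25],
    }
    print(select_alpha(errors))
```

A larger alpha means stronger shrinkage and fewer nonzero coefficients, which is why the 1-SE rule picks the largest qualifying alpha: it trades a small, statistically indistinguishable loss in fit for a sparser model.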

Fig. 1

Selection of diagnostic factors for DN in T2DM; a selection of the optimal penalty parameter (alpha) in the LASSO model; b LASSO coefficient profiles of the 42 candidate variables

Fig. 2

Diagnostic factors for DN in T2DM

Establishment and performances of the diagnostic model for DN in T2DM

Age, DBP, fasting blood glucose, insulin treatment, MCHC, PDW, UA, SCR, FIB, INR, and LDL-C were used to build the diagnostic model for DN in T2DM. The XGBoost parameters were set as follows: n_estimators = 80, learning_rate = 0.0004, subsample = 0.733, and colsample_bytree = 0.8. We developed an online tool for the diagnostic model, and the model has been uploaded to GitHub (https://github.com/yuweixingsjz/model_xgboost/tree/main).
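The reported hyperparameters can be captured as a configuration for the scikit-learn interface of the `xgboost` Python package (`XGBClassifier`). Everything beyond the four reported values is an assumption: the feature column names are hypothetical, unspecified settings are left at library defaults, and `random_state` is added only to make the row and column subsampling reproducible. The model released in the authors' GitHub repository may be configured differently.

```python
# Hyperparameters reported in the text; all other XGBoost settings are assumed to be defaults.
XGB_PARAMS = {
    "n_estimators": 80,
    "learning_rate": 0.0004,
    "subsample": 0.733,
    "colsample_bytree": 0.8,
}

# Hypothetical column names for the 11 selected diagnostic factors.
FEATURES = [
    "age", "dbp", "fasting_glucose", "insulin_treatment", "mchc", "pdw",
    "ua", "scr", "fib", "inr", "ldl_c",
]


def build_model(seed: int = 0):
    """Create an XGBoost classifier with the reported settings (requires the xgboost package)."""
    from xgboost import XGBClassifier  # imported lazily so the config can be inspected without xgboost

    # random_state is an assumption added for reproducible subsampling.
    return XGBClassifier(random_state=seed, **XGB_PARAMS)
```

Assuming the data sit in a pandas DataFrame `X` with those columns and a 0/1 outcome `y`, usage would look like `model = build_model(); model.fit(X[FEATURES], y); probs = model.predict_proba(X[FEATURES])[:, 1]`.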

For the diagnostic model for DN in T2DM, the cut-off value, sensitivity, specificity, PPV, NPV, AUC, and accuracy were 0.603, 0.849 (95% CI 0.800–0.899), 0.969 (95% CI 0.942–0.996), 0.971 (95% CI 0.946–0.996), 0.838 (95% CI 0.785–0.891), 0.965 (95% CI 0.950–0.980), and 0.903 (95% CI 0.872–0.933), respectively. Table 2 and Fig. 3 show the performance of the diagnostic model for DN in T2DM.
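The threshold-based metrics above follow from the 2 × 2 confusion matrix at the 0.603 cut-off. In the sketch below, the counts (169 true positives, 30 false negatives, 5 false positives, 155 true negatives) are our back-calculation from the reported proportions and group sizes (199 DN, 160 non-DN); they are not reported in the text. Classifying a probability equal to the cut-off as DN is also an assumption. The text does not name its CI method, but a simple normal-approximation (Wald) interval truncated to [0, 1] is consistent with the reported intervals for these counts.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def wald_ci(successes: int, total: int, z: float = Z_95) -> Tuple[float, float, float]:
    """Proportion with a normal-approximation (Wald) CI, truncated to [0, 1]."""
    p = successes / total
    half_width = z * sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def confusion_counts(
    probs: Sequence[float], labels: Sequence[int], cutoff: float = 0.603
) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn); a probability >= cutoff is classified as DN (assumption)."""
    tp = fn = fp = tn = 0
    for prob, label in zip(probs, labels):
        predicted = prob >= cutoff
        if label == 1 and predicted:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, Tuple[float, float, float]]:
    """Sensitivity, specificity, PPV, NPV, and accuracy, each as (estimate, low, high)."""
    return {
        "sensitivity": wald_ci(tp, tp + fn),
        "specificity": wald_ci(tn, tn + fp),
        "ppv": wald_ci(tp, tp + fp),
        "npv": wald_ci(tn, tn + fn),
        "accuracy": wald_ci(tp + tn, tp + fn + fp + tn),
    }


if __name__ == "__main__":
    # Back-calculated counts (not reported in the text).
    for name, (est, low, high) in diagnostic_metrics(169, 30, 5, 155).items():
        print(f"{name}: {est:.3f} ({low:.3f}-{high:.3f})")
```

With these counts the estimates and intervals agree with the reported values to within rounding; for example, sensitivity is 169/199 ≈ 0.849 with a Wald interval of about 0.800–0.899.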

Table 2 Performances of the diagnostic model for DN in T2DM
Fig. 3

The AUC of the diagnostic model for DN in T2DM

Validation of the diagnostic performance of the model through stratifications

In patients with DR, the cut-off, sensitivity, specificity, PPV, NPV, AUC, and accuracy were 0.603, 0.765 (95% CI 0.681–0.849), 0.969 (95% CI 0.938–0.999), 0.949 (95% CI 0.901–0.998), 0.842 (95% CI 0.783–0.902), 0.944 (95% CI 0.916–0.971), and 0.880 (95% CI 0.838–0.922), respectively. In patients without DR, with the same cut-off value of 0.603, the sensitivity, specificity, PPV, NPV, AUC, and accuracy were 0.921 (95% CI 0.868–0.973), 0.970 (95% CI 0.911–1.000), 0.989 (95% CI 0.969–1.000), 0.800 (95% CI 0.676–0.924), 0.987 (95% CI 0.971–1.000), and 0.933 (95% CI 0.890–0.975), respectively. In patients with diabetic peripheral vascular disease, the AUC was 0.956 (95% CI 0.922–0.991); in those without diabetic peripheral vascular disease, it was 0.968 (95% CI 0.947–0.989). Table 2 and Fig. 4 show the performance of the diagnostic model in patients with and without DR and in patients with and without diabetic peripheral vascular disease.
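Subgroup AUCs like these can be computed by splitting the predicted probabilities by stratum and using the rank (Mann–Whitney) form of the AUC: the probability that a randomly chosen DN patient receives a higher score than a randomly chosen non-DN patient, with ties counted as one half. The CI below uses the Hanley–McNeil approximation purely as an illustration; the text does not state how its AUC intervals were derived, and this approximation need not reproduce them. The function names are hypothetical and the test scores are toy values.

```python
from math import sqrt
from typing import Dict, Hashable, Sequence, Tuple

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, int, int]:
    """Rank-based AUC: share of (DN, non-DN) pairs ranked correctly, ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each stratum needs both DN and non-DN patients")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg)), len(pos), len(neg)


def hanley_mcneil_ci(auc: float, n_pos: int, n_neg: int, z: float = Z_95) -> Tuple[float, float]:
    """Approximate 95% CI for an AUC (Hanley and McNeil, 1982), truncated to [0, 1]."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2) + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = sqrt(max(variance, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


def auc_by_stratum(
    scores: Sequence[float], labels: Sequence[int], strata: Sequence[Hashable]
) -> Dict[Hashable, Tuple[float, float, float]]:
    """Return {stratum: (auc, ci_low, ci_high)} for each subgroup (e.g. with/without DR)."""
    results = {}
    for group in sorted(set(strata)):
        idx = [i for i, g in enumerate(strata) if g == group]
        auc, n_pos, n_neg = auc_mann_whitney([scores[i] for i in idx], [labels[i] for i in idx])
        results[group] = (auc, *hanley_mcneil_ci(auc, n_pos, n_neg))
    return results
```

The pairwise loop is quadratic in subgroup size, which is negligible for a cohort of a few hundred patients; for larger data a sort-based rank computation would be preferable.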

Fig. 4

The AUC of the diagnostic model stratified by subgroup; a patients with DR; b patients without DR; c patients with diabetic peripheral vascular disease; d patients without diabetic peripheral vascular disease

Discussion

Main findings of the present study

DN is one of the most common microvascular complications of T2DM and a leading cause of CKD and ESRD worldwide [19]. Identifying patients who are susceptible to DN is critical for better control of disease progression. In this study, we established a model for diagnosing DN in patients with T2DM based on clinical baseline information and biomarkers. The diagnostic model showed excellent performance, with an AUC of 0.965. The study identified age, DBP, fasting blood glucose, insulin treatment, MCHC, PDW, UA, SCR, FIB, INR, and LDL-C as diagnostic factors for DN in T2DM.

Comparison with other studies

At present, renal biopsy is the gold standard for the diagnosis of DN [20]. However, biopsy is invasive, carries a risk of complications, and is not feasible in patients with relative or absolute contraindications, such as a bleeding tendency, severe hypertension, or anemia [21]. A growing number of studies have developed diagnostic models for DN in T2DM. Liu et al. [22] formulated a diagnostic model based on six clinical indicators (history of diabetes mellitus, SBP, reduced glycated HB level, hematuria, DR, and HB) that could provide a quantified probability of DN. Miao et al. [23] developed and validated a model for predicting DN in Chinese people that included age at diagnosis, location, body mass index, HDL-C, SCR, hypertension, dyslipidemia, retinopathy, diet control, and physical activity. Li et al. [24] designed and validated a scoring model for the differential diagnosis of DN and non-diabetic renal diseases in T2DM patients, with an AUC of 0.863; however, that study lacked adequate data, so some important parameters could not be included in the analysis. In this study, we established a biomarker-based diagnostic model for identifying DN in T2DM that showed better diagnostic performance than previous diagnostic models. The model offers a simple, non-invasive way to diagnose DN in T2DM.

Our findings suggest that age, DBP, fasting blood glucose, insulin treatment, MCHC, PDW, UA, SCR, FIB, INR, and LDL-C are associated with DN in patients with T2DM. DBP and age are known risk factors for the progression of DN [25]. In a study assessing factors related to DN, DBP and UA were found to be independently associated with DN [26]. Cai et al. [27] reported significant differences in age at onset, DBP, LDL-C, and hypoglycemic treatment between T2DM patients with and without DN. Regarding insulin treatment, the Diabetes Control and Complications Trial demonstrated that intensive insulin therapy reduces the risk of nephropathy in type 1 diabetes (T1DM) [28, 29]. Although insulin therapy can control blood glucose, it has been established that even long-term intensive insulin therapy cannot prevent the development of secondary complications of diabetes, including DN [30]. The relationship between insulin therapy and DN in T2DM therefore needs further verification. Regarding MCHC, Al-Rubeaan et al. [16] found significantly higher MCHC in subjects with microalbuminuria than in those without DN. Regarding PDW, a meta-analysis showed that PDW was higher in T2DM patients with DN than in those without DN and could be recommended as an inexpensive diagnostic biomarker for DN [17]. In Egyptian people with T2DM, PDW correlated with microvascular complications, including DN [31]. An association between DN and FIB has also been reported. Tessari et al. [32] found that DN was associated with increased FIB production in patients with T2DM, and a study in north-east Poland found increased FIB concentrations in patients with DN manifesting as microalbuminuria or proteinuria [33]. Further studies are needed to clarify the relationships between these factors and DN in T2DM.

Implication and explanation of findings

The diagnostic model may facilitate early diagnosis and treatment of DN in patients with T2DM and help delay its progression. The model relies on routine test items that are available in both primary and secondary care settings. It can therefore serve as a decision support tool in the management of patients with T2DM and renal impairment who cannot undergo renal biopsy for various reasons.

Strengths and limitations

This study has several strengths. First, a non-invasive diagnostic model for DN in T2DM was constructed from clinically available indicators, which can assist in diagnosing high-risk patients with contraindications to renal biopsy. Second, its diagnostic performance is better than that of other reported non-invasive diagnostic models. Third, because diabetic patients often have multiple comorbidities, we performed analyses stratified by other diabetic complications, which further support the robustness of the model. However, the present study also has several limitations. First, the sample was drawn from a single hospital in China. Our findings may be generalizable to Chinese patients, particularly in hospital settings, but may not be representative of populations in other settings or cultural contexts. Second, the relatively small sample size may have resulted in inadequate statistical power. Third, because of the observational, cross-sectional design, residual confounding cannot be excluded. Fourth, because the included patients underwent renal biopsy on strong suspicion of DN, selection bias is possible. Finally, although the diagnostic performance of the model was assessed by cross-validation, the model still needs to be validated in a multicenter study.

Conclusion

Our study established a diagnostic model based on age, DBP, fasting blood glucose, insulin treatment, MCHC, PDW, UA, SCR, FIB, INR, and LDL-C that provides a simple, non-invasive method for diagnosing DN in clinical practice.