Introduction

Diabetes is one of the greatest global health challenges of the twenty-first century [1]. In China, the prevalence of type 2 diabetes mellitus (T2DM) has been increasing with population aging, rapid urbanization, and lifestyle changes (overweight or obesity, physical inactivity, and high-calorie food consumption) [2, 3]. In 2010, a national cross-sectional survey found that 11.6% of Chinese adults had diabetes; 8.1% of adults had diabetes that was previously undiagnosed [4].

Several large intervention studies have indicated that lifestyle modification or pharmacological intervention targeting people at high risk for diabetes can prevent or delay the onset of type 2 diabetes [5, 6]. It is therefore important to identify people with risk factors for type 2 diabetes using feasible and cost-effective approaches, and diabetes risk scores have frequently been used for this purpose [7].

In recent years, a number of risk score (RS) models have been developed and tested, such as the Finnish Risk Score [8], the Cambridge Risk Score [9], the Framingham diabetes mellitus risk score [10], and the Australian Type 2 Diabetes Risk Assessment Tool [11], among others. However, there is a lack of RS models constructed from incidence data for Chinese populations, especially for those living in rural areas. Most existing diabetes RS models work well only in their target populations, and because they were mainly derived from populations of European descent, applying them directly could underestimate the risk in the Chinese population [12, 13]. Based on data from the Rural Deqing Cohort Study [14], we aimed to develop a T2DM RS model for the rural Chinese population to identify rural residents who are at high risk of developing T2DM.

Materials and methods

Study design and participants

During 2006–2014, 29,229 eligible residents aged ≥18 years, selected by random cluster sampling from eight rural communities comprising sixty-five villages in Deqing, Zhejiang province, China, participated in the baseline survey of the Rural Deqing Cohort Study, with a response rate of 83.5%. We included local residents (1) who were currently living in the selected rural communities; (2) who were aged 18 years or more; (3) who agreed to participate in the study and signed the informed consent form; (4) who were able to complete the survey questionnaire and physical examination; and (5) who had no plan to work outside or move out of Deqing County. Information on demography, lifestyle (smoking, alcohol use, regular physical exercise, diet, and tea consumption), and physical conditions was collected using a questionnaire. Height, weight, waist circumference, and blood pressure were measured, and a 5 mL venous blood sample was collected after an overnight fast of at least 8 h to test fasting plasma glucose (FPG) and hemoglobin. The Institutional Review Board of the Fudan University School of Public Health approved the study, and all participants gave written informed consent.

In total, 28,251 participants who were free of diabetes at baseline were followed up, and new cases of type 2 diabetes were ascertained from the Deqing electronic health records in November 2015. No electronic health records were available for 1899 participants (6.7%). We also conducted a sub-cohort study of 3043 individuals who participated in the baseline survey in 2006–2008 and followed them up through a questionnaire interview and a free physical examination in 2015; of these, 1205 (39.6%) were lost to follow-up or refused to participate.

Definition

Body mass index (BMI) was calculated as weight divided by height squared (kg/m²). Overweight was defined as a BMI of 25.0 to 29.9 kg/m² and obesity as a BMI ≥30 kg/m². Family history (FH) was considered positive if one or more first-degree relatives had diabetes. Hypertension was defined as systolic blood pressure ≥140 mmHg, diastolic blood pressure ≥90 mmHg, or a history of antihypertensive medication use. T2DM was defined as FPG ≥7.0 mmol/L, a self-reported diagnosis of diabetes, or use of anti-diabetic medications. Impaired fasting glucose (IFG) was defined as FPG of 6.1 to 6.9 mmol/L.
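The definitions above can be expressed as a short Python sketch. The function names and category labels are hypothetical and chosen only for illustration. Because the thresholds are stated to one decimal place, the sketch assumes that overweight means 25 ≤ BMI < 30 and that IFG means 6.1 ≤ FPG < 7.0 mmol/L.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(bmi_value: float) -> str:
    """Classify BMI using the cut-offs stated in the text (labels are illustrative)."""
    if bmi_value >= 30.0:
        return "obesity"
    if bmi_value >= 25.0:  # assumption: 25.0-29.9 is read as 25 <= BMI < 30
        return "overweight"
    return "neither"


def has_hypertension(sbp_mmhg: float, dbp_mmhg: float,
                     antihypertensive_use: bool = False) -> bool:
    """Hypertension: SBP >= 140, DBP >= 90, or a history of medication use."""
    return sbp_mmhg >= 140 or dbp_mmhg >= 90 or antihypertensive_use


def glucose_status(fpg_mmol_l: float, self_reported_diabetes: bool = False,
                   antidiabetic_use: bool = False) -> str:
    """Classify glucose status as T2DM, IFG, or normal (labels are illustrative)."""
    if fpg_mmol_l >= 7.0 or self_reported_diabetes or antidiabetic_use:
        return "T2DM"
    if fpg_mmol_l >= 6.1:  # assumption: 6.1-6.9 is read as 6.1 <= FPG < 7.0
        return "IFG"
    return "normal"
```

For example, a person weighing 70 kg with a height of 1.75 m has a BMI of about 22.9 kg/m² and falls below the overweight threshold.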

Development and comparison of risk score models

A Cox proportional hazards model was used to identify predictors. Of the 13 variables initially examined, only age, BMI, FH, diet preference, hypertension, and FPG were significantly associated with the incidence of T2DM. Each was assigned a score based on its β-coefficient from the Cox model (1 if β = 0.01–0.20; 2 if β = 0.21–0.80; 3 if β = 0.81–1.20; 4 if β = 1.21–2.20; and 5 if β > 2.20 [8]). The lowest category of each variable was given a score of zero. We constructed two practical risk score models, a non-invasive model and a plus-FPG model, based on the data from the total cohort and the sub-cohort. The non-invasive model included age, FH, diet preference, overweight/obesity, and hypertension; the plus-FPG model additionally included FPG. We also compared the performance of existing models derived from other populations, including the Finnish Risk Score, the Cambridge Risk Score, the Framingham Diabetes Mellitus Risk Score, and the Australian Type 2 Diabetes Risk Assessment Tool.
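The scoring rule can be sketched in Python as follows. The function names are hypothetical. Two details are assumptions that the text does not state: a coefficient falling between two printed bands (for example, 0.205) is assigned to the lower band, and a coefficient below 0.01 receives zero points, in line with the zero score given to reference categories.

```python
import math

# Upper bound of each β band and the points it earns, as listed in the text.
BANDS = [(0.20, 1), (0.80, 2), (1.20, 3), (2.20, 4)]


def points_from_beta(beta: float) -> int:
    """Map a Cox β-coefficient to integer risk points."""
    if beta < 0.01:
        return 0  # assumption: no band is defined below 0.01
    for upper_bound, points in BANDS:
        if beta <= upper_bound:  # assumption: gaps between bands round down
            return points
    return 5  # β > 2.20


def points_from_hazard_ratio(hazard_ratio: float) -> int:
    """Convenience wrapper using β = ln(hazard ratio)."""
    return points_from_beta(math.log(hazard_ratio))
```

For example, a hazard ratio of 2.00 corresponds to β = ln 2 ≈ 0.69, which falls in the 0.21–0.80 band and therefore earns 2 points.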

Data analysis

Data were double entered with EpiData 3.1, and statistical analysis was performed in SAS 9.2 (SAS Institute Inc., Cary, NC) and R 3.3.2. The chi-square or Fisher's exact test was used for categorical variables, and the t test or ANOVA for continuous variables. Risk factors for T2DM were identified, and their crude hazard ratios (cHRs), adjusted hazard ratios (aHRs), and 95% confidence intervals (95% CIs) were estimated using the Cox proportional hazards model. Receiver-operating characteristic (ROC) curves were plotted for the RS, and the area under the curve (AUC) was estimated for the prediction models. In addition, bootstrapping with 1000 replications was used to calculate C-statistics, bias-corrected C-statistics, sensitivity, specificity, and their 95% CIs. The Hosmer–Lemeshow test was used to assess the goodness-of-fit of the models. All statistical tests were two-sided, and p values <0.05 were considered statistically significant.
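As an illustration of the bootstrap step, the following Python sketch estimates the AUC of a risk score for a binary outcome and a percentile bootstrap 95% CI from 1000 resamples. It is not the authors' SAS or R procedure: the function names are hypothetical, the percentile interval is a stand-in that applies no bias correction, and the time-dependent C-statistic is not computed here.

```python
import numpy as np


def auc(scores, labels) -> float:
    """AUC: probability that a random case outscores a random non-case (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    cases, non_cases = scores[labels], scores[~labels]
    if cases.size == 0 or non_cases.size == 0:
        raise ValueError("AUC needs at least one case and one non-case")
    diff = cases[:, None] - non_cases[None, :]  # all case/non-case pairs
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Return (AUC, lower, upper) using a percentile bootstrap interval."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n = labels.size
    replicates = []
    while len(replicates) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample participants with replacement
        if labels[idx].all() or not labels[idx].any():
            continue  # skip resamples lacking cases or non-cases
        replicates.append(auc(scores[idx], labels[idx]))
    lower, upper = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return auc(scores, labels), float(lower), float(upper)
```

The pairwise comparison keeps the sketch short and dependency-light; for very large cohorts a rank-based AUC would use less memory.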

Results

By 2015, a total of 387 new T2DM cases had been identified through the electronic health record system in the whole cohort over an average follow-up of 4.2 years. The incidence of T2DM was estimated to be 3.3 per 1000 person-years. In the sub-cohort study, with an average follow-up of 8.7 years, 191 new cases were diagnosed and the incidence was 7.7 per 1000 person-years.

The whole cohort included 12,510 (44.3%) men and 15,741 (55.7%) women. Men were older and were more likely to be smokers, alcohol users, or regular exercisers, while women had a higher level of education, ate more vegetables and fruit, and were more likely to have hypertension and IFG (Table 1).

Table 1 Baseline characteristics of the whole cohort

In the Cox model, an increased incidence of T2DM was significantly associated with the following baseline factors: age of 40 years or above (aHR 5.84, 95% CI 2.84–12.01, 4 points), overweight (aHR 2.15, 95% CI 1.70–2.72, 2 points) or obesity (aHR 4.92, 95% CI 3.11–7.79, 4 points), FH of T2DM (aHR 2.78, 95% CI 1.51–5.14, 3 points), diet preference (aHR 2.41, 95% CI 1.87–3.11, 3 points), hypertension (aHR 2.00, 95% CI 1.59–2.51, 2 points), and IFG (aHR 4.93, 95% CI 3.86–6.28, 4 points) (Table 2).

Table 2 Risk scores based on the β-coefficients of the Cox regression analysis in the whole cohort

In the non-invasive model, the RSs ranged from 0 to 20 and the optimal cut-off point for incident T2DM was 6. Of the 28,251 participants, 11,297 (40.2%) had a score of ≥6, and 273 of them developed T2DM during follow-up, accounting for 70.5% of all T2DM cases. The AUC of the model was 0.705, the sensitivity and specificity were 70.4% and 60.4%, the positive predictive value (PPV) and the negative predictive value (NPV) were 2.5% and 99.3%, and the positive likelihood ratio (+LR) and the negative likelihood ratio (−LR) were 1.78 and 0.49, respectively. In the plus-FPG model, the RSs ranged from 0 to 22 and the optimal cut-off point was 7. In total, 6954 (24.6%) subjects had a score of ≥7, and 244 of them developed T2DM during follow-up, accounting for 63.0% of all T2DM cases. We also calculated the C-statistic and found that the discrimination capacity of the model improved after the time effect was taken into account. The AUC, C-statistic, sensitivity, and specificity are shown in Table 3 and Fig. 1.
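The performance measures for the non-invasive model can be recomputed from the counts reported above, as in the following Python sketch. The function and variable names are hypothetical, and the 2×2 table is reconstructed on the assumption that all 28,251 participants and all 387 incident cases enter the calculation. Under that assumption the specificity, NPV, and likelihood ratios match the reported values, while the sensitivity and PPV differ in the last digit, possibly because of rounding or because the reported values are bootstrap estimates.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard screening measures from a 2x2 table of score status by outcome."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Counts reported for the non-invasive model at a cut-off of 6.
total, cases = 28251, 387
flagged, flagged_cases = 11297, 273  # score >= 6, and incident cases among them

tp = flagged_cases
fp = flagged - flagged_cases
fn = cases - flagged_cases
tn = total - cases - fp  # assumption: every participant is classified

metrics = screening_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The very low PPV alongside a high NPV reflects the low incidence in the cohort: most people above the cut-off do not develop T2DM during follow-up, but very few below it do.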

Table 3 Performance of existing risk score models in the whole cohort and sub-cohort

Fig. 1 ROC of the non-invasive model in whole cohort (a); ROC of the plus-fasting plasma glucose model in whole cohort (b); ROC of the non-invasive model in sub-cohort (c); ROC of the plus-fasting plasma glucose model in sub-cohort (d)

In the sub-cohort population, the AUC and C-statistic were 0.638 and 0.662 for the non-invasive model, and at the cut-off point of 6, the sensitivity and specificity were 53.4% and 67.3%. For the plus-FPG model, the AUC, C-statistic, sensitivity, and specificity were 0.667, 0.707, 59.2%, and 65.7%, respectively.

We compared our RS models with 12 existing classic RS models (Table 3). In general, our RS models performed satisfactorily relative to the existing RS models in terms of AUC, C-statistic, sensitivity, and specificity.

Discussion

The diabetes risk score method is a cost-effective tool for identifying individuals who are at high risk for T2DM in areas with limited resources, such as rural China [1, 15]. Several diabetes risk scores have been reported for Chinese populations, but few were created for rural populations or based on prospective cohort data [16,17,18,19,20,21,22]. We developed the current diabetes risk score using data from a longitudinal cohort study of a rural population [14]; to our knowledge, it is the first for rural Chinese adults based on large prospective cohort data. In the current study, we developed two practical risk score models for rural Chinese adults, namely the non-invasive and plus-FPG models. The discrimination capability was better for the plus-FPG model than for the non-invasive model. However, because it is convenient, non-invasive, and inexpensive, the non-invasive model is more practical, especially in primary public health settings [23, 24]. Both risk score models showed some discrepancies when different sources of data were used. The optimal risk score was 1 point lower for the sub-cohort population than for the whole cohort population, and similar differences were reported in previous studies [25].

Several risk scores for T2DM incidence have been developed in both Caucasian and Asian populations [8,9,10,11, 23, 24, 26,27,28,29,30,31]. The most common predictors in these models were age, family history of diabetes, obesity, hypertension, and impaired fasting glucose, and our study showed similar results. Although we found that sex, physical activity, smoking, and alcohol use were also significantly associated with incident T2DM, including them did not significantly improve the AUC, and our final models therefore did not include these factors.

Although the various risk score models included similar risk factors, the discriminative capability was better for models developed in Caucasian populations [8,9,10,11] (AUC 0.78–0.86) than for those developed in Asian populations [23, 24, 29], including ours (AUC 0.67–0.77). Compared with Caucasians, Asians are more susceptible to the development of T2DM [32] and have lower levels of insulin secretion [33].

We compared the performance of several existing risk scores in our study population and found that the AUCs ranged from 0.56 to 0.73 and the C-statistics from 0.64 to 0.75, which were substantially lower than those in the original study populations. Some existing risk score models showed a higher C-statistic than ours, but not higher sensitivity and/or specificity. We used the bootstrap method to validate the performance measures. The bias-corrected C-statistics were similar, and the 95% CIs for the sensitivity and specificity were narrower. In our study population, our risk score models clearly performed better than other existing models developed from data on non-Chinese populations, suggesting that diabetes risk score models should be designed and developed for specific target populations.

In our non-invasive risk score model, we used predictors that are easily measured in general health care settings and assigned integer point values to the risk factors, so that even a non-professional can estimate the risk of developing T2DM at home. One limitation is that incident cases of T2DM were identified through the local electronic health records system, so some underdiagnosis bias is possible, as indicated by the observations from the sub-cohort, for which additional questionnaire and physical examination information was available. The local electronic health system currently contains information only from annual community physical examinations and has no electronic medical records from hospitals. People who were diagnosed with T2DM in a hospital but did not participate in the annual community physical examination would therefore not be captured by the system. A low attendance rate at physical examinations also affects the completeness of the health information system. Caution is also warranted regarding the external and internal validity of the models. Although the age and sex distributions of our study population were similar to those of the general rural Chinese population, Deqing County is one of the top 100 well-developed counties in China. The models therefore need to be further tested in different rural settings of the country. In addition, nearly 40% of subjects in the sub-cohort were lost to follow-up, which could bias the internal validation of our models to some degree.

Conclusions

In conclusion, our risk score model could be used as a simple, fast, and cost-effective tool to identify rural Chinese adults who are at high risk of type 2 diabetes. Individuals with high scores should be encouraged to take a screening test for type 2 diabetes and to adopt a healthier lifestyle. It is important to develop a diabetes prevention and control program targeting rural residents to reduce their risk of T2DM.