Background

Colorectal cancer (CRC) remains a major public health challenge in the USA, with over 150,000 estimated new cases and 50,000 related deaths projected in 2023 alone [1], making it the second and third most common cancer in women and men, respectively [2]. The substantial associated health and economic burden underscores the urgent need for effective prevention strategies to curb CRC incidence and mortality.

Among modifiable lifestyle factors, diet has been recognized as a major contributor to CRC risk, accounting for more than 40% of CRC incidence and mortality [3]. Notably, as major dietary energy sources that directly affect blood glucose and insulin levels, carbohydrates have been identified as potential factors linked to CRC risk [4,5,6]. However, existing studies, which have primarily examined overall carbohydrate quantity, have produced conflicting findings [7,8,9]. For instance, an Iranian study reported a positive association between carbohydrate amount and CRC incidence [7], while cohort analyses using low-carbohydrate diet scores (LCDs) to assess overall carbohydrate intake have yielded divergent conclusions: one study found that animal-rich LCDs increased colon cancer risk [8], whereas another found that plant-rich LCDs may improve outcomes in CRC patients [9]. Given these discrepancies, attention has shifted towards the quality, rather than just the quantity, of carbohydrates in relation to cancer risk. Recently, the carbohydrate quality index (CQI) has been developed as a comprehensive measure of carbohydrate quality, incorporating multiple factors such as dietary fiber content and glycemic index [10]. A previous small case–control study in Iranians indicated that higher CQI and LCDs were associated with a reduced CRC risk [11]. However, it did not assess potential interrelationships between these scores or conduct location-specific CRC risk analyses. Moreover, crucial aspects, such as the associations of CQI and LCDs with CRC mortality, have not been adequately examined in previous studies. Overall, considering the diverse demographics and dietary habits of different populations, it is crucial to examine the associations of CQI and LCDs with the incidence and mortality of CRC in the US population.

In this study, we conducted a large-scale, prospective investigation to fill these knowledge gaps by examining carbohydrate quality and quantity, assessed using CQI and LCDs, respectively, in relation to CRC outcomes among Americans aged 55–74 years. Further analyses by anatomical subsite of CRC were performed to determine whether the observed associations varied by tumor location. Our study may help guide the development of effective preventive strategies to address the considerable health and economic burden posed by CRC in the USA.

Methods

Study design

This is a prospective study of participants in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, a large randomized clinical trial funded by the National Cancer Institute (NCI), with enrollment between 1993 and 2001 [12]. Approximately 150,000 men and women aged 55–74 years were enrolled from 10 screening centers across the United States. Participants were randomly assigned to receive either routine medical care (control arm) or additional screening tests for prostate, lung, colorectal, and ovarian cancers (intervention arm) [13]. The trial protocol was approved by the institutional review boards of the NCI and of every participating screening center, and written informed consent was obtained from all participants. Details of the design of the PLCO trial, including power calculations and recruitment methods, have been extensively documented in prior publications [14, 15].

Data collection and covariates assessment

The demographic and lifestyle data of individuals were collected at baseline via self-administered questionnaires (Baseline Questionnaire, BQ) as part of the PLCO trial. The main data used in our study included age, sex, race, occupation, education level, smoking habits, pack-years of cigarettes, body mass index (BMI) at baseline, history of aspirin use, diabetes, colorectal diverticulitis or diverticulosis, colorectal polyp, colon comorbidities (that is, Gardner’s syndrome, ulcerative colitis, Crohn’s disease or familial polyposis), and family history of CRC. BMI was calculated as weight (kg) divided by height squared (m²). Dietary intake data were collected using a validated 137-item food frequency questionnaire (FFQ), the Diet History Questionnaire (DHQ), administered 3 years after enrollment in the PLCO trial. The DHQ assessed the portion size, frequency, and types of foods and supplements consumed by participants over the past year. The validity of the DHQ was demonstrated through comparison with 24-h dietary recalls in the Eating at America’s Table Study [16]. In that study, the DHQ performed better than other commonly used FFQs, such as the Block and Willett questionnaires, in assessing absolute nutrient intake [16]. In the present study, physical activity was defined as weekly time spent in moderate to vigorous activity, collected via a supplemental questionnaire (SQX).

Population for analysis

At baseline, we applied several exclusion criteria to determine the final analytic sample. We excluded participants who: (1) did not return the BQ (n = 4918); (2) had an invalid DHQ, defined as lacking a completion date, confirmed death before completion, ≥ 8 missing responses, or extreme calorie intake values (top or bottom 1%) (n = 38,462); (3) had a personal history of any cancer before DHQ completion (n = 9684); (4) were diagnosed with CRC between randomization and DHQ completion (n = 114); or (5) were diagnosed with colorectal carcinoid (n = 15). Ultimately, our analytic sample consisted of 101,694 individuals (49,452 males and 52,242 females) (Fig. 1).
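The exclusion flow can be checked with simple arithmetic. The sketch below only re-adds the counts reported above; the implied number of enrolled participants is derived from those counts and is not a figure quoted in the text, and the dictionary keys are illustrative labels.

```python
# Exclusion counts as reported for criteria (1) to (5); key names are illustrative.
EXCLUSIONS = {
    "did_not_return_bq": 4918,
    "invalid_dhq": 38462,
    "prior_cancer_before_dhq": 9684,
    "crc_between_randomization_and_dhq": 114,
    "colorectal_carcinoid": 15,
}
FINAL_SAMPLE = 101_694
MALES, FEMALES = 49_452, 52_242

total_excluded = sum(EXCLUSIONS.values())

# Derived quantity (an assumption of this sketch, not stated in the text):
# the number enrolled implied by the final sample plus all exclusions.
implied_enrolled = FINAL_SAMPLE + total_excluded

print(f"excluded: {total_excluded}, implied enrolled: {implied_enrolled}")
```

The sex-specific counts sum to the reported analytic sample, and the implied enrollment is consistent with the "approximately 150,000" participants described in the study design.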

Fig. 1 The flow chart of identifying eligible subjects. PLCO, Prostate, Lung, Colorectal, and Ovarian; BQ, Baseline Questionnaire; DHQ, Diet History Questionnaire

Calculation of CQI and LCDs

In this study, the CQI and LCDs were calculated following the methods of Toledo et al. and Song et al., respectively, which were employed in previous analyses [9, 10].

Specifically, CQI was calculated by summing the quintile scores of four equally weighted criteria: dietary fiber intake (g/day, positively scored), glycemic index (GI, reverse scored), whole grain to total grain ratio (positively scored), and solid carbohydrate to total carbohydrate (solid + liquid carbohydrate) ratio (positively scored). For each criterion, participants were divided into quintiles and assigned scores from 1 to 5 according to quintile (5 to 1 for GI, which was reverse scored). The total CQI score ranged from 4 to 20, with higher scores indicating better overall carbohydrate quality. Notably, the amount of liquid carbohydrates was calculated from the estimated consumption of sugar-sweetened beverages and alcohol. Total grains were calculated by summing the intakes of whole grains, refined grains, and their products. In addition, the calculation methods related to GI have been described in detail in previous studies based on the PLCO trial [17, 18].
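The scoring described above can be illustrated with a minimal sketch. It is not the code used in this study: the function names, the rank-based quintile assignment, and the handling of ties and zero denominators are assumptions made for illustration only.

```python
from typing import List, Sequence


def quintile_scores(values: Sequence[float], reverse: bool = False) -> List[int]:
    """Score each value 1-5 by rank-based quintile (5-1 when reverse=True).

    Assumption: ties are broken by input order, which may differ from
    the cut-point rules of a statistical package.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, i in enumerate(order):
        q = rank * 5 // n + 1  # 1 = lowest fifth, 5 = highest fifth
        scores[i] = 6 - q if reverse else q
    return scores


def cqi_scores(fiber, gi, whole_grain, total_grain, solid_carb, liquid_carb) -> List[int]:
    """Sum four equally weighted quintile scores; result lies in 4..20."""
    # Assumption: a zero denominator yields a ratio of 0.0.
    whole_ratio = [w / t if t > 0 else 0.0 for w, t in zip(whole_grain, total_grain)]
    solid_ratio = [
        s / (s + l) if (s + l) > 0 else 0.0 for s, l in zip(solid_carb, liquid_carb)
    ]
    components = [
        quintile_scores(fiber),             # positively scored
        quintile_scores(gi, reverse=True),  # reverse scored
        quintile_scores(whole_ratio),       # positively scored
        quintile_scores(solid_ratio),       # positively scored
    ]
    return [sum(parts) for parts in zip(*components)]


# Toy data for five hypothetical participants (not study data).
example = cqi_scores(
    fiber=[10, 15, 20, 25, 30],
    gi=[60, 58, 56, 54, 52],
    whole_grain=[1, 2, 3, 4, 5],
    total_grain=[10, 10, 10, 10, 10],
    solid_carb=[100, 120, 140, 160, 180],
    liquid_carb=[100, 90, 80, 70, 60],
)
print(example)
```

In this toy example the first participant falls in the least favorable quintile of every criterion and the last in the most favorable, so their scores sit at the two ends of the 4 to 20 range.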

To calculate the LCDs, the intakes of carbohydrate, fat, and protein were first expressed as percentages of total energy consumption. These energy percentage values were then assigned into ranks from 0 to 10, with 11 equal-sized groups created ranging from the lowest percentage (rank 0) to the highest percentage (rank 10). However, the rank assignment direction differed by nutrient: for carbohydrates, lower percentages were assigned higher ranks (10 to 0), whereas for protein and fat, lower percentages were assigned lower ranks (0 to 10). By summing the nutrient ranks, with carbohydrate ranks reversely scored, the LCDs were generated on a scale of 0 to 30. Higher LCDs thus indicated lower carbohydrate but higher fat and protein intakes, representing more extreme low-carbohydrate dietary patterns.
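As an illustration of this ranking scheme, the following sketch assigns the 0 to 10 ranks and sums them. It assumes rank-based assignment into 11 groups with ties broken by input order, and it takes the energy percentages as already computed; the function names are hypothetical and the toy values are not study data.

```python
from typing import List, Sequence


def eleven_group_rank(values: Sequence[float], reverse: bool = False) -> List[int]:
    """Assign ranks 0-10 across 11 equal-sized groups (10-0 when reverse=True)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for position, i in enumerate(order):
        group = position * 11 // n  # 0 = lowest percentage, 10 = highest
        ranks[i] = 10 - group if reverse else group
    return ranks


def lcd_scores(carb_pct, fat_pct, protein_pct) -> List[int]:
    """Sum carbohydrate (reverse), fat, and protein ranks; result lies in 0..30."""
    carb = eleven_group_rank(carb_pct, reverse=True)  # lower % -> higher rank
    fat = eleven_group_rank(fat_pct)                  # lower % -> lower rank
    protein = eleven_group_rank(protein_pct)          # lower % -> lower rank
    return [c + f + p for c, f, p in zip(carb, fat, protein)]


# Toy data: 11 hypothetical participants with decreasing carbohydrate
# and increasing fat and protein shares of energy.
carb_pct = [60 - 2 * i for i in range(11)]
fat_pct = [25 + i for i in range(11)]
protein_pct = [15 + i for i in range(11)]
lcd_example = lcd_scores(carb_pct, fat_pct, protein_pct)
print(lcd_example)
```

The participant with the highest carbohydrate share and lowest fat and protein shares scores 0, and the participant at the opposite extreme scores 30.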

Full details on the CQI and LCDs calculations and compositions are provided in Additional file 1: Table S1.

Ascertainment of outcome events

In the PLCO trial, CRC cases were primarily ascertained through annual study update forms mailed to all surviving participants, which requested information on any new cancer diagnoses. Reported CRC cases were verified through medical record review using a standardized form, with study physicians confirming diagnoses in a blinded fashion. Vital status of participants was also tracked via the annual forms, with repeated attempts to contact non-responders. Additional mortality ascertainment involved routine checks of the National Death Index and death certificates using ICD-9 codes for causes of death.

CRC cases in the current study were categorized by anatomic subsite using International Classification of Diseases for Oncology, second edition (ICD-O-2) codes: proximal colon (C180-C185), distal colon (C186-C187), and rectum (C199-C209). In subsite analyses, CRC cases coded as C188, C189, C212, and C218 were censored. The primary outcome of the study was CRC incidence, and the secondary outcome was CRC-related mortality.
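The subsite grouping above amounts to a lookup on the topography code. A minimal sketch follows, assuming codes arrive as strings such as "C182" or "C18.2"; the function name and input handling are hypothetical, and only the code ranges are taken from the text.

```python
from typing import Optional


def crc_subsite(topography_code: str) -> Optional[str]:
    """Map a topography code to the subsite groups used in the text.

    Returns None for codes outside the three groups (for example C188,
    C189, C212, C218), which were censored in subsite analyses.
    """
    code = topography_code.strip().upper().replace(".", "")
    if not (len(code) == 4 and code[0] == "C" and code[1:].isdigit()):
        raise ValueError(f"unexpected code format: {topography_code!r}")
    number = int(code[1:])
    if 180 <= number <= 185:
        return "proximal colon"
    if 186 <= number <= 187:
        return "distal colon"
    if 199 <= number <= 209:
        return "rectum"
    return None


for example_code in ("C182", "C18.7", "C209", "C188"):
    print(example_code, "->", crc_subsite(example_code))
```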

Statistical analysis

In this analysis, some covariate data were missing to varying degrees. For categorical variables with < 5% missing data, including education level, smoking status, history of aspirin use, diabetes, colorectal diverticulitis or diverticulosis, colorectal polyp, colon comorbidities, and family history of CRC, missing values were imputed with the mode. For continuous variables with < 5% missing data, including BMI and pack-years of cigarettes, median imputation was used. Multiple imputation was applied to the physical activity level variable, which had about 25% missing data [19]. Detailed information on the variables imputed and the proportions missing is provided in Additional file 1: Table S2.
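The single-value imputation steps can be sketched as follows. This is an illustration only: missing values are represented here as None, the helper names are hypothetical, and the multiple imputation used for physical activity is not shown.

```python
from statistics import median, mode
from typing import List, Optional, Sequence


def impute_mode(values: Sequence[Optional[str]]) -> List[str]:
    """Replace missing categorical values (None) with the most common observed value."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]


def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing continuous values (None) with the median of observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Toy values, not study data.
smoking = impute_mode(["never", "former", None, "never"])
bmi = impute_median([22.0, None, 30.0, 26.0])
print(smoking, bmi)
```

Single-value imputation of this kind is simple but does not reflect uncertainty in the imputed values, which is one reason a different approach is typically preferred for a variable with about 25% missing data.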

In the present study, time-to-CRC-event (diagnosis or related death) was defined as days from DHQ completion until CRC diagnosis or confirmation of CRC-related death. Hence, for the primary outcome, follow-up was measured from DHQ completion to CRC diagnosis, death, loss to follow-up, or December 31, 2009 (the end of cancer incidence follow-up), whichever came first. For the secondary outcome, mortality follow-up ended in 2018, as detailed on the PLCO website (https://cdas.cancer.gov/learn/plco/early-qx/) (Fig. 2). Cox proportional hazards regression models were constructed to estimate the hazard ratios (HRs) and 95% confidence intervals (CIs) for the associations of the CQI and LCDs with outcome events, with the follow-up period as the time metric. The two scores were analyzed as continuous variables (HRs calculated per 1-standard deviation increment) and as categorical variables (in quartiles, with the first quartile as the referent group) in the Cox models. To assess linear trends, the median CQI or LCDs value within each quartile was assigned to participants in that quartile and modeled as a continuous variable; the reported P for trend reflects the significance of this term. Potential confounding variables were selected as established CRC risk factors or based on the clinical expertise of the investigators [20]. To mitigate the potential impact of confounding, these variables were incorporated in the Cox regression models of this study. Model 1 was adjusted for demographic characteristics, which included sex, age, race, and education level. Model 2 was further adjusted for lifestyle and clinical factors (BMI, physical activity level, smoking status, pack-years of cigarettes, alcohol consumption, history of colorectal diverticulitis or diverticulosis, colon comorbidities, colorectal polyp, aspirin use, diabetes, and family history of CRC) and total energy intake from diet.
Considering the potential interaction between CQI and LCDs, Model 3 for each score was adjusted for the covariates included in Model 2 plus the other score (CQI or LCDs). Restricted cubic spline models with knots at the 10th, 50th, and 90th percentiles were used to depict the trends of CRC incidence and mortality across the full range of the two scores [21], with the median of each score set as the reference. Nonlinearity was tested by examining the null hypothesis that the regression coefficient for the second spline term equaled zero. In addition, the same analyses were conducted for anatomical subsites of CRC. The proportional hazards assumption was tested using Schoenfeld residuals, and no violation was found [22].
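To make the spline terms concrete, the sketch below builds a restricted cubic spline basis using the common truncated power parameterization. It is an assumption that this parameterization and this percentile definition match the software used in the analysis; the function names are hypothetical. With three knots the basis has one linear term and one nonlinear term, and the nonlinear term is the "second spline term" whose coefficient is tested against zero.

```python
from statistics import quantiles
from typing import List, Sequence


def percentile_knots(values: Sequence[float]) -> List[float]:
    """Knots at the 10th, 50th, and 90th percentiles.

    Assumption: uses the default ("exclusive") method of
    statistics.quantiles, which may differ slightly from other software.
    """
    deciles = quantiles(values, n=10)  # nine cut points: 10th .. 90th
    return [deciles[0], deciles[4], deciles[8]]


def rcs_terms(x: float, knots: Sequence[float]) -> List[float]:
    """Return [linear term, nonlinear term(s)] of a restricted cubic spline."""
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are required")

    def pos_cube(u: float) -> float:
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2  # keeps terms on the scale of x
    terms = [float(x)]
    for j in range(k - 2):
        value = (
            pos_cube(x - t[j])
            - pos_cube(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
            + pos_cube(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2])
        )
        terms.append(value / scale)
    return terms


# Hypothetical knots on a CQI-like 4-20 scale (not the study's knots).
demo_knots = [8.0, 12.0, 16.0]
for score in (6.0, 12.0, 18.0, 20.0, 22.0):
    print(score, rcs_terms(score, demo_knots))
```

The nonlinear term is zero below the first knot and changes by a constant amount per unit above the last knot, which is the "restricted" property: the fitted curve is forced to be linear in both tails.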

Fig. 2 The timeline and follow-up scheme of our study. Notably, in our study, the baseline point was set at the date of diet history questionnaire completion

Prespecified subgroup analyses were performed to explore potential effect modification of the associations of CQI and LCDs with CRC incidence by several key factors. Subgroups were defined by age, sex, smoking status, BMI, history of diabetes, aspirin use, dietary energy intake, and the other score (LCDs for the CQI analyses and CQI for the LCDs analyses). To avoid misleading subgroup effects, P-values for interaction were determined by comparing models with and without interaction terms prior to subgroup analyses. Moreover, P-values for trend across quartiles of the two scores were calculated within each subgroup using previously described methods. The purpose of these analyses was to assess the consistency and generalizability of the associations between the two scores and CRC outcomes in major population segments. Given the small number of CRC-related deaths in this study, we did not perform subgroup analyses for CRC mortality.

Several sensitivity analyses were performed to assess the robustness of the findings: (1) individuals with extreme energy intake (> 4000 kcal/day or < 500 kcal/day) or BMI (top or bottom 1%) were excluded; (2) outcome events within the initial 1 or 2 years of follow-up were excluded to assess potential reverse causation; (3) individuals with a history of colon comorbidities, colorectal polyps, or a family history of CRC were excluded, considering that they are at high risk of CRC [20, 23]; (4) carbohydrate intake (% E) was adjusted for directly, instead of LCDs, in Model 3 of the CQI analyses to determine whether the observed association was influenced by the amount of carbohydrate intake; (5) several dietary factors were additionally adjusted for, including the energy-adjusted consumption of dietary calcium, calcium from supplements, energy-adjusted average daily red meat consumption, and total folate (combining dietary folate and folate from supplements). All statistical analyses were carried out using R software version 4.2.2, with two-tailed P < 0.05 as the level of statistical significance.

Results

Participant baseline features

In this study, the mean (standard deviation) score was 12.0 (3.1) points for CQI and 15.0 (7.1) points for LCDs. The two scores were weakly negatively correlated (Pearson’s r = −0.062, P < 0.001). The baseline characteristics according to quartiles of CQI and LCDs are presented in Table 1. Participants in the highest CQI quartile had healthier lifestyles, including more physical activity and lower cigarette pack-years, BMI, and alcohol consumption, compared to those in the lowest quartile, whereas the highest LCDs quartile displayed the opposite pattern, with less healthy lifestyles relative to the lowest quartile.

Table 1 Baseline characteristics of study population according to overall CQI and LCDs

During a mean follow-up of 8.81 years, 1085 incident CRC cases were documented, of whom 311 died from CRC over a longer follow-up period (15.07 years). These cases included 640 proximal colon cancers (181 deaths), 224 distal colon cancers (71 deaths), 199 rectal cancers (54 deaths), and 22 of unknown anatomical location (5 deaths).

Association between CQI and CRC outcome events

The multi-model Cox regression results for CQI and the incidence and mortality of CRC, including its subsites, are presented in Tables 2 and 3, respectively. In comparison with participants in the lowest CQI quartile, those in the highest quartile had a significantly reduced incidence of CRC after adjusting for potential confounders (Model 3: HR Quartile 4 vs. Quartile 1: 0.80; 95% CI: 0.67, 0.96; P = 0.012 for trend). A similar result was observed for CRC mortality (Model 3: HR Quartile 4 vs. Quartile 1: 0.61; 95% CI: 0.44, 0.86; P = 0.004 for trend). Analyses modeling CQI as a continuous variable revealed significant inverse associations of higher CQI scores with CRC incidence (HR per SD increment: 0.93; 95% CI: 0.87, 0.99) and mortality (HR per SD increment: 0.84; 95% CI: 0.75, 0.95) in Model 3. Restricted cubic spline regression models demonstrated linear dose–response relationships, whereby higher CQI scores were associated with lower risks of CRC incidence and mortality (all P-values for nonlinearity > 0.05; Fig. 3).

Table 2 Association between CQI and the CRC incidence according to main anatomic location
Table 3 Association between CQI and the CRC mortality according to main anatomic location
Fig. 3 Nonlinear dose–response analysis on the association of CQI with the risk of both colorectal cancer incidence and mortality. Hazard ratios were adjusted for age, sex, race, education levels, family history of colorectal cancer, history of colon comorbidities, history of diverticulitis or diverticulosis, history of colorectal polyp, history of diabetes, history of aspirin use, total energy intake, body mass index at baseline, smoking status, pack-years of cigarettes, alcohol consumption, physical activity level, and LCDs

In subsite analyses using multivariable Model 3, higher CQI scores were associated with decreased incidence of distal colon cancer (HR Quartile 4 vs. Quartile 1: 0.65; 95% CI: 0.43, 0.97; P = 0.024 for trend) and rectal cancer (HR Quartile 4 vs. Quartile 1: 0.58; 95% CI: 0.38, 0.91; P = 0.027 for trend), but not proximal colon cancer. Furthermore, this inverse association was also observed between CQI and rectal cancer mortality (HR Quartile 4 vs. Quartile 1: 0.27; 95% CI: 0.10, 0.73; P = 0.006 for trend) but not for other subsites.

In subgroup analyses stratified by major demographic and lifestyle factors, the inverse associations of CQI with CRC incidence were consistent and not modified by age, sex, smoking, BMI, aspirin use, diabetes history, energy intake, or LCDs (all P-interaction > 0.05; Additional file 1: Table S3). Sensitivity analyses that excluded specific individuals or adjusted for additional covariates showed robust associations between higher CQI and reduced incidence and mortality of CRC (Additional file 1: Table S4–5).

Association between LCDs and CRC outcome events

In multivariable Cox regression analyses, no significant associations were observed between LCDs and risks of overall CRC incidence (Model 3: HR Quartile 4 vs. Quartile 1: 0.92; 95% CI: 0.77, 1.10; P = 0.261 for trend; HR per SD increment: 0.96; 95% CI: 0.90, 1.02) or mortality (Model 3: HR Quartile 4 vs. Quartile 1: 1.02; 95% CI: 0.74, 1.42; P = 0.982 for trend; HR per SD increment: 0.98; 95% CI: 0.87, 1.10) when comparing extreme quartiles or modeling LCDs continuously (Table 4 and Additional file 1: Table S6). Similarly, in subsite analyses, LCDs were not significantly associated with the incidence or mortality of proximal colon, distal colon, or rectal cancers (all P > 0.05 for trend).

Table 4 Association between LCDs and the CRC incidence according to main anatomic location

Subgroup analyses were consistent with the overall null findings between LCDs and CRC incidence (Additional file 1: Table S7). Additionally, sensitivity analyses showed that the lack of significant associations between LCDs and CRC incidence or mortality remained unchanged (data not shown).

Discussion

In this prospective cohort study of US adults, we found that higher carbohydrate quality, as assessed by CQI, was significantly associated with reduced CRC incidence and mortality. These inverse associations remained robust in the subgroup and sensitivity analyses. In subsite-specific analyses, higher CQI was associated with a 35% lower incidence of distal colon cancer and a 42% lower incidence of rectal cancer. Higher CQI was also associated with a 73% lower risk of dying from rectal cancer. In contrast, lower carbohydrate quantity, as measured by LCDs, showed no significant associations with CRC outcomes, suggesting that reduced carbohydrate quantity alone may not lower CRC burden in the American population. Overall, our findings indicate that carbohydrate quality, rather than quantity, may be an important protective factor against CRC, particularly for distal colon and rectal cancers.
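The percentages quoted above follow directly from the Model 3 hazard ratios for the highest versus lowest CQI quartile, as (1 − HR) × 100. A minimal sketch of that arithmetic, using the HRs reported in the Results (the dictionary labels and function name are illustrative):

```python
# Model 3 hazard ratios (Quartile 4 vs. Quartile 1) as reported in the Results.
REPORTED_HR = {
    "distal colon cancer incidence": 0.65,
    "rectal cancer incidence": 0.58,
    "rectal cancer mortality": 0.27,
}


def percent_lower_hazard(hazard_ratio: float) -> int:
    """Convert a hazard ratio below 1 into a rounded percent reduction in hazard."""
    return round((1.0 - hazard_ratio) * 100)


reductions = {name: percent_lower_hazard(hr) for name, hr in REPORTED_HR.items()}
print(reductions)
```

These are relative reductions in hazard between the extreme quartiles, not absolute risk differences, and each carries the uncertainty shown by its 95% CI.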

The CQI emphasizes diets high in dietary fiber; low in glycemic index; with a higher ratio of solid to total carbohydrates, indicating restricted alcohol and sugar-sweetened beverages; and a higher ratio of whole to total grains [10]. These interconnected diet quality factors may contribute to reduced CRC burden through several mechanisms. Specifically, high intakes of dietary fiber increase stool bulk and accelerate colonic transit, reducing mucosal contact time with carcinogens and tumor promoters [24]. Colonic fermentation of fiber also produces short-chain fatty acids such as butyrate that confer anti-inflammatory and anticarcinogenic effects [25]. Furthermore, a lower glycemic index and restricted sugar-sweetened beverage intake mitigate hyperinsulinemia, obesity, insulin resistance, and chronic inflammation, which can reduce CRC risk [26,27,28]. Restricting alcohol may suppress acetaldehyde production by colonic bacteria, thus lowering DNA damage, resisting epigenetic dysregulation, and inhibiting colorectal tumorigenesis [29]. Higher whole grain consumption provides abundant antioxidants, vitamins, minerals, and phytochemicals that counter the oxidative stress and inflammation driving neoplastic changes [30]. In summary, the synergistic actions of high-quality carbohydrates on critical risk factors and pathways, from the colonic milieu to systemic metabolism, may contribute to their observed inverse associations with CRC.

In this study, the protective associations between higher CQI and reduced CRC incidence and mortality were primarily observed for distal colon and rectal cancer rather than proximal colon cancer. This aligns with previous evidence suggesting stronger inverse diet-cancer relationships in the distal colon and rectum than in the proximal colon [31, 32]. Compared to the proximal colon, the distal colon and particularly the rectum have greater carcinogen exposure due to prolonged transit times and fecal retention [31, 33]. The higher dietary fiber intake emphasized by CQI may confer particular benefits in the distal colorectum by accelerating transit, reducing genotoxic contact, enhancing butyrate production [25], and suppressing chronic inflammation [27]. The specific benefits of carbohydrate quality for distal colon and rectal cancer warrant further research on potential diet-microbiome-metabolite interactions along the colorectum. Elucidating such regional specificity of diet-cancer associations can inform targeted preventive strategies.

To the best of our knowledge, only one small case–control study from Iran (71 CRC cases and 142 controls) has reported associations between higher CQI and lower CRC risk (T3-OR = 0.15; 95% CI: 0.06–0.39), as well as an inverse link between LCDs and CRC incidence (T3-OR = 0.28; 95% CI: 0.10–0.82) [11]. Our results are partly consistent with that case–control study in terms of the inverse association between CRC incidence and CQI, but differ on the LCDs finding. However, its limited sample size and retrospective design limit the ability to draw definitive conclusions from that study. Notably, large cohort studies have found more nuanced relationships between LCDs and CRC outcomes. One cohort study from Singapore reported that higher animal-based LCDs were associated with increased CRC risk [8], while another from an American cohort found plant-based LCDs linked to decreased CRC-related mortality [9]. Importantly, LCDs based solely on reduced carbohydrate quantity did not show significant associations with CRC incidence or related mortality in either of the above cohort analyses [8, 9]. This aligns with our LCDs findings, indicating that overall carbohydrate restriction without consideration of food sources may not influence CRC risk.

Interestingly, our subgroup findings indicate that the protective association between higher CQI and reduced CRC incidence was only evident among individuals with lower LCDs (HR Quartile 4 vs. Quartile 1: 0.69; 95% CI: 0.54, 0.87; P = 0.001 for trend), but not those with higher LCDs (HR Quartile 4 vs. Quartile 1: 0.91; 95% CI: 0.70, 1.19; P = 0.484 for trend), although statistical tests for interaction did not reach significance (P-interaction > 0.05). This result suggests that simply restricting overall carbohydrate amount may attenuate the protective association of the high-quality carbohydrate diets emphasized by CQI. In contrast, higher LCDs were not associated with decreased CRC risk across strata of higher or lower CQI. Overall, these data suggest that maintaining higher carbohydrate quality as reflected by CQI, rather than restricting carbohydrate intake, may be relevant for lowering CRC risk, particularly among individuals with higher carbohydrate consumption. Taken together, our results provide preliminary evidence on the interplay between carbohydrate quality and quantity in shaping CRC susceptibility. Further research is warranted to clarify the optimal balance between carbohydrate amount and quality for CRC prevention.

This study possesses several notable strengths, setting it apart from previous research. Firstly, it stands as the first large-scale, prospective investigation to concurrently explore the correlations between CQI and LCDs with both CRC incidence and mortality within a US cohort. This novel approach offers valuable insights into the role of carbohydrate quality and quantity in influencing CRC outcomes. Secondly, the extensive follow-up period and the inclusion of a large sample size significantly bolstered the statistical power of our study and increased the generalizability of the findings to similar populations. Thirdly, to minimize any potential biases, we conducted meticulous adjustments for an array of confounding factors in our analyses. Moreover, we performed a special subgroup analysis that yielded unique preliminary evidence regarding the interaction between CQI and LCDs in influencing CRC incidence. This exploration suggests that adhering to a higher CQI may not confer a significant benefit in reducing the risk of CRC in individuals with higher LCDs (i.e., those with lower carbohydrate intake). This observation raises intriguing questions about the potential complex interplay between carbohydrate quality and quantity in relation to CRC outcomes, warranting further investigation. Additionally, we performed sensitivity analyses to test the robustness of our results across various assumptions, reinforcing the reliability of our findings.

However, some limitations should be acknowledged. Firstly, our assessment of dietary intake was conducted only once using the DHQ, without capturing potential changes over time. Nevertheless, baseline diet assessments reasonably reflect habitual long-term intake patterns based on nutritional tenets [34]; hence, the single DHQ measure likely provided a valid representation of participants' customary diets before and during the study. Secondly, the possibility of residual confounding from unmeasured factors cannot be entirely excluded, as is the case with most observational studies. Thirdly, given the study's focus on older adults in the US, caution should be exercised when generalizing the results to other age groups or different countries, as dietary and lifestyle factors may differ. Lastly, as with any observational design, causal inferences concerning the identified diet-cancer associations must be drawn with caution, and future interventional studies are needed to establish causality.

Conclusions

This comprehensive investigation in older Americans provides evidence that emphasizing carbohydrate quality over quantity may confer protection against CRC, particularly for distal colon and rectal tumors. These findings lay the groundwork for additional research to further elucidate relationships between carbohydrate characteristics and regional CRC susceptibility. In addition, future studies should explore this association in other populations to verify the generalizability of these findings.