Introduction

Almost 40% of all cancer cases are attributable to modifiable risk factors [1], and a poor-quality diet is linked to an increased risk of several cancer types [2]. Nutrients are often studied as active components in the diet-cancer relation, but strong associations can be difficult to detect because a single food product or nutrient constitutes only a small portion of the overall diet [3]. Hence, dietary patterns have been proposed as an alternative approach to studying diet-disease relations [4].

A dietary pattern is a complex collection of nutrients and food products, including the quantities and varieties of foods that comprise different diets [5, 6]. The World Cancer Research Fund (WCRF) recently concluded that there is some evidence that a healthy dietary pattern consisting of fruits, vegetables, and fish decreases the risk of cancers of the mouth, pharynx, and larynx [7], whereas a Western-type dietary pattern consisting of meat, potatoes, refined grains, and sugars increases the risk of obesity, which is an established risk factor for several cancers [7]. Previous research in Asia, Europe, and the United States has identified associations between dietary patterns and risk of overall or site-specific cancers [4]; however, insufficient evidence exists regarding the etiologic associations between dietary patterns and combined and site-specific cancer incidence in Canada.

Alberta’s Tomorrow Project (ATP) is a prospective cohort study designed to investigate the etiology of cancer and other chronic diseases. Information on lifestyle, behavioral, and environmental exposures was collected using surveys among participants in Alberta, Canada between 2001 and 2015 [8]. Using data from this cohort, we investigated the associations of dietary patterns with combined and site-specific cancer incidence in ATP participants. Dietary patterns were derived a posteriori using principal component analysis (PCA) and with reduced rank regression (RRR), which combines a posteriori and a priori approaches.

Methods

Study population

In total, 55,530 Albertans aged 35–69 years were enrolled in ATP; detailed information about participant recruitment, enrollment, and data collection methods for Phase I has been described previously [8]. The current analysis included participants who completed the Canadian Diet History Questionnaire between 2000 and 2008 (n = 26,462). Ethics approval was obtained from the Health Research Ethics Board of Alberta – Cancer Committee (HREBA.CC-16-0495). Informed consent was obtained from all participants.

Dietary intake assessment

The Canadian Diet History Questionnaire is a validated 124-item food frequency questionnaire of foods, beverages, and dietary supplements that assesses dietary intake over the preceding 12 months and is a modified version of the US National Cancer Institute’s Diet History Questionnaire [9]. Modifications were made to the National Cancer Institute’s Diet History Questionnaire to reflect food availability, brand names, nutrient composition, and food fortification in Canada [10]. Data on energy intake, 66 nutrients, and 30 food groups were available using a method described elsewhere [11, 12]. A previous validation study by Csizmadi et al. [13] compared the validity of the Canadian Diet History Questionnaire to 24 h food recalls collected by the Canadian Community Health Survey—Cycle 2.2. The study reported that the Canadian Diet History Questionnaire was able to adequately capture macro and micronutrient intakes collected from the 24 h recalls with ranges from 79–90% for carbohydrate intake (a proxy for fructose intake), total fat intake (a proxy for discretionary fat intake), and vitamin D and fiber.

Cancer incidence rates

Data on primary cancers were obtained through a linkage with the Alberta Cancer Registry using participants’ personal health numbers up to June 2018. The Alberta Cancer Registry is a population-based cancer registry that has over 95% case ascertainment. Site-specific cancers in this study were included if there was previous epidemiologic evidence and biologic plausibility for an association between dietary intake and cancer risk [7] and if at least 100 incident cases had occurred in the participants who had completed the Canadian Diet History Questionnaire at baseline. The threshold of 100 cancer cases for inclusion was used to achieve reasonable statistical power for the analyses as used in previous studies from this cohort [14]. There were sufficient cases accrued in this cohort to include separate analyses for breast, lung, colon, endometrium, and prostate cancers.

Covariates

Information on covariates was obtained from the baseline Health and Lifestyle Questionnaire. Covariates were first added to the final multivariable models if they were known confounders in the association between dietary intake and cancer risk. These included age (years), sex (male/female for non-sex-specific cancer sites), body mass index (BMI) (kg/m²), and total energy intake (kcal). Furthermore, for lung cancer incidence, smoking was also forced into the model because of its known association with this cancer type. For the remaining covariates (smoking status: daily/occasional/former/never; physical activity: total MET hours/week; marital status: married or living with someone/divorced, separated, or widowed/single or never married), we assessed whether adding each variable to the model changed the beta coefficient for the association between dietary patterns and cancer incidence by at least 10% (ref). This led to the inclusion of the following additional covariates for each cancer site: smoking status for combined, breast, colon, endometrial, and prostate cancers; physical activity for combined, breast, lung, colon, endometrial, and prostate cancers; and marital status for endometrial cancer.
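
The 10% change-in-estimate rule described above can be illustrated with a minimal sketch. The function name and the coefficient values are hypothetical and are not taken from the study; the sketch assumes the change is measured relative to the coefficient from the model without the candidate covariate.

```python
def changes_estimate(beta_without: float, beta_with: float,
                     threshold: float = 0.10) -> bool:
    """Return True if adding a covariate shifts the coefficient by at least
    `threshold`, relative to the model without that covariate (assumption)."""
    if beta_without == 0:
        raise ValueError("relative change is undefined for a zero coefficient")
    relative_change = abs(beta_with - beta_without) / abs(beta_without)
    return relative_change >= threshold


# Hypothetical log-hazard coefficients for one dietary pattern score.
print(changes_estimate(-0.186, -0.150))  # shift of about 19%: keep covariate
print(changes_estimate(-0.186, -0.180))  # shift of about 3%: drop covariate
```

Under this rule a covariate is retained only when it materially moves the estimate of interest, regardless of whether the covariate itself is statistically significant.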

Follow-up time

Follow-up time was defined as the difference in years between the age at exit and the age at entry into the study (based on exact age at Canadian Diet History Questionnaire completion). The age at exit was either the age at cancer diagnosis (based on the exact age when the primary cancer was diagnosed) or the age at the end of follow-up for the present analyses (based on exact age at the time of data linkage with the Alberta Cancer Registry in June 2018). The person-time contributed by each participant therefore runs from completion of the Canadian Diet History Questionnaire to the diagnosis of cancer or the end of follow-up in June 2018, whichever came first.
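
As a minimal sketch of this definition, follow-up can be computed from three ages. The function and argument names are illustrative assumptions, not variables from the ATP data set.

```python
from typing import Optional


def follow_up_years(age_at_entry: float, age_at_end: float,
                    age_at_diagnosis: Optional[float] = None) -> float:
    """Years from questionnaire completion to diagnosis or end of follow-up,
    whichever came first."""
    if age_at_diagnosis is None:
        exit_age = age_at_end
    else:
        exit_age = min(age_at_diagnosis, age_at_end)
    if exit_age < age_at_entry:
        raise ValueError("exit age precedes entry age")
    return exit_age - age_at_entry


# Hypothetical participants: one without cancer, one diagnosed during follow-up.
print(follow_up_years(50.0, 64.5))          # 14.5
print(follow_up_years(50.0, 64.5, 57.25))   # 7.25
```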

Dietary patterns

PCA is a data-driven approach for variable reduction that can be used to assess overall dietary patterns in contrast to a large list of specific food variables [15]. In this application of PCA as a dimension-reduction technique, we seek to examine the impact of patterns within a regular diet on cancer risk, rather than taking a reductionist approach that examines single foods or nutrients. PCA generates a smaller number of variables, the principal components [15], which in this study are examined as dietary patterns. Hence, these dietary patterns explain as much of the variation in specific food item intake within this population as possible. This approach is data driven in that it ignores any knowledge of plausible biologic associations with cancer incidence [15]. For this method, dietary patterns were derived using the 30 food groups generated with the Diet*Calc® software. Components (dietary patterns) were selected based on the eigenvalues (below or above 0.35) [16], scree plots (visual break), and their interpretability. Food groups with coefficients ≥0.35 were used to describe each component (dietary pattern) and provide the dietary pattern interpretation. The coefficients represent covariance between the foods and the overall dietary pattern. Foods with positive coefficients were positively associated with the dietary pattern, whereas the opposite was true for negative coefficients.

Orthogonal Varimax rotation was applied to reduce correlation between the components (dietary patterns) and increase their interpretability [17]. Subsequently, for each participant, dietary pattern scores were estimated for every identified dietary pattern. Each dietary pattern has a score (coefficient) for each food group. For each individual, dietary pattern scores were calculated by multiplying the score for each of the 30 food groups by the intake of that food group and summing the products. These dietary pattern scores were then standardized and expressed as z-scores. The individual dietary pattern scores represent the alignment between that individual’s diet and the derived dietary patterns.
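
The scoring steps above can be sketched with NumPy on synthetic data. This is an illustration under stated assumptions rather than the authors' SAS workflow: the intake matrix is simulated, intakes are standardized before the decomposition, only six food groups are used instead of 30, and the Varimax rotation is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_participants, n_food_groups = 200, 6   # the study used 30 food groups
intake = rng.gamma(shape=2.0, scale=1.0, size=(n_participants, n_food_groups))

# Standardize intakes so that PCA operates on the correlation matrix.
z_intake = (intake - intake.mean(axis=0)) / intake.std(axis=0, ddof=1)
corr = np.corrcoef(z_intake, rowvar=False)

# Eigen-decomposition, sorted from largest to smallest eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Keep the first three components as candidate dietary patterns.
n_patterns = 3
weights = eigenvectors[:, :n_patterns]

# Pattern score = sum over food groups of (weight x intake), then z-scored.
raw_scores = z_intake @ weights
pattern_z = (raw_scores - raw_scores.mean(axis=0)) / raw_scores.std(axis=0, ddof=1)

explained = eigenvalues[:n_patterns].sum() / eigenvalues.sum()
print(f"Variance explained by {n_patterns} patterns: {explained:.1%}")
```

Each column of `pattern_z` then has mean 0 and standard deviation 1, so a participant's value indicates how closely their diet aligns with that pattern relative to the rest of the sample.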

RRR was the second statistical approach used to derive dietary patterns. For this method, disease-specific response variables (nutrients, combinations of nutrients, and biomarkers) are used to establish dietary patterns that explain as much variation in the population concerning those response variables as possible [15, 18]. Response variables are nutrients selected based on scientific evidence [7] and biologic plausibility. For this study, we chose dietary fiber, vitamin D, fructose, and discretionary fats because: (a) there is current evidence on these nutrients in the WCRF report [7]; (b) they are present in different types of foods and dietary patterns; (c) data were readily available; (d) the Food Frequency Questionnaire had been validated for these nutrients and validity was high; and (e) they are known to be associated with several specific cancers (e.g., colorectal and pancreatic cancers) [19,20,21,22,23]. Although these nutrients are linked to other site-specific cancers, the evidence has been inconsistent and insubstantial [19, 20, 24,25,26,27,28].

In RRR, dietary patterns are derived based on the variance in preselected nutrients (response variables), whereas in PCA, dietary patterns are derived based on the variance in food items and groups (predictor variables). The partial least squares procedure was used to perform the RRR and is explained in more detail elsewhere [15]. Briefly, the number of selected response variables (nutrients) dictates the number of derived dietary patterns, which was four in our study. RRR uses a covariance matrix of responses. Food groups with coefficients below −0.2 or above 0.2 were used to describe the dietary pattern for each of the selected nutrients. These dietary pattern scores were standardized and expressed as z-scores.
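
To make the contrast with PCA concrete, the following sketch implements one common formulation of RRR on synthetic data: regress the standardized responses on the standardized predictors, then extract principal directions of the fitted responses. It is an assumption-laden illustration, not the SAS partial least squares procedure used in the study; all data and variable names are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_foods, n_nutrients = 300, 8, 4      # four responses, as in the study
foods = rng.normal(size=(n, n_foods))
# Synthetic nutrients that depend on the food groups plus noise.
true_map = rng.normal(size=(n_foods, n_nutrients))
nutrients = foods @ true_map + rng.normal(scale=2.0, size=(n, n_nutrients))


def standardize(a: np.ndarray) -> np.ndarray:
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


X, Y = standardize(foods), standardize(nutrients)

# Step 1: ordinary least squares of every response on all predictors.
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ B

# Step 2: principal directions of the fitted responses.
eigvals, V = np.linalg.eigh(Y_hat.T @ Y_hat)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Step 3: food-group weights and participant scores, one column per pattern.
food_weights = B @ V
pattern_z = standardize(X @ food_weights)

# Share of total response variation explained by each pattern.
response_variation = eigvals / np.trace(Y.T @ Y)
print(np.round(response_variation, 3))
```

The number of patterns equals the number of response variables, and the patterns are ordered by how much response variation they explain, which mirrors the ordering of the four nutrient-based patterns reported in the Results.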

Statistical analysis

Baseline characteristics were estimated as means ± standard deviations (continuous variables) or as counts and percentages (categorical variables). Cox proportional hazard regression models were used to estimate multivariable-adjusted models for the association between each dietary pattern score derived by PCA and RRR with combined and site-specific cancer incidence. The multivariable model included adjustments as defined above, and an additional model accounting for reverse causality was created by excluding cancer cases diagnosed <2 years after Canadian Diet History Questionnaire completion (latency-adjusted multivariable model).
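
The latency adjustment can be sketched as a simple filter applied before model fitting. The record structure and names below are hypothetical; the sketch assumes that only cancer cases diagnosed within the lag window are removed, while non-cases are retained regardless of follow-up length.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    follow_up_years: float
    is_case: bool


def latency_subset(cohort: List[Participant],
                   lag_years: float = 2.0) -> List[Participant]:
    """Drop cases diagnosed less than `lag_years` after questionnaire
    completion; non-cases are kept (assumption)."""
    return [p for p in cohort
            if not (p.is_case and p.follow_up_years < lag_years)]


cohort = [
    Participant(1.2, True),    # early case: excluded
    Participant(1.5, False),   # non-case: kept
    Participant(6.8, True),    # later case: kept
    Participant(13.0, False),  # non-case: kept
]
kept = latency_subset(cohort)
print(len(kept))  # 3
```

Removing early cases reduces the chance that undiagnosed disease at baseline had already altered reported dietary intake.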

All statistical analyses were performed with SAS 9.4 (SAS Institute Inc., Cary, NC, USA).

Results

This analysis included 26,462 participants (10,109 males and 16,353 females), of whom 2,721 (10.3%) developed cancer (1,177 males and 1,544 females) during the follow-up period. Mean follow-up time was 13.3 ± 3.3 years, which equaled 204,364 person-years (Table 1). The average age at baseline was 50.8 ± 9.2 years. The majority of participants had a BMI >25 kg/m² (66%), were married (79%), highly educated (72%), employed (73%), former or never smokers (82%), and had no co-morbidities (62%).

Table 1 Baseline characteristics of Alberta’s Tomorrow Project study participants (n = 26,462) by sex and case status, 2001–2008.

Principal component analysis

Three dietary patterns were identified using PCA: “Western”, “Prudent”, and “sugar, fruit, and dairy” (Supplemental Tables 1, 2). The “Western” pattern was characterized by a high intake of grain servings, especially non-whole grains, vegetables, white potatoes, cheese, meats from lamb, pork, beef and luncheon meats (red and processed meats), discretionary fats, and teaspoons of added sugar. The “Prudent” pattern was characterized by a high intake of vegetables, fruits, and lean meat from fish and other seafood. The “sugar, fruit, and dairy” pattern comprised grain servings, especially whole grains, fruits, dairy, and teaspoons of added sugar. These three dietary patterns explained 42.4% of the variance in dietary intake.

Among males, differences in standardized dietary pattern scores between cases and non-cases were observed mostly for the “sugar, fruit, and dairy” pattern. Female cases had lower “Prudent” pattern scores than female non-cases. Pattern scores for “Western” were similar across subgroups, although females had lower scores than males (Table 1).

Reduced rank regression

Four dietary patterns were derived by RRR based on the four selected nutrients (Supplemental Table 3). A “dietary fiber” pattern was characterized by intake of grain servings, vegetables, and fruits and explained 52.4% of the response variation. The “vitamin D” pattern had high factor loadings for dairy and fish and other seafood and explained 19.4% of the response variation. A “fructose” pattern was characterized by intake of fruits and teaspoons of added sugar and explained 8.8% of the response variation. Lastly, a “discretionary fat” pattern was characterized by intake of excess solid fats present within the “Milk” and “Meat and Beans” categories (e.g., whole vs. skim milk) and explained 7.9% of the response variation. In total, these patterns explained 88.3% and 48.9% of the variation in response and predictor variables, respectively.

Female cases had higher standardized scores for the “fructose” pattern and lower scores for the “dietary fiber” pattern compared to non-cases (Table 1). Pattern scores were similar across groups for the “vitamin D” and “discretionary fat” patterns.

Combined cancer cases

The “Prudent” pattern was associated with reduced incidence of combined cancers in both the multivariable (HR = 0.83, 95% CI = 0.74–0.93) and latency multivariable-adjusted models (HR = 0.82, 95% CI = 0.73–0.92), when comparing the highest vs. lowest quartiles with evidence of statistically significant trends (p = 0.002 for both models). No other consistent trends were found using PCA derived patterns (Table 2).

Table 2 Cox regression hazard ratios of cancer incidence for dietary patterns derived by PCA within Alberta’s Tomorrow Project (n = 26,462), 2001–2008.

For the RRR-derived patterns, the “dietary fiber” and “discretionary fat” patterns were associated with reduced risk of combined cancer incidence, with significant trends observed for both patterns in the latency multivariable-adjusted model (p for dietary fiber = 0.003; p for discretionary fat = 0.008). The “fructose” pattern was associated with increased cancer risk in the latency multivariable-adjusted model. Those in the highest quartile had a 14% increased risk compared to the lowest quartile (HR = 1.14, 95% CI = 1.02–1.27) and a significant trend was observed (p = 0.01) (Table 3).

Table 3 Cox regression hazard ratios of cancer incidence for dietary patterns derived by RRR within Alberta’s Tomorrow Project (n = 26,462), 2001–2008.

Site-specific cancers

For lung cancer, the “sugar, fruit, and dairy” pattern was associated with reduced risk in all PCA models. In the latency multivariable-adjusted model, a statistically significant trend was observed (p = 0.01) and individuals in the third quartile had a reduced risk compared to those in the first quartile (HR = 0.61, 95% CI = 0.42–0.89) (Table 2). Associations with lung cancer were also found for all dietary patterns derived using RRR (Table 3). The most consistent association was with the “fructose” pattern, where individuals in higher quartiles were at increased risk of lung cancer compared to those in the first quartile. In the latency multivariable-adjusted model, the fourth quartile had a 53% greater risk of lung cancer compared to those in the first quartile (HR = 1.53, 95% CI = 1.07–2.20).

The “Prudent” pattern was consistently associated with a reduced risk of colon cancer in all models, with a significant trend found in the latency multivariable-adjusted model (p = 0.037) (Table 2). Individuals in the highest quartile of this pattern had a 37% reduced risk of colon cancer compared to those in the lowest quartile (HR = 0.63, 95% CI = 0.42–0.96). The “dietary fiber” and “discretionary fat” patterns were associated with reduced risk of colon cancer in the RRR analysis (Table 3). Both patterns had significant trends for the latency multivariable-adjusted model, with reductions in risk near 50% when comparing the fourth to the first quartile (HR for dietary fiber = 0.54, 95% CI = 0.31–0.92; HR for discretionary fat = 0.52, 95% CI = 0.33–0.82).
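
The percentage statements in this section follow directly from the hazard ratios: a hazard ratio HR corresponds to a relative change in risk of (HR − 1) × 100%. The short sketch below, with an illustrative helper name, reproduces the figures quoted in the text.

```python
def percent_change_in_risk(hazard_ratio: float) -> int:
    """Relative change in risk implied by a hazard ratio, in whole percent.
    Negative values are reductions; positive values are increases."""
    if hazard_ratio <= 0:
        raise ValueError("a hazard ratio must be positive")
    return round((hazard_ratio - 1.0) * 100)


# Hazard ratios reported in the text (fourth vs. first quartile).
reported = [
    ("Prudent, colon", 0.63),
    ("dietary fiber, colon", 0.54),
    ("discretionary fat, colon", 0.52),
    ("fructose, lung", 1.53),
]
for label, hr in reported:
    print(f"{label}: HR = {hr:.2f}, change in risk = {percent_change_in_risk(hr):+d}%")
```

The two RRR colon estimates correspond to reductions of 46% and 48%, which is what "near 50%" refers to.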

Reduced risk of prostate cancer was found for individuals with higher “Prudent” pattern scores, with significant trends found for both the multivariable-adjusted and latency multivariable-adjusted models (Table 2). A statistically significant trend was also observed for the “dietary fiber” pattern in the multivariable and latency multivariable-adjusted models (p = 0.03 and p = 0.01, respectively) (Table 3). In the latter model, individuals in the fourth quartile were at significantly reduced risk of prostate cancer compared to individuals in the first quartile of this pattern (HR = 0.65, 95% CI = 0.42–0.99). No statistically significant findings were observed for breast or endometrial cancer.

A summary of the dietary pattern components and the associated cancer risks can be found in Table 4.

Table 4 Descriptors of the main components in each dietary pattern for both PCA and RRR methods, grouped by similarity of dietary pattern (e.g., the “Prudent” and “dietary fiber” patterns are both partially characterized by intake of vegetables and fruits). Shaded boxes indicate that the dietary pattern confers a statistically significant reduced or increased risk of cancer at p < 0.05.

Discussion

This analysis, from a prospective cohort study of ~26,000 adults from Alberta, Canada, examined the association between dietary patterns derived using both PCA and RRR, and cancer incidence. Using RRR, we found that a “dietary fiber” pattern was associated with a decreased risk of combined cancers, as well as lung, colon, and prostate cancers. A “fructose” pattern was associated with increased risk of combined cancers and lung cancer, while a “discretionary fat” pattern was associated with a decreased risk of colon and combined cancers. When dietary patterns were derived using PCA, the “Prudent” pattern was associated with a decreased risk of colon and combined cancers, and a “sugar, fruit, and dairy” pattern was associated with a reduced risk of lung cancer. The 2018 WCRF report on cancer prevention recommends eating a diet high in fruits, vegetables, whole grains, and legumes [7]. While the report specifically indicates that there is “strong evidence” that consumption of these foods is protective against colorectal cancer, it states that these foods also protect against weight gain, which is a risk factor for many cancers [7]. These recommendations are in line with our finding that a “Prudent” dietary pattern is associated with a lower risk of combined cancers and colon cancer. These results are also consistent with previous studies [7, 21, 22, 29,30,31,32,33,34]. Contrary to previous findings, though, we did not find an increased risk of cancer among those following a “Western” pattern [21, 22, 29, 30, 35]. A possible explanation for this difference is that the “Western” pattern was common in this population, so even those in the lowest quartiles had relatively high exposure to components of this pattern, which may have decreased the ability to detect an association with cancer incidence. In general, our results support cancer prevention messaging to maintain a diet high in vegetables, fruits, fish, and whole grains.

It is important to acknowledge the apparent contradiction between the results obtained through the PCA and RRR methods for lung cancer, as highlighted in Table 4. In the PCA model, the “sugar, fruit, and dairy” pattern is associated with reduced risk, whereas the “fructose” pattern in the RRR model is associated with an increased risk. Based on the factor loading values presented in Supplemental Tables 1 and 3, teaspoons of added sugar is a bigger contributor to the “fructose” pattern, whereas the natural fruit groups are stronger contributors to the “sugar, fruit, and dairy” pattern. The most recent WCRF report, published in 2018, suggests that fruit intake is protective against lung cancer, which may partly explain these discrepancies in results. However, this interpretation cannot be confirmed by our present findings because these are analyses of dietary patterns, which do not isolate the effects of single foods or nutrients. This may be a worthwhile area of future research.

The protective effect of fruit intake may be due to active flavonoid compounds present in fruits, which have antioxidant properties [36]. However, other studies have found this relationship to exist only among smokers because they may benefit more from the antioxidants found in fruit [37, 38]. The WCRF report also acknowledges a protective association between fruit and vegetable intake and lung cancer, which is attributed to the presence of beta-carotene in these foods [7]. We also found a protective effect for the “dietary fiber” pattern on lung, colorectal, and prostate cancers, which is consistent with previous findings [7, 25], and may also be attributable to fruit consumption in this group.

A higher “discretionary fat” dietary pattern score was associated with a decreased risk of combined cancers and colon cancer. The WCRF report found a limited association between discretionary fat intake and pancreatic cancer, but there was insufficient evidence for all other cancer types [7]. One study found a moderate positive association between discretionary fat intake and breast cancer risk, but this finding was attenuated after adjustment for red meat intake [39], and a meta-analysis found no association [40]. The protective association that we observed for the “discretionary fat” dietary pattern might have been due to social desirability bias. Discretionary fats are defined as the excess fat present in foods within the “Milk” and “Meat and Beans” categories when high-fat versions are consumed instead of low-fat alternatives [41]. Both male and female non-cases had a higher intake of saturated fat (Table 1). However, reporting of saturated/discretionary fat intake is prone to social desirability bias, particularly among individuals with an unhealthy lifestyle and individuals with overweight or obesity, who tend to underreport discretionary fats [42]. A larger proportion of individuals with overweight and obesity was present in the case group (Table 1).

Our results show a decrease in combined cancer incidence with higher “vitamin D” pattern scores that was not statistically significant. We used dietary vitamin D to derive the “vitamin D” dietary pattern, as variance in intake of supplemental vitamin D cannot be explained within a dietary pattern. An Italian case-control study found an inverse association between dietary vitamin D intake and colon cancer risk among participants in the highest categories of intake [43]. However, dietary vitamin D intake in this population is far below the recommended daily intake of 15 μg of vitamin D in Canada (Table 1) [44].

This study’s strengths include the large sample size of the cohort which permitted a concurrent analysis of five different cancer sites. The dietary assessment method (Canadian Diet History Questionnaire) was validated for the nutrients used in this study [13]. Moreover, we used two different statistical methods in this study to derive dietary patterns to overcome some of the inherent limitations with PCA such as unexplained variation in nutrient intake. In addition, we considered the possibility of reverse causation by excluding cancer cases diagnosed in the first 2 years of follow-up. Finally, data on many different covariates were obtained at baseline and therefore could be included in the analysis.

The main limitation of this study is the potential for exposure misclassification due to the social desirability bias that is inherent to self-reported dietary intake questionnaires. We recognize that food frequency questionnaires, including the Canadian Diet History Questionnaire used in this study, are known for misreporting and measurement errors (e.g., recall bias, modifying dietary habits or responses on a questionnaire through self-reflection, or to reduce response burden) [6, 42]. Nevertheless, in studies that include a large number of participants, this approach to collecting information on nutrient intake is accepted in the field of nutritional epidemiology for the purpose of studying associations between nutrient intake and chronic disease risk. In addition, dietary intake was only assessed once, and changes in dietary intake were not taken into consideration. This sample is a homogeneous population, which limits the generalizability of the study findings and increases the potential for unmeasured confounding. The RRR was performed with the data on food groups and nutrients from the same questionnaire. Therefore, food groups and nutrients are not from independent sources [15] and correlated errors in intake assessment might have been present. We chose not to include supplement use as part of our dietary pattern analysis in order to focus solely on the contributions of dietary intake to cancer risk. An average of 34% of Canadians have reported taking vitamin D supplements [45], suggesting that total vitamin D intake (dietary + supplements) may have been underestimated in approximately one-third of our sample. We suggest that dietary supplementation and cancer risk be an area of future research. Finally, although the methods are data driven, a certain degree of subjectivity remains in the analyses of dietary components because of the statistical thresholds and modeling decisions involved. This subjectivity should be noted when interpreting results both within and across studies.

Conclusion

To our knowledge, this is the largest Canadian population to be analyzed for dietary patterns and cancer incidence to date. Using population-based survey data, we were able to identify commonly consumed dietary patterns and explore the associations of these dietary patterns with multiple cancer sites. These dietary patterns were assessed with two complementary approaches to provide a richer perspective of population dietary patterns. This is an emerging field of interest in cancer research. Strong protective associations were found for those consuming a “Prudent” dietary pattern or a “dietary fiber” pattern, whereas the dietary pattern characterized by “fructose” consumption conferred an increased risk of lung and combined cancer incidence. While these findings generally support cancer prevention strategies that advocate for a diet high in vegetables, fruits, fish, and whole grains, a reduced risk of colon cancer was also observed for those with a “discretionary fat” dietary pattern. We suspect this may be attributed to the mono- and polyunsaturated fats that are part of discretionary fat intake. These findings provide insights into dietary patterns consumed in Alberta and their associations with the incidence of several cancer types, and they can be compared with findings from other populations worldwide to strengthen the evidence for the role of diet in cancer prevention.