Introduction

Colorectal cancer (CRC) represents a major health and social burden around the world, being the second most common cancer in women and the third in men [1, 2]. The highest CRC incidence rates are reported in Europe, North America, and Oceania (>40 per 100,000), whilst Sub-Saharan Africa has the lowest incidence of this disease (<2 per 100,000) [3]. Developing regions that have undergone industrialization and urbanization have seen a significant increase in CRC prevalence, suggesting that the adoption of a more “Westernised” lifestyle could be a contributing factor [2, 4, 5].

Changes in dietary habits, along with changes in related lifestyle factors such as obesity, alcohol, and tobacco consumption, have been proposed to influence the increase in CRC [5,6,7]. Diet has been shown to be one of the most important modifiable risk factors in CRC [5, 8,9,10,11,12], with numerous epidemiological studies investigating the association between specific components of diet and risk of CRC [13,14,15,16,17,18]. Although research has demonstrated that some food groups, such as red meat, have been associated with CRC risk [19], studies focusing on individual components of diet may not provide a reflection of the overall diet nor take into account the complex interactions from other dietary components. This might be one of the reasons that observational studies of single food analysis and CRC have been inconclusive [13, 20, 21].

Limitations in the study of single foods and single nutrients have motivated the use of alternative methods to facilitate a more exhaustive approach to investigating the association between diet and disease. In recent years, with the introduction of principal component analysis (PCA) as a statistical tool to examine the relation between diet and disease, observational studies have used this approach to derive dietary patterns. PCA allows the investigation of diet as a whole, by aggregating the foods or food groups that are commonly eaten together as part of an underlying dimension of food consumption, defined as a dietary pattern [21]. With the use of this approach, several associations have been observed between dietary patterns and the risk of several cancers [22]. It is possible that PCA might deal with the complexity of diet and tackle confounding issues better than single food or nutrient analysis.
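As an illustration of how PCA turns correlated food-group intakes into dietary patterns, the following minimal Python sketch runs a PCA on the correlation matrix of a synthetic intake table. The function name `derive_dietary_patterns`, the four food groups, and the simulated data are assumptions made for this example only; they are not taken from any of the reviewed studies, which may also apply rotation and other steps not shown here.

```python
import numpy as np

def derive_dietary_patterns(intake, n_patterns=2):
    """PCA on the correlation matrix of food-group intakes.

    intake: array of shape (n_participants, n_food_groups).
    Returns (loadings, scores, explained_variance_ratio).
    """
    X = np.asarray(intake, dtype=float)
    # Standardise each food group so that every group has unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    # eigh returns ascending eigenvalues; keep the largest n_patterns.
    order = np.argsort(eigvals)[::-1][:n_patterns]
    top_vals, top_vecs = eigvals[order], eigvecs[:, order]
    # Loadings: correlation between each food group and each component.
    loadings = top_vecs * np.sqrt(top_vals)
    # Scores: each participant's position on each pattern.
    scores = Z @ top_vecs
    explained = top_vals / eigvals.sum()
    return loadings, scores, explained

# Synthetic data (hypothetical): two latent eating habits drive four food groups.
rng = np.random.default_rng(0)
n = 500
western, prudent = rng.normal(size=n), rng.normal(size=n)
intake = np.column_stack([
    western + rng.normal(scale=0.5, size=n),  # red meat
    western + rng.normal(scale=0.5, size=n),  # refined grains
    prudent + rng.normal(scale=0.5, size=n),  # fruit
    prudent + rng.normal(scale=0.5, size=n),  # vegetables
])
loadings, scores, explained = derive_dietary_patterns(intake, n_patterns=2)
print(np.round(loadings, 2))
print(np.round(explained, 2))
```

Participants would then typically be ranked into tertiles, quartiles or quintiles of each pattern score before risk is compared between the highest and lowest categories.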

The objective of this study was to systematically analyse and interpret the existing scientific evidence from observational studies published up to July 2017 that examined the association of dietary patterns derived with the use of PCA, with CRC in adults.

Methods

Search strategy

This systematic review was carried out in accordance with the PRISMA guidelines for systematic reviews [23]. An electronic search was carried out using four databases, namely Web of Science, Medline (via OVID), Embase (via OVID), and The Cochrane Library, to identify all potentially eligible papers published up to 31 July 2017.

We used a pre-defined strategy and protocol to carry out this systematic review. Details of the search strategies used to capture relevant studies are included in Appendix 1. Briefly, the following expressions were used to search for dietary patterns: ((principal component analysis OR principal component OR PCA OR factor analysis OR factorial analysis) AND (diet OR nutr OR dietary pattern OR eating pattern OR food pattern OR Diet OR food habits OR feeding behavior)). The search strategy was piloted several times before the final search terms were used. Only studies that defined dietary patterns a posteriori were considered eligible for the systematic review. Reference lists of eligible studies were scanned to identify additional relevant studies. There were no language restrictions. Papers published in a language other than English were translated with the help of a native speaker.

Inclusion criteria

After de-duplication checks, titles and abstracts of all original studies were examined and selected for inclusion if they met the following criteria: empirical papers that derived dietary patterns with the use of PCA; empirical papers with the objective of illustrating, testing, criticizing or appraising PCA compared to other methods employed for dietary pattern analysis; papers where colorectal cancer was a primary or secondary outcome. Only studies that reported risk estimates [hazard ratios, odds ratios (ORs), and relative risks] of colorectal, colon and rectal cancer and measures of variability (SEs or 95% CIs from which these could be derived) were included. The exclusion criteria for the systematic review were defined as follows: conceptual papers on methodological issues of the dietary patterns approach using PCA, including think pieces and reviews of methods papers without original data; conceptual papers comparing methods of identifying dietary patterns without original data; studies that used a priori dietary patterns such as quality index, Mediterranean diet score or healthy eating index; studies that used data-driven methods other than PCA; articles that provided only abstracts; and articles presented only at conferences. Eligible papers were fully examined for data extraction.

Data extraction

Data from eligible studies were extracted according to recommended guidelines [23]. These included: author; year; paper title; country; study design; outcome; study population; characteristics of dietary patterns, including the number of dietary patterns, label of each pattern and food groups that correlated highly with this pattern, PCA information and percentage of total variance of original food items being explained by the dietary patterns in each study (if available); statistical adjustments made in the analysis; and the main findings, including risk estimates [hazard ratios, odds ratios (ORs), and relative risks] with 95% confidence intervals and p values.

To make the results more interpretable and meaningful, dietary patterns were labeled “Western” and “Prudent”, as these tend to be frequently reported across studies. The factor loadings for each dietary pattern labeled in selected studies were also considered. Dietary patterns were labeled as “Western” when correlated highly with red and processed meat, refined grains, and high-fat dairy. Dietary patterns characterized by higher principal component loadings of fruit, vegetables, whole grains, low-fat dairy, and fish were labeled as “Prudent”.

Risk of bias (quality) assessment

The risk of bias was examined using as reference the National Institute for Clinical Excellence (NICE) methodological checklist for cohort and case–control studies. Areas of bias examined included subject selection, exposure and outcome assessment, and confounding [23]. Studies were considered at low risk of bias if most of the criteria in the checklist were addressed. For cohort studies, a level of <20% loss to follow up was accepted as representing low risk of bias from incomplete outcome data.

Statistical analysis

All studies in the meta-analysis reported dietary pattern results in terms of tertiles, quartiles or quintiles, apart from one study, Flood et al. [17], which reported the highest versus the lowest scores and colorectal cancer risk. Therefore, in the meta-analysis we compared the highest versus the lowest categories of Western and prudent dietary patterns to estimate the pooled effect estimate for colorectal cancer (CRC), colon cancer (CC), and rectal cancer (RC). Odds ratios (ORs), relative risks (RRs), and hazard ratios (HRs) were treated as equivalent measures of relative risk, assuming that CRC is a rare disease, as recommended in previous meta-analyses [24,25,26,27,28]. Multivariable adjusted ORs, HRs, and RRs with 95% CIs from individual studies were weighted and combined to produce an overall RR. The random effects model of DerSimonian and Laird [29] was used to account for heterogeneity between studies. The percentage of heterogeneity between studies was quantified by the I-squared statistic (I2) and heterogeneity was tested with a chi-squared test [30]. Each study was weighted and the effect sizes and confidence intervals were displayed in forest plots, along with the pooled overall effect. Publication bias was assessed through funnel plots and the Egger test [31]. Meta-regression was performed by study design and gender. Subgroup analyses were performed to investigate associations between the dietary patterns and distal and proximal CC and CRA risk as well as for a “drinker” dietary pattern (containing wine, beer, spirits and, on some occasions, other foods). All statistical analyses were conducted with STATA, version 12.0 (2012; StataCorp, College Station, TX).
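The pooling step can be sketched in a few lines of Python. This is a minimal illustration of the DerSimonian and Laird random effects estimator, Cochran's Q, and I2, not the Stata code used for the review; the function names and the three example studies are hypothetical. Standard errors are recovered from the reported 95% CIs on the log scale.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of a log risk estimate, recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def dersimonian_laird(estimates, lowers, uppers):
    """Pool ratio estimates (OR, RR or HR) with a DerSimonian-Laird model."""
    y = [math.log(e) for e in estimates]            # log effect sizes
    v = [se_from_ci(lo, up) ** 2 for lo, up in zip(lowers, uppers)]
    w = [1 / vi for vi in v]                        # fixed-effect weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I2: percentage of total variation attributable to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]            # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return {
        "rr": math.exp(y_re),
        "ci": (math.exp(y_re - 1.96 * se_re), math.exp(y_re + 1.96 * se_re)),
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }

# Hypothetical highest-versus-lowest estimates from three studies.
pooled = dersimonian_laird(
    estimates=[1.10, 1.45, 1.30],
    lowers=[0.90, 1.10, 0.95],
    uppers=[1.34, 1.91, 1.78],
)
print(pooled)
```

Under the null hypothesis of homogeneity, Q is compared with a chi-squared distribution on k − 1 degrees of freedom, where k is the number of studies.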

The code used to perform the analyses can be requested from the corresponding author.
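For readers unfamiliar with the Egger test, the sketch below shows the underlying regression in Python: each study's standardised effect (log estimate divided by its standard error) is regressed on its precision (one over the standard error), and an intercept far from zero suggests funnel plot asymmetry. This is an illustrative re-implementation under our own naming (`egger_test` is hypothetical), not the Stata routine used in the review, and the input values are made up.

```python
import math

def egger_test(log_effects, standard_errors):
    """Egger regression of standardised effect on precision.

    Returns the intercept, its standard error, the t statistic and the
    residual degrees of freedom (number of studies minus two).
    """
    n = len(log_effects)
    if n < 3:
        raise ValueError("Egger regression needs at least three studies")
    x = [1 / se for se in standard_errors]                        # precision
    z = [y / se for y, se in zip(log_effects, standard_errors)]   # standardised effect
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance of the ordinary least squares fit.
    resid_var = sum((zi - intercept - slope * xi) ** 2
                    for xi, zi in zip(x, z)) / (n - 2)
    se_intercept = math.sqrt(resid_var * (1 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "se": se_intercept, "t": t_stat, "df": n - 2}

# Hypothetical log relative risks and standard errors for five studies.
result = egger_test(
    log_effects=[0.10, 0.25, 0.40, 0.15, 0.55],
    standard_errors=[0.08, 0.15, 0.22, 0.10, 0.30],
)
print(result)
```

The t statistic is compared with a t distribution on the returned degrees of freedom; with few studies the test has low power, so it is usually read alongside the funnel plot rather than on its own.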

Results

Inclusion

A total of 4824 papers were retrieved from the search strategy (after deduplication). After checking all titles and abstracts for eligibility, 38 studies were fully examined, of which 28 met the inclusion criteria and their data were extracted and meta-analysed (Fig. 1). The summary characteristics of included and excluded studies and their methodology are described in detail as supplementary files. Amongst included studies, sixteen were cohort studies [12, 15,16,17,18, 32,33,34,35,36,37,38,39,40,41,42], ten were case–control studies [13, 43,44,45,46,47,48,49,50,51], and two were of cross-sectional design [52, 53]. The geographical provenance of the studies included North America and South America, Europe, and Asia.

Fig. 1

PRISMA flowchart for selection of eligible papers on colorectal cancer and dietary patterns derived from principal component analysis (PCA)

Quality assessment was examined in relation to confounding, selection and assessment bias, most of which showed an overall low risk of bias (Fig. 2). All of the studies in this review were of low risk of confounding bias, as considerations were made in statistical models for the main confounding factors, including age, gender, total energy intake, physical activity, and body mass index. There was low risk of assessment bias in all studies because the exposure measures (dietary intake) were determined using validated FFQs or similar data collection methods, and the outcome measures (cancer) were confirmed histologically, through medical records, or through cancer registries, which are all considered reliable and valid. Selection bias, from source populations, in case–control studies was generally low as cases and controls were from comparable populations and cases were clearly defined and differentiated from controls. However, it was difficult to detect in some case–control studies the exact participation rate and whether the authors had adjusted for exposure differences between study participants and non-participants. In cohort studies there were some methodological issues with confirmation of patients without cancer at follow up, which led to the majority of studies being classed as “unclear” and “high” for selection bias.

Fig. 2

Quality assessment of included studies using the NICE Guidelines on systematic reviews

Outcomes and dietary exposures

Studies with varying outcomes of interest were included in the systematic review, including CRC, CC, RC, and proximal and distal CC. The majority of studies (76.9%) used either histology/pathology or medical records to confirm cases of CRC. Only 15.4% relied solely on cancer registries to confirm the study outcome.

Table 1 summarizes the characteristics of the included studies. The majority of studies used semi-quantitative FFQs and quantitative FFQs to ascertain dietary intake (n = 23). Two studies used qualitative and quantitative diet history questionnaires [12, 51]; one ascertained dietary intake using an adaptation of the validated diet history questionnaire of the Coronary Artery Risk Development in Young Adults (CARDIA) Study [13]; one used a dietary questionnaire that comprised quantitative and qualitative questions on food intake [18]; and one paper used a semi-quantitative food frequency instrument [48]. The number of food items enquired about across dietary questionnaires ranged between 40 and 267.

Table 1 General characteristics of included studies

All papers reported a “Western” and a “Prudent” dietary pattern. The number of dietary patterns in each study varied between 2 and 14, with a mean of 4 dietary patterns (Table 1). In some studies, dietary patterns were labeled qualitatively corresponding to the foods or food groups that are considered to provide some benefit to health (namely prudent). These were generally characterized by higher factor loadings of fruit, vegetables, whole grains, low-fat dairy products, poultry and fish. Other dietary patterns following similar dietary features that were also classed as prudent for this review were labeled “healthy”, “healthful”, “vegetable-fruit-soy”, “Mediterranean”, “vegetable”, “fruit-vegetable”, “fruit and vegetable”, “prudent vegetable”, “high-dairy”, “high-fruit and -vegetable, high-starch, low-alcohol (DFSA)”, and “high-dairy, high-fruit-and-vegetables, low-alcohol (DFA)”. Alternatively, dietary patterns were labeled according to the extent they are positively connected with a specific lifestyle, with “Western” dietary labels being most common. For analysis, dietary patterns with the following labels were grouped into the Western dietary pattern: “meat”, “meat-dim-sum”, “high-fat and proteins”, “western”, “Pork, processed meats and potatoes”, “red meat and potatoes”, “processed meat”, “meat, potatoes, refined grains”, “animal food” and “Southern Cone”.

Associations of dietary patterns with CRC

The associations of CRC with Western and prudent dietary patterns are presented in Fig. 3. When an overall pooled effect was estimated from a random effects model, there was a positive association between Western dietary pattern and CRC (RR 1.25; 95% CI 1.11, 1.40), whereas the prudent dietary pattern had a statistically significant negative association with CRC (RR 0.76; 95% CI 0.68, 0.86). Case–control studies showed a higher heterogeneity than cohort studies in both meta-analyses. Funnel plots showed little evidence of asymmetry, and publication bias was only observed for the association between CRC and the prudent dietary pattern (Fig. 4).

Fig. 3

Meta-analysis of associations between Western and Prudent dietary patterns (highest vs. lowest categories of intake) and CRC risk, stratified by study design

Fig. 4

Illustration of publication bias in included studies (funnel plots with Pseudo 95% confidence intervals) according to Western or Prudent (a) or Drinker pattern (b)

When combining the effects of all studies that investigated CC as an outcome, the overall direction of association remained the same as for CRC in both Western and prudent dietary patterns (Fig. 5). There was evidence of a 30% increase in risk of CC for Western dietary patterns (RR 1.30; 95% CI 1.11, 1.52), with high heterogeneity in cohort and case–control studies (I2 = 69.5% and 71.5%, respectively). A prudent dietary pattern was associated with a 19% reduced risk of CC (RR 0.81; 95% CI 0.73, 0.91), with no evidence of heterogeneity in cohort studies (I2 = 0%) and high heterogeneity in case–control studies (I2 = 78.1%).

Fig. 5

Forest plot of the highest compared with the lowest categories of intake of the Western dietary pattern and CC risk, stratified by study design

With regard to RC, there was no evidence of a statistically significant association between this cancer and Western dietary pattern intake, with high overall heterogeneity across studies (I2 = 62.8%). A prudent dietary pattern was negatively associated with RC in the case–control studies (RR 0.58; 95% CI 0.40, 0.85) but there was no evidence of association in the cohort studies (Fig. 6).

Fig. 6

Forest plot of the highest compared with the lowest categories of intake of the Western dietary pattern and RC risk, stratified by study design

The results of the analyses with “Drinker” dietary pattern are shown in Figure S1. There was a borderline positive association with risk of CRC (RR 1.19; 95% CI 0.99, 1.43), and no evidence of association with CC or RC. All meta-analyses had very little heterogeneity (I2 < 3.0%).

Subgroup analyses

When results were analysed by studies that investigated colon cancer sub-sites only, an increased risk of both proximal CC (RR 1.19; 95% CI 1.05, 1.35) and distal CC (RR 1.48; 95% CI 1.23, 1.79) remained when comparing the highest to the lowest categories of intake score of Western dietary pattern. On the other hand, a prudent dietary pattern was negatively associated with risk of both proximal and distal CC (RR 0.72; 95% CI 0.60, 0.85; and RR 0.80; 95% CI 0.69, 0.93, respectively) (Figure S2; supplementary file).

Further sensitivity analyses were carried out stratifying by gender and continent. For both men and women, the risk of CRC and CC with a Western dietary pattern was statistically significantly higher, whilst a reduced risk of these cancers with a prudent dietary pattern was also observed for both genders. The risk of RC was not associated with dietary patterns in either males or females when stratifying by sex (Figure S3). When we examined the risk of colon cancer by geographical provenance (Figures S4, S5 and S6) we found that a Western dietary pattern was positively associated with risk of CRC and CC in individuals from North America and South America. Similarly, in these two regions, a prudent dietary pattern was negatively associated with the risk of both CRC and CC.

Discussion

In this systematic review, we investigated the association of CRC, CC and RC with dietary patterns identified with the use of PCA in observational studies. Both cohort and case–control studies showed that a “Western” dietary pattern was statistically significantly associated with an increased overall risk of CRC and CC. A prudent dietary pattern showed a statistically significant negative association with CRC and CC. Stratified analyses by gender showed that both men and women were at increased risk of CRC and CC if they had a “Western” dietary pattern. Proximal and distal CC were positively associated with having a Western dietary pattern, whilst the risk of having these cancer sites was reduced in individuals with a prudent dietary pattern. When stratifying by world region, the associations for CRC and CC with “Western” dietary pattern were observed in studies from North America and South America.

“Western” dietary patterns have generally been associated with increased risk of CRC and CC [13, 16, 17, 34, 35, 48, 54]. We found that a “Western” dietary pattern (mostly comprised of red and processed meats) was associated with a 25% higher risk of CRC (95% CI 1.11, 1.40). The case–control study of Carr et al. in German adults reported similar higher risks of CRC in individuals with a higher intake of red and processed meat (OR 1.66; 95% CI 1.34, 2.07) [55]. In our sensitivity analyses, this risk of CRC was stronger in women (OR 1.26). In a large population-based study, Vulcan and colleagues found that intake of pork was associated with a similar higher risk of CRC in Swedish women (HR 1.54; 95% CI 1.12, 2.15) [56].

The definitions used for “Western” dietary pattern varied between the observational studies included in this systematic review, but the majority had in common meats (red and poultry), refined grains, and foods rich in sugar. This suggests that there might be a synergistic effect of grouped foods on risk of CRC [57]. Refined grains and sugary products (e.g., confectionery) are known to cause elevated plasma insulin levels and insulin-like growth factor-1, which are both associated with cancer risk, in particular CC [58]. Several epidemiological studies have confirmed a higher risk of disease in individuals who had a higher consumption of refined grains (e.g. wheat) [59], or high intake of carbohydrates [60], whilst adherence to dietary guidelines that restrict intake of sugary drinks has been associated with lower risk of CRC [61].

All the prudent dietary patterns described in this systematic review included fruits and vegetables, and most studies also included legumes. These foods are rich in dietary fiber and flavonoids, which have been suggested to protect against CRC [62]. The overall reduced risk we observed for a prudent dietary pattern on CRC and CC (24 and 19%, respectively) is similar to the evidence from observational studies examining the association with fruit and vegetable intake. A recent systematic review on apple intake and cancer risk showed that a higher intake of this fruit was associated with a 33% lower risk of CRC in a meta-analysis of five studies [63].

Dietary fiber has been suggested to reduce the risk of CRC through several mechanisms, including promotion of a healthy gut microbiota [64], facilitating the elimination of faecal carcinogens, and decreasing colonic pH [65, 66]. Fruit fiber in particular has also shown a strong association with reduced CRC risk, which may be due to its high pectin content, as well as an anti-proliferative effect of short-chain fatty acids [65, 67, 68]. Prudent dietary patterns are also considered less likely to contain harmful compounds found in processed foods (including processed meats), such as polycyclic aromatic hydrocarbons [46, 69].

In spite of the various biological and chemical mechanisms that support a beneficial effect of fruit and vegetable intake against CRC, such an association has not been confirmed in some observational studies [67, 70, 71]. A recent review on vegetable intake and CRC risk suggested that differences in findings might be due to study design, as reduced risks found in case–control studies are not always confirmed in cohort studies [72]. The reasons for these inconsistencies between case–control and cohort study designs might be methodological, including the inherent recall bias in case–control studies or measurement error in cohort studies. These differences can also be due to the variety of foods that are included in fruits, vegetables and wholegrains, which might have different specific health benefits [73].

High alcohol intake has been suggested to be an important risk factor for CRC [5]. The meta-analysis on the “drinker” pattern showed that it was associated with a 19% higher risk of CRC, but was not associated with CC or RC risk. Epidemiological studies in The Netherlands [61] and Malaysia [74] have shown that restricting alcohol intake, as part of a healthy diet, is associated with a reduced risk of CRC. Although the detrimental effects of ethanol are well known, it is also acknowledged that moderate consumption of red wine and beer might be beneficial because of their anti-inflammatory and antioxidant properties [75], which might partly explain the differences in the associations between colon sites.

Strengths of the review

This is the first systematic review and meta-analysis exploring the association between dietary patterns derived from PCA and CRC risk. The review was done following the PRISMA guidelines [23], and four databases were searched to identify eligible papers. Every effort was made to include specific country-related dietary habits; therefore, no language restrictions were used in the search strategy. The systematic review also included an itemized assessment of the main biases that can be found in epidemiological studies, and used a visual, as well as an objective, measurement of publication bias. We examined evidence that has analysed dietary intake using a relatively novel statistical tool in nutritional epidemiology, PCA, which is argued to capture the complexity of the diet of the population [13, 21]. Dietary patterns derived using PCA are claimed to capture the interactive and additive effects of diet on disease outcomes [13]. Also, meta-regression by study design and gender provided additional insight into the association between dietary patterns and CRC, CC, and RC.

Another strength of this systematic review is that we also examined the association of dietary patterns and CRC by geographical region (continents). The detailed description of dietary patterns in this systematic review confirms that dietary patterns derived from PCA are latent dimensions, which are comprised of different food items across cultures, and they could represent different food items in different contexts. The definitions used by the studies do not always represent the traditional understanding of a “Westernised” dietary pattern. This might partly explain the very high heterogeneity across studies in Asia, and in some cases, the lack of association with disease outcomes; for example, Japanese studies reported an “animal food pattern” containing red meat, but also soy and fish, which are considered “healthy” foods.

We acknowledge some limitations inherent to this systematic review. The analyses used pre-defined dietary patterns and assumed methodological homogeneity in these definitions across studies. In spite of this, there was a relative consistency in the type of foods included under “Western” or “Prudent” patterns, but such pre-definitions prevented us from further exploring the role of specific foods, such as meat. Definitions of components of specific dietary patterns can differ greatly between studies and populations, but foods associated with a high risk of non-communicable diseases tend to naturally correlate with one another. The results of the meta-analyses were fairly consistent in showing that regardless of the components defining a Western dietary pattern, it was related to a higher risk of CRC. The methods to collect the dietary intake that was used to derive dietary patterns also differed between studies. Although food frequency questionnaires were the most commonly used tool, they differed in the number of items considered for analysis and the time period over which the questionnaire collected data, for example one, six or twelve months prior to completing the questionnaire. These differences may be problematic when combining data to provide an overall effect estimate, as dietary patterns may be shaped by differing habits across geographical populations.

We found evidence of heterogeneity across studies, regardless of the outcomes studied. A high percentage of heterogeneity in the meta-analyses can be explained by a number of factors, including differences in the sample population for each study such as age and sex, whether these samples represented the general population, and the variation in severity of disease. Different adjustments in the models might also have a role in the lack of associations observed with some of the specific cancer sites studied. Although all studies took into account confounding variables, some used a more thorough approach to control for potential confounders, including family history of CRC and having first-degree relatives with CRC. Epidemiological studies are also limited by the possibility of unknown confounders blurring associations.

PCA is a popular statistical tool to explore associations between several foods and risk of diseases. In spite of its increasing use, PCA is not exempt from limitations. During the PCA process, a number of assumptions and subjective decisions are made to derive dietary patterns. These decisions may impact the type and number of dietary patterns. Furthermore, the criteria used to select factor loadings will also influence the dietary components of patterns. It is recommended that a factor loading of 0.3 is used to derive a dietary pattern. However, the cut-off point used to allocate foods into specific patterns varied greatly between studies. Recent evidence suggests that models other than PCA can be used to derive dietary patterns and might provide a more accurate reflection of the association between diet and disease [76].
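To make the influence of the loading cut-off concrete, the short Python sketch below assigns foods to a pattern at two different thresholds. The helper `foods_in_pattern` and the loading values are hypothetical and chosen only to illustrate the point; they do not come from any included study.

```python
def foods_in_pattern(loadings, cutoff=0.3):
    """Return (food, loading) pairs whose absolute loading meets the cut-off,
    ordered from the strongest to the weakest contributor."""
    kept = {food: value for food, value in loadings.items() if abs(value) >= cutoff}
    return sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical factor loadings for one component.
example_loadings = {
    "red meat": 0.62,
    "refined grains": 0.41,
    "fruit": -0.35,
    "fish": 0.12,
}

print(foods_in_pattern(example_loadings, cutoff=0.3))
# [('red meat', 0.62), ('refined grains', 0.41), ('fruit', -0.35)]
print(foods_in_pattern(example_loadings, cutoff=0.5))
# [('red meat', 0.62)]
```

With the stricter cut-off the same component would be described by red meat alone, which shows how two studies with similar underlying data could report patterns that look quite different.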

In conclusion, our findings provide evidence of a positive association between “Western” and “drinker” dietary patterns and colorectal cancer, particularly in countries from North and South America, and a reduced risk of CRC with a prudent dietary pattern. The results of these meta-analyses lend support to the notion that well-designed lifestyle and dietary interventions could contribute to reducing the risk of CRC in the general population.