Introduction

Information about risk factors for colon cancer is valuable for several reasons: (1) some patients may alter behaviors to reduce risk; (2) patients at high risk of colon cancer may be screened more often; and (3) identifying risk factors that cannot be explained by known mechanisms for colon cancer might stimulate research on etiology.

Colon cancer risk factors have been studied often [1, 2]. The present study has the advantage of simultaneously considering many risk factors in a large data set. This makes it possible to assess the relative importance of known risk factors, find new risk factors, and assess which risk factors are secondary (their association is removed by adjusting for other risk factors). The data for the present study were collected by the Women’s Health Initiative (WHI). Because the WHI is a prospective study with a large, comprehensive, meticulously collected database on an important subpopulation, it is especially well suited for evaluating risk factors.

Methods

The WHI study design has been described in detail [3–6]. In brief, it was a long-term national health study that focused on strategies for preventing heart disease, breast and colorectal cancer, and osteoporosis in postmenopausal women. Women between the ages of 50 and 79 were enrolled for an observational study or randomized controlled trial (RCT) from 1993 to 1998 at 40 clinical centers throughout the United States. All participants signed informed consent forms. The institutional review boards at all participating institutions, including the coordinating center, subcontractors, and clinical centers, approved the study protocols and procedures. The three therapies tested in RCTs were estrogen therapy for women without a uterus, estrogen plus progesterone for women with a uterus, and a low-fat diet. Women ineligible for, or not interested in, any of the RCTs were given the opportunity to enroll in the observational study, which was intended to provide new risk factor information on major causes of morbidity and mortality among postmenopausal women. The median follow-up time for all women was 8 years.

Participants available for analysis included 161,748 WHI participants: 93,651 from the observational study, 16,590 from the RCT of estrogen plus progesterone (E + P), 10,722 from the RCT of estrogen only (estrogen alone), and 40,785 additional women who were in the diet study. (Women in RCTs for both hormone therapy and diet therapy were only included in the above counts for hormone therapy.) Of the 161,807 participants in the data set, 9,554 were excluded (n = 152,253) based on criteria that the WHI used to select patients for the RCT studies. An additional 1,341 participants were excluded from all analyses examining risk factors independently associated with colon cancer. These participants were missing information on follow-up dates (550) or waist (794), which was one of the risk factors most strongly associated with colon cancer. Values of other missing variables were imputed by using the mean for an ordinal variable or the most common category for a categorical variable. There were 150,912 women in the analyses of risk factors.
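The single-value imputation described above can be sketched as follows. This is a minimal illustration rather than the code used for the WHI data: the function name `impute` and the toy values are assumptions, and `None` stands in for a missing response.

```python
from statistics import mean, mode

def impute(values, kind):
    """Fill missing entries (None) with the mean (ordinal) or the mode (categorical)."""
    observed = [v for v in values if v is not None]
    if kind == "ordinal":
        fill = mean(observed)
    elif kind == "categorical":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown variable kind: {kind}")
    return [fill if v is None else v for v in values]

# Toy data, invented for illustration only.
years_smoking = [0, 10, None, 20]
cholecystectomy = ["no", "no", None, "yes"]

print(impute(years_smoking, "ordinal"))        # missing value replaced by the mean
print(impute(cholecystectomy, "categorical"))  # missing value replaced by the most common category
```

Filling every gap with a single value keeps all participants in the model but understates the variability of the imputed variable, which is one reason to rerun key results without the imputed values.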

Data

For follow-up and outcome ascertainment, all subjects or their proxies completed a self-administered report that asked about all hospitalizations. This report was completed semiannually by the RCT participants and annually by the observational study participants. Medical records were requested from the hospital or outpatient facility, and death certificates were obtained from the state or the family. All records were reviewed by the local adjudicator. The outcome variable used in this study was colon cancer, excluding participants with a history of colon cancer.

There were 869 factors that could be evaluated for an association with the risk of colon cancer for all participants and an additional 102 factors that could be evaluated for an association with participants in the observational study. All risk factors were obtained at baseline. The vast majority were self-reported or created from self-reported information, but physical and clinical measurements (e.g., waist and hematocrit levels) were taken by study personnel.

Types of factors evaluated included demographic, general health, clinical and anthropometric, functional status, healthcare behaviors, reproductive, medical history, family history, personal habits, thoughts and feelings, and therapeutic class of medication.

Statistical methods

All analyses were performed using the Cox proportional hazard regression model to adjust for some factors while evaluating others. To identify potential risk factors, we first tested the statistical significance of each variable after adjusting for age and the study that recruited the patient. All variables that were statistically significant at the p < 0.05 level were then included in a backward stepwise Cox proportional hazard regression analysis, and variables that were not statistically significant at the p < 0.001 level were removed from the model. We then tested all factors in the data set to determine whether any that were not in the model were statistically significant at the p < 0.001 level after adjusting for factors in the model.
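The first two steps of this selection procedure can be sketched in a few lines. The sketch below is an assumption-laden outline, not the SAS procedure actually used: `fit_pvalues` is a hypothetical callable standing in for a Cox model fit that returns one p value per variable, variables are dropped one at a time (the usual convention for backward elimination, which the text does not spell out), and the final step of re-testing variables outside the model is omitted.

```python
from typing import Callable, Dict, Iterable, List

def backward_eliminate(
    candidates: Iterable[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    stay_threshold: float = 0.001,
) -> List[str]:
    """Repeatedly drop the least significant variable until all pass the threshold."""
    model = list(candidates)
    while model:
        pvalues = fit_pvalues(model)          # refit with the current variables
        worst = max(model, key=lambda v: pvalues[v])
        if pvalues[worst] < stay_threshold:
            break                             # every remaining variable is retained
        model.remove(worst)
    return model

def select(screen_pvalues, fit_pvalues, enter=0.05, stay=0.001):
    """Step 1: screen at `enter`; step 2: backward elimination at `stay`."""
    screened = [v for v, p in screen_pvalues.items() if p < enter]
    return backward_eliminate(screened, fit_pvalues, stay)

# Toy stand-in for a Cox fit: fixed, invented p values that ignore the model.
toy = {"age": 1e-50, "waist": 1e-9, "income": 0.02, "height": 0.4}
print(select(toy, lambda model: {v: toy[v] for v in model}))
```

In a real analysis the p values change each time a variable is removed, which is why the model is refit inside the loop rather than ranked once.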

The proportionality assumption in the Cox model was tested in two ways. One test examined whether the hazard ratios of the other risk factors were modified by the three variables that were most strongly associated with colon cancer: age, waist, or hormone therapy. A second test examined whether the linear form of the variable was adequate. This tested whether the categorical version of the variable represented by the set of indicator variables for all but the reference categories remained statistically significant after adjusting for the linear form of the variable.
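For reference, the model and the two checks can be written out as follows. The notation (h_0, β, γ, δ_k, z, I_k) is ours and is only one plausible formalization of the description above; the exact parameterization used in the analysis is not stated.

```latex
% Cox proportional hazards model for covariate vector x
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)

% Check 1: does a modifier z (age, waist, or hormone therapy) change the
% hazard ratio of factor x_j?  Test H_0: \gamma = 0.
h(t \mid x, z) = h_0(t)\,\exp\!\left(\beta^{\top} x + \gamma\, x_j z\right)

% Check 2: is the linear term for x_j adequate?  Add indicators I_k for the
% non-reference categories retained in the model and test
% H_0: \delta_k = 0 for all k.
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x + \sum_{k} \delta_k\, I_k(x_j)\right)
```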

The large sample size and the large number of variables tested made it more likely that variables with a weak or spurious association with colon cancer would be included in the Cox model. Using p < 0.001 for inclusion protected somewhat against including these variables. However, we did not adjust the significance level of the variables for multiple comparisons. This approach is supported by leading epidemiologists who believe that adjusting for multiple comparisons can obscure meaningful results [7].
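A rough sense of why the stricter threshold matters comes from the expected number of chance findings if no factor were truly associated with colon cancer. The calculation below is a simplification under that assumption; it treats every factor as tested once at a single threshold, whereas the actual procedure was multi-stage, so the figures are only a guide. The factor counts are those given in the Data section.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of null variables passing a threshold, if all tests were null."""
    return n_tests * alpha

n_all = 869          # factors available for all participants
n_obs_extra = 102    # additional factors available in the observational study

for alpha in (0.05, 0.001):
    print(alpha, round(expected_false_positives(n_all + n_obs_extra, alpha), 2))
```

By linearity of expectation this count does not require the tests to be independent, although correlation among the factors does affect how variable the number of chance findings is.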

In addition to testing the variables in the complete data set, we also tested the results for participants in each of three subsets: (1) the observational study, (2) the RCT of diet, and (3) the combination of the two relatively small RCTs of hormone therapy. Associations that are statistically significant in more than one data set are more likely to be generalizable and less likely due to chance.

We presented the hazard ratios and χ2 values for factors that were considered in the literature to be of potential interest or that were associated with colon cancer at the p < 0.001 level. For ordinal variables, the hazard ratio was for an increase in the variable of one standard deviation. We compared χ2 values instead of p values because p < 0.0001 for many associations studied and comparison of p values smaller than this value is not meaningful.
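Because the Cox model is linear on the log-hazard scale, a hazard ratio can be re-expressed for any increment of the variable. The helper functions below illustrate the conversion; their names and the numbers passed to them are hypothetical and are not taken from the study's tables.

```python
import math

def hazard_ratio_per_sd(beta_per_unit: float, sd: float) -> float:
    """Hazard ratio for a one-standard-deviation increase: exp(beta * SD)."""
    return math.exp(beta_per_unit * sd)

def rescale_hazard_ratio(hr: float, from_increment: float, to_increment: float) -> float:
    """Re-express a hazard ratio given per `from_increment` units as per `to_increment` units."""
    return hr ** (to_increment / from_increment)

# Hypothetical numbers, for illustration only.
beta = 0.02   # log hazard ratio per 1 unit of the variable
sd = 12.0     # standard deviation of the variable
print(round(hazard_ratio_per_sd(beta, sd), 3))
print(round(rescale_hazard_ratio(1.10, from_increment=10, to_increment=5), 3))
```

Reporting per standard deviation puts variables measured in different units on a comparable scale, at the cost of making the hazard ratio depend on the spread of the variable in this particular population.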

To test whether the risks associated with some variables were influenced by others, we used statistical tests for interaction.

Statistical analyses were performed using SAS version 9 (SAS Institute Inc, Cary, NC, USA).

Results

In the data set analyzed, there were 1,207 patients who had colon cancer and 282 who had rectal cancer only.

The demographic characteristics of the participants are shown in Table 1. Most were white, between the ages of 55 and 70, and with more than a high school education. The regions of the United States were equally represented. More than half of the participants were from the observational study.

Table 1 Demographic characteristics

Independently significant factors

As shown in Table 2, 11 variables were found to have an independently significant association with colon cancer at the p < 0.001 level. None of the subjects analyzed for this table were missing data on age or waist. Values were imputed for less than 1 % of the values of six variables, 3 % of the values for years smoking, 9 % of the values for relatives with colon cancer, and 2 % of the responses to cholecystectomy. After eliminating the 13,152 subjects who did not respond to the question about relatives with colon cancer, the adjusted hazard ratio for this variable was 1.28 (χ2 = 11.5) as compared to a hazard ratio of 1.31 (χ2 = 13.6) when subjects with estimated values were included.

Table 2 Factors independently associated with colon cancer at the p < 0.001 level in the full data set

The χ2 value was much greater for age, χ2 = 243, than for other variables. Waist had a higher χ2 value than the other available adiposity measures (weight, body mass index, and waist–hip ratio), and after adjusting for waist measure, these other measures were not statistically significant at the p < 0.001 level. The hazard ratio associated with waist was much higher in the observational study than in the other data sets, but the variation among the hazard ratios was not statistically significant. Taking hormone therapy at baseline was associated with a reduced risk of colon cancer for each of the 3 data sets. Variables for the duration or type of hormone therapy (estrogen only or estrogen plus progesterone) were not significant at the p < 0.10 level. Colon cancer risk was greater for those participants who smoked more and lower for those with arthritis. The highest hazard ratio was associated with diabetes.

We tested 27 interactions to evaluate whether the association of colon cancer with any of the 11 factors in Table 2 was modulated by the three most statistically significant variables: age, waist, or hormone use. The only interaction that was statistically significant at the p < 0.05 level was for waist and diabetes: waist had a weaker association with colon cancer for participants with diabetes. In addition to modulating the association between waist and colon cancer, diabetes also provided some of the same information about the risk of colon cancer as waist; that is, diabetes and waist had overlapping components [8]. After removing diabetes from the equation, the χ2 value for the association between waist and colon cancer was 45.6 instead of 36.0 in the equation that included diabetes; after removing waist, the χ2 value for diabetes was 22 instead of 11 in the equation that included waist.
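The text does not spell out how the figure of 27 was obtained. One counting that reproduces it, assuming each unordered pair of distinct factors is tested once, is sketched below; the factor labels are paraphrased from the list given in the Discussion.

```python
from itertools import combinations

factors = [
    "age", "waist", "hormone therapy", "years smoked", "arthritis",
    "relatives with colorectal cancer", "hematocrit", "fatigue",
    "diabetes", "sleep medication", "cholecystectomy",
]
modifiers = {"age", "waist", "hormone therapy"}

# Unordered pairs of distinct factors in which at least one member is a modifier.
pairs = [p for p in combinations(factors, 2) if modifiers & set(p)]
print(len(pairs))  # 27
```

Equivalently, each of the three modifiers is paired with the eight remaining factors (24 pairs), and the three modifiers are paired with one another (3 pairs).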

Another test of the proportionality assumption assessed whether a set of nonlinear terms for a variable was statistically significant. No set of nonlinear terms was statistically significant at the p < 0.001 level. The lowest p values were for the 2 nonlinear terms for age (χ2 value = 9.0, p = 0.01) and for the 2 nonlinear terms for waist (χ2 value = 8.6, p = 0.01).
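The reported p values can be checked against the χ2 values if one assumes that a test of 2 nonlinear terms is referred to a χ2 distribution with 2 degrees of freedom. The function below is a small sketch using closed forms that hold only for 1 and 2 degrees of freedom; its name is our own.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable, for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("closed forms are implemented only for df = 1 and df = 2")

print(round(chi2_sf(9.0, 2), 3))   # nonlinear terms for age
print(round(chi2_sf(8.6, 2), 3))   # nonlinear terms for waist
```

Under that assumption both values round to 0.01, in agreement with the text.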

The variables in the Cox proportional hazard regression equation in Table 2 were run again with rectal cancer as the outcome variable, and all patients who had colon but not rectal cancer were removed from the analysis. The only risk factors that were independently significant in this analysis at the p < 0.05 level were a 10-year increase in age (hazard ratio = 1.39, p = 0.0002), hormone therapy (hazard ratio = 0.57, p < 0.0001), and a 10-cm increase in waist (hazard ratio = 1.14, p = 0.004). All other variables with the exception of diabetes and relatives with colon cancer had hazard ratios substantially closer to 1 than they did when colon cancer was the outcome.

Other factors sometimes considered to be associated with colon cancer risk are evaluated in Table 3. Both demographic factors (race and income) had an age-adjusted association with colon cancer at the p < 0.001 level but became much less significant after adjusting for other variables. Pills for hypertension were associated with an increased risk at p = 0.007, previous colonoscopy was associated with a reduced risk at p = 0.009, and physical activity was associated with an adjusted reduced risk at p = 0.07. We did not find that the association between physical activity and colon cancer was stronger for women who were not taking hormone therapy. After eliminating subjects with imputed values, the age-adjusted hazard ratio for one standard deviation of our physical activity measure changed from 0.90 (p = 0.001) to 0.93 (p = 0.02), and the more completely adjusted hazard ratio changed from 0.95 (p = 0.07) to 0.99 (p = 0.85).

Table 3 Other factors previously found associated with colon cancer

Numerous variables were tested for an association with colon cancer but not tabulated because the association was not statistically significant after adjusting for age alone. Among these variables were age at menopause and age at first birth.

Discussion

There were 11 factors independently associated with colon cancer at the p < 0.001 level of statistical significance. In decreasing order of χ2 values, these were age, waist girth, use of hormone therapy at baseline (protective), years smoked, arthritis (protective presumably because of medications used for treatment), relatives with colorectal cancer, higher hematocrit levels (protective), fatigue, diabetes, greater use of sleep medication (protective), and cholecystectomy. The study does not support previous findings that the association between hormone therapy and colon cancer was modified by waist [9] or vitamin D [10].

Of the factors independently associated with colon cancer, three were significantly associated with an increased risk of rectal cancer: greater age (p = 0.0002), greater waist (p = 0.004), and not taking hormone therapy (p < 0.0001).

Epidemiological literature

Numerous studies have evaluated risk factors for colon cancer, and literature reviews have concluded that the preponderance of evidence supports an association between colon cancer and the following risk factors: age, use of aspirin and nonsteroidal anti-inflammatory drugs (NSAIDs), hormone therapy, smoking (in some but not all studies), relatives with colorectal cancer, obesity (although not in postmenopausal women), and diabetes [1, 2, 11, 12]. Our study supported all of these as important in postmenopausal women although the evidence for NSAIDs was indirect (the presence of arthritis) and instead of the usual measure of obesity (body mass index), we found in the observational study part of the WHI data that waist was a stronger risk factor. Waist was also a stronger risk factor in a previous study [13].

Risk factors for colorectal cancer have also been analyzed using data from the WHI. Some of these analyses found the same results that we did [14–16], although others did not [10, 17–19]. Differences include which WHI data sets were used and how outcomes were defined. For example, one study that analyzed much of the same data as in this study did not identify hormone therapy as a risk factor [18]. That study, however, censored follow-up times 6 months after a change from baseline in hormone therapy status. The present study would have been better able to detect risk reduction a considerable time after hormone therapy use. The disadvantage of the present study is that hormone therapy may be confounded with other factors. A study that supported our results found that both estrogen alone and estrogen plus progestin protect against colon cancer [20].

Our finding of an association between cholecystectomy and colon cancer does not agree with two very different meta-analyses, which did not find a substantial association [21, 22]. However, our results do support other studies [23, 24], including one that found the association increased with greater time after cholecystectomy [25]. It is likely that the time after cholecystectomy was substantial in the present study because women were between 50 and 79 years at baseline and followed up for the development of colon cancer for an average of 8 years.

One possible explanation for this association is that cholecystectomy results in changes in the composition and secretion of the bile acid pool, and these changes may promote the exposure of colonic mucosa to the carcinogenic secondary bile acids [26, 27]. Another possible explanation is that cholecystectomy is a marker for other risk factors that are more fundamental such as a history of previous gallstones [28] and obesity.

It is generally accepted that the risk of colon cancer is reduced by exercise [1, 2, 11, 12] and colonoscopy [29, 30]. Both of these factors were statistically significant in our analysis at the p = 0.001 level adjusting for age alone, but the impact of colonoscopy was not strong (hazard ratio = 0.90) and the impact of exercise was greatly reduced after adjusting for other risk factors (e.g., waist girth and smoking), which may have mediated the association between exercise and colon cancer.

This study also evaluated factors that have been investigated infrequently. Studies have found that the risk of colon cancer in postmenopausal women increases with height [31], earlier age at first birth [32], later age at first birth [33, 34], and later age at menopause [34]. Other studies found that risk was unrelated to age at first birth [35, 36] or age at menopause [36]. Statin use has also been found to reduce the risk of colon cancer in some studies [37, 38], but not in others [39, 40]. Our results did not support any of these associations.

Factors underlying the associations

Some of the results from the epidemiological studies have been partially explained. Adipocytes, particularly in the viscera, alter the metabolism of endogenous hormones, including insulin, insulin-like growth factors (IGFs), sex steroids, and possibly adipocyte-derived factors such as leptin and adiponectin [41]. The role of insulin in the association between waist girth and colon cancer is supported by two findings. One is that waist girth and diabetes provide some of the same information about the risk of colon cancer, possibly reflecting increases in glucose or insulin levels. A second is that waist girth is a significantly stronger risk factor for subjects capable of producing more insulin in response to the greater insulin resistance caused by greater waist girth (i.e., those who do not have diabetes).

A possible mechanism for the protective effect of hormone therapy is that estrogen blocks the stimulatory effects of insulin [42]. If this hypothesis is true, however, we would expect that the protective effect for hormone therapy should be stronger for women with higher levels of insulin (e.g., those with diabetes or larger waists), which has been found previously [9] but was not true in our data set or another [43]. A possible mechanism for the risk associated with smoking is that some of the cancer-causing substances in the smoke are swallowed or reach the colon through the bloodstream [2]. NSAIDs seem to reduce colon cancer by blocking the accumulation of β-catenin [44].

Lower hematocrit levels may be a risk factor for colon cancer because they reflect occult bleeding from an undetected cancer or precursor lesion.

Advantages and limitations

The advantages of the WHI are a large, diverse study population with long follow-up, comprehensive participant information, and careful data collection. The limitations are that some risk factors are difficult to measure precisely, change over time, are confounded with other possible risk factors, and/or are modified by other risk factors. In addition, risk factors may have been spuriously identified because so many were tested. Many more epidemiological, controlled, and laboratory studies will be necessary to understand the reasons why certain factors have been found to be associated with colon cancer risk.

Conclusions

Although some risk factors identified here such as age, hormone therapy, and family history are well established, other findings such as the relative importance of waist girth, exercise as primarily a risk factor secondary to obesity and smoking, and the risk associated with cholecystectomy have little previous support. We are not aware of studies finding lower hematocrit levels, fatigue, and less use of sleeping pills as risk factors. Additional studies are needed to confirm the new risk factors. Our study did not provide support for previous findings that the effect of hormone therapy was modified by waist girth or vitamin D.