Introduction

It is well established that inhalation of cigarette smoke substantially contributes to cardiopulmonary disease and the development of lung cancer [1]. In addition to lung cancer, evidence suggests that cigarette smoke likely contributes to the development of oral and oropharyngeal, esophageal, stomach, colorectal, liver, pancreatic, laryngeal, cervical, bladder, and kidney cancer, as well as leukemia [1]. Although the effect estimates are much smaller, evidence suggests that cancer may also be associated with exposure to combustion-related fine particulate matter air pollution (PM2.5, particles ≤ 2.5 µm in aerodynamic diameter).

There is evidence that PM2.5 air pollution contributes to chronic systemic inflammation [2], oxidative stress [3], and DNA damage [4]. Multiple studies have demonstrated that PM2.5 exposures are associated with cardiopulmonary disease mortality [5] as well as lung cancer incidence [6, 7] and mortality [6, 8, 9]. There is limited evidence that PM2.5 exposures are associated with incidence of various non-lung cancers including oral and oropharyngeal [10, 11], esophageal [11], stomach [11], liver [12], laryngeal [11], breast [13], bladder [14], kidney cancer [15], and leukemia [16]. PM2.5 exposures are also associated with mortality from stomach [17, 18], colorectal [17, 18, 19], liver [20], pancreatic [21], breast [18, 22], female organ [18], bladder [19, 23], kidney cancer [19], and leukemia [16]. Unfortunately, these studies are limited in scope and number and not fully consistent in their findings.

The primary objective of the current study is to evaluate PM2.5-mortality associations with specific cancer types among a large, nationally representative cohort of adults residing in the USA. Secondary objectives of this study are to use the same cohort and statistical models to evaluate associations between cigarette smoking and mortality for specific cancer types and to compare these associations with those observed with PM2.5 air pollution.

Methods

Study subjects

Public National Health Interview Survey (NHIS) and National Death Index data were used to construct a cohort of individuals who were aged 18–84 at the time of survey, lived in the continental US, and completed the NHIS between 1987 and 2014, as documented elsewhere [5]. Participants represented the civilian non-institutionalized US adult population. Participant responses were linked to the National Death Index for mortality follow-up through 2015. In addition, restricted-use geographic data allowed for the assignment of ambient pollution estimates at the census tract level. Analyses were performed on two cohorts. The first cohort consisted of 635,539 individuals (age range 18–84 years, mean age 45.3) and the second was a subset of this group comprising 341,665 participants who self-reported as never-smokers (age range 18–84, mean age 43.4). Both cohorts contained information on age, sex, race-ethnicity (Non-Hispanic white, Hispanic, Non-Hispanic black, or other), income category ($0–35,000, $35,000–50,000, $50,000–75,000, or over $75,000), marital status (married, divorced, separated, never married, or widowed), educational attainment (less than high school grad, high school grad, some college, college grad, more than college grad), BMI, smoking status (self-identified as current, former, or never-smoker), census tract, ambient pollution exposure, interview date, date of death, and underlying cause of death (if deceased). Further information about the composition of the cohorts, including details regarding the merging and harmonization of key variables, is provided elsewhere [5]. Procedures for informed consent, data collection, and linkage of the NHIS files were approved by the NCHS Ethics Review Board. Findings and conclusions of this research are those of the authors and do not necessarily represent the views of the RDC, the NCHS, the Environmental Protection Agency, or the Centers for Disease Control and Prevention.

Pollution concentration

In the baseline analysis, each study subject was assigned pollution exposure based on modeled population-weighted average concentrations of PM2.5 at their resident census tract, averaged across the 17-year period from 1999 through 2015. Individuals surveyed from 1987 to 2010 were linked to census tract-level estimates of PM2.5 using census tracts of the year 2000, while individuals surveyed from 2011 to 2014 were linked using census tracts of the year 2010. Because many individuals were surveyed before 1999, and in order to explore an alternative longer window of pollution exposure, average PM2.5 for a 28-year exposure period from 1988 to 2015 was estimated using back-cast PM2.5 estimates from 1988 to 1998. The back-cast estimates relied on the fact that PM2.5 is a primary component of PM10 and that PM2.5 and PM10 concentrations were highly correlated during the period when they were co-monitored and before there was a more focused effort to reduce PM2.5 (approximately 1999–2003). Specifically, the back-cast estimates were computed by calculating the mean PM2.5/PM10 ratio for 1999–2003 for each census tract and then multiplying the PM10 estimates for that census tract from 1988–1998 by this ratio, as documented in detail elsewhere [5]. Estimated average PM2.5 concentrations for the primary 17-year period (1999–2015) were highly correlated (r > 0.95) with the estimated PM2.5 concentrations for the 28-year (1988–2015) period that included the additional back-cast data. Documentation of air pollution estimates utilized in this study is located elsewhere [24]. The modeled air pollution data are publicly accessible at the Center for Air, Climate, & Energy Solutions website (https://www.caces.us/).
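The ratio-based back-casting step can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `backcast_pm25`, the dictionary layout, and the example concentrations are hypothetical, and it assumes the ratio is the mean of yearly PM2.5/PM10 ratios applied to each earlier year separately (the text does not say whether yearly or period-average values were used).

```python
from statistics import mean

def backcast_pm25(pm25_by_year, pm10_by_year,
                  ratio_years=range(1999, 2004),
                  backcast_years=range(1988, 1999)):
    """Back-cast PM2.5 for one census tract from its PM10 series.

    Both inputs map year -> concentration in ug/m3 (hypothetical layout).
    """
    # Mean PM2.5/PM10 ratio over the co-monitored reference years (1999-2003).
    ratio = mean(pm25_by_year[y] / pm10_by_year[y] for y in ratio_years)
    # Scale each earlier PM10 value by that tract-specific ratio.
    return {y: ratio * pm10_by_year[y] for y in backcast_years}

# Hypothetical tract: PM10 of 25 ug/m3 in most years, 30 ug/m3 in 1988,
# and PM2.5 of 12.5 ug/m3 throughout 1999-2003 (ratio 0.5).
pm10 = {y: 25.0 for y in range(1988, 2004)}
pm10[1988] = 30.0
pm25 = {y: 12.5 for y in range(1999, 2004)}

estimates = backcast_pm25(pm25, pm10)
```

With these made-up inputs the tract ratio is 0.5, so the 1988 back-cast value is half of that year's PM10. The 28-year average would then combine such values with the modeled 1999–2015 estimates.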

Statistical methods

Hazard ratios and 95% confidence intervals for cancer mortality risk associated with a 10 µg/m3 increase in PM2.5 concentrations were estimated using Cox Proportional Hazards (CPH) models that accounted for the complex, stratified, multistage NHIS sample design [25]. Estimates were computed using the SURVEYPHREG procedure in SAS version 9.3 (SAS Institute, Cary, North Carolina). Models were estimated for specific causes of death. Survival times were calculated using the date of interview as the beginning of follow-up. For those who died of the specific cause of death under analysis, the end of follow-up was the date of death. For those who died of any other cause, follow-up was censored at the date of death. For survivors, follow-up was censored at the end of mortality follow-up (31 December 2015). All models were adjusted for age-sex-race interactions (using indicators for 5-year age groups) and categorical variables for BMI, income, education, marital status, rural versus urban, region, and survey year. In the full cohort, models were also adjusted for smoking status. Hazard ratios and 95% confidence intervals for cancer-type-specific mortality risk associated with smoking status were also estimated. To account for multiple testing, adaptive Holm-adjusted p values [26] were calculated.
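The follow-up and censoring rules described above can be sketched in a few lines of Python. This is an illustration only, not the SAS code used in the study: the function name `follow_up`, its arguments, and the choice of days as the time unit are assumptions.

```python
from datetime import date

# End of mortality follow-up stated in the text.
END_OF_FOLLOW_UP = date(2015, 12, 31)

def follow_up(interview, death=None, cause=None, target_cause="lung"):
    """Return (days of follow-up, event flag) for one cause-specific model.

    event = 1 only when the participant died of the cause under analysis;
    deaths from other causes and survivors are treated as censored.
    """
    if death is None:
        # Survivor: censored at the end of mortality follow-up.
        return (END_OF_FOLLOW_UP - interview).days, 0
    # Decedent: follow-up ends at death; it counts as an event only
    # when the underlying cause matches the cause being modeled.
    event = 1 if cause == target_cause else 0
    return (death - interview).days, event

# Hypothetical participants
lung_death = follow_up(date(2000, 1, 1), date(2001, 1, 1), "lung")
other_death = follow_up(date(2000, 1, 1), date(2001, 1, 1), "stomach")
survivor = follow_up(date(2015, 1, 1))
```

The same participant therefore contributes an event in one cause-specific model and a censored observation in every other, which is why a separate model is fitted for each cancer type.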

The specific cancer types analyzed in this study are based on ICD-10 Underlying Cause of Death (including recodes for 1979–2015) as documented elsewhere [27]. Specific causes of cancer mortality included ICD-10 codes for lung (C33–C34), oral and oropharyngeal (C00–C14), esophageal (C15), stomach (C16), colorectal (C18–C21), liver (C22), pancreatic (C25), laryngeal (C32), melanoma (C43), breast (C50), cervical (C53), uterine (C54–C55), ovarian (C56), prostate (C61), kidney (C64–C65), bladder (C67), and brain cancer (C70–C72) as well as Hodgkin lymphoma (C81), non-Hodgkin lymphoma (NHL) (C82–C85), leukemia (C91–C95), multiple myeloma (C88, C90), and other unspecified cancers (C17, C23–24, C26–C49, C51–52, C57–60, C62–63, C66, C68–C69, C73–C80, C97).
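A cause-of-death grouping of this kind amounts to a lookup from the three-character ICD-10 category to a cancer type. The Python sketch below covers only a subset of the groups listed above, for brevity; the names `CANCER_RANGES` and `cancer_type` are hypothetical, and the study's actual recoding may differ.

```python
# Subset of the ICD-10 groupings from the text: name -> list of (low, high)
# bounds on the two digits that follow the leading "C".
CANCER_RANGES = {
    "lung": [(33, 34)],
    "esophageal": [(15, 15)],
    "stomach": [(16, 16)],
    "colorectal": [(18, 21)],
    "liver": [(22, 22)],
    "pancreatic": [(25, 25)],
    "leukemia": [(91, 95)],
    "multiple myeloma": [(88, 88), (90, 90)],
}

def cancer_type(icd10_code):
    """Map an ICD-10 code such as 'C34.1' to a cancer group, else None."""
    code = icd10_code.strip().upper()
    # Only three-character categories of the form C00-C99 are handled here.
    if len(code) < 3 or code[0] != "C" or not code[1:3].isdigit():
        return None
    number = int(code[1:3])
    for name, ranges in CANCER_RANGES.items():
        if any(low <= number <= high for low, high in ranges):
            return name
    return None  # not a cancer code, or a group omitted from this subset
```

For example, `cancer_type("C34.1")` returns `"lung"`, while a non-cancer code such as `"I21"` returns `None`.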

To explore model sensitivity, the results from the original complex CPH model, as described above, were compared with results from alternative models using the traditional Cox Proportional Hazards model (the PHREG procedure). Model 1 used mean PM2.5 concentrations for the 17-year period (1999–2015) and controlled for the same covariates as the original model (education, income, marital status, BMI, smoking status, urban/rural, census region, and survey year), but controlled for combinations of 1-year age groups, sex, and race-ethnicity by allowing each combination to have its own baseline hazard (using the STRATA statement in the SAS PHREG procedure). Model 2 is the same as Model 1 but includes only individual-level controls and excludes the survey year, census region, and urban/rural variables. Model 3 is the same as Model 1 but controlled only for age, sex, and race-ethnicity (using the STRATA statement). Model 4 is the same as Model 1, except that it used the longer exposure window for PM2.5 (average exposure from 1988 to 2015 instead of 1999 to 2015, using back-cast estimates for 1988–1998). Model 5 is the same as Model 1 but included only individuals surveyed during or after 1999. Model 6 is the same as Model 1, except that it used an expanded cohort of all 1,599,329 NHIS participants from 1986 to 2014 (including individuals without smoking status or BMI data) and did not control for smoking status or BMI.

Results

Table 1 presents summary statistics for both the full and never-smokers’ cohort groups. Individual mean estimated ambient PM2.5 exposure was 10.7 µg/m3 (standard deviation 2.4) in both the full cohort and never-smokers’ cohort. The table also contains the average estimated PM2.5 exposure for the levels of the selected variables. Individual mean exposure is relatively consistent across varying factor levels aside from race/ethnicity (greater in non-Hispanic Blacks), urban versus rural (greater in urban areas), and census region (greater in the Midwest).

Table 1 Summary of baseline characteristics in the full and never-smokers’ cohorts for individuals aged 18–84 who completed the US National Health Interview Survey between 1987 and 2014

Table 2 provides cancer-type-specific mortality hazard ratios (HRs) and 95% confidence intervals (CIs) associated with a 10 µg/m3 increase in PM2.5 exposure in both the full and never-smokers’ cohorts. Statistically significant associations were observed in the full cohort for lung, stomach, colorectal, breast, cervical, and bladder cancer, as well as Hodgkin lymphoma, NHL, and leukemia. However, after adjusting for multiple comparisons, these associations were not statistically significant. In the never-smokers’ cohort, statistically significant associations between PM2.5 and mortality were found for lung, stomach, liver, breast, and cervical cancers as well as Hodgkin lymphoma. Only lung cancer was statistically significant after adjusting for multiple comparisons. Table 3 shows the sensitivity analyses performed on the full cohort for lung, stomach, colorectal, liver, cervical, breast, and bladder cancers as well as Hodgkin lymphoma, NHL, and leukemia. The PM2.5-mortality associations across the different cancer types were reasonably insensitive to the modeling choices, the exposure window, and the use of the expanded NHIS cohort.

Table 2 Estimated hazard ratios (95% CIs) associated with 10 µg/m3 increase of PM2.5 adjusted for age, sex, race/ethnicity, income, education, marital status, BMI, smoking (for the full cohort), urban/rural, census regions, and survey year
Table 3 Model sensitivity was assessed by comparing the results from the original complex CPH model to several alternative models using the traditional Cox Proportional Hazards model (the PHREG procedure)

Table 4 provides HRs and 95% CIs for cancer-type-specific mortality associated with being a current or former smoker in the full cohort. Statistically significant smoking-cancer mortality HRs for current smokers were found for lung, oral and oropharyngeal, esophageal, stomach, colorectal, liver, pancreatic, cervical, prostate, kidney, bladder, laryngeal, brain, and unspecified cancers as well as leukemia. For former smokers, statistically significant associations were found for lung, oral and oropharyngeal, esophageal, colorectal, liver, breast, bladder, laryngeal, and unspecified cancers as well as NHL and leukemia. After adjustment for multiple comparisons, statistically significant associations were found for lung, oral and oropharyngeal, esophageal, stomach, colorectal, liver, pancreatic, cervical, bladder, laryngeal, and unspecified cancers in current smokers and lung, oral and oropharyngeal, esophageal, colorectal, liver, bladder, laryngeal, and unspecified cancers in former smokers.

Table 4 Estimated hazard ratios (95% CIs) associated with current or former smoker in comparison to never-smoker

Discussion

Consistent with a growing body of literature, this study provides evidence that cancer mortality is associated with PM2.5 exposure in both smokers and never-smokers. For overall cancer mortality, analysis of the full cohort resulted in a hazard ratio of 1.15 (95% CI 1.08–1.22), which was comparable to that of the never-smokers’ cohort (HR 1.19, 95% CI 1.06–1.33). This result is also comparable to that of a cohort study of 18.9 million Medicare beneficiaries, in which the estimated HR per 10 µg/m3 increase of PM2.5 was 1.11 (95% CI 1.09–1.12) [28]. Analysis of the full cohort for non-lung cancers resulted in a hazard ratio of 1.15 (95% CI 1.07–1.24), which was also comparable to that of the never-smokers’ cohort (HR 1.15, 95% CI 1.02–1.30). The estimated associations between a 10 µg/m3 increase of PM2.5 and non-lung cancer are larger than those from other cohort studies such as the Harvard Six Cities Study (HR 1.05, 95% CI 0.87–1.27) [29] or the ACS study (HR 1.05, 95% CI 1.00–1.12) [29], but the differences are not statistically significant.
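One common way to check whether two published hazard ratios differ is a z-test on the log scale, recovering each standard error from the reported 95% CI. The Python sketch below applies this to the non-lung cancer estimates quoted above. It is an illustrative approximation that assumes independent estimates and normality on the log scale; the text does not state which comparison method the authors used, and the function names are hypothetical.

```python
import math

def log_hr_se(lower, upper, z_crit=1.96):
    """Approximate SE of log(HR) from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

def compare_hrs(hr_a, ci_a, hr_b, ci_b):
    """Two-sided z-test for a difference between two independent HRs."""
    diff = math.log(hr_a) - math.log(hr_b)
    # Standard error of the difference of two independent log-HRs.
    se = math.hypot(log_hr_se(*ci_a), log_hr_se(*ci_b))
    z = diff / se
    # Two-sided normal p value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Full-cohort non-lung cancer HR versus the Harvard Six Cities estimate
z, p = compare_hrs(1.15, (1.07, 1.24), 1.05, (0.87, 1.27))
```

Under these assumptions the test statistic falls well below 1.96, consistent with the statement that the estimates are not statistically different. The wide interval of the comparison study dominates the standard error.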

This study provides further evidence that lung cancer is associated with PM2.5, especially in never-smokers. The study found a hazard ratio of 1.13 (95% CI 1.00–1.26) in the full cohort and an HR of 1.73 (95% CI 1.20–2.49) in the never-smokers’ cohort, which was significant even after multiple comparison adjustment. The PM2.5-lung cancer mortality HR was higher in the never-smokers’ cohort than in the full cohort. It is unknown whether the larger HR for never-smokers is due to different susceptibility, underlying biology, or simply differences in baseline risk. Given the large effect of smoking on lung cancer, the underlying or baseline mortality risk for lung cancer in never-smokers is much smaller than for smokers. As such, the hazard ratio (an estimate of relative risk) associated with PM2.5 exposure would likely be larger in never-smokers than in smokers. The results from this study are comparable to a recent meta-analysis of cohorts examining PM2.5-lung cancer mortality (HR 1.13, 95% CI 1.07–1.20) [30].
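The baseline-risk argument can be made concrete with a small numerical illustration. The Python sketch below assumes, purely for illustration, that PM2.5 adds the same absolute excess hazard in smokers and never-smokers; the rates are invented and are not estimates from this study.

```python
def hazard_ratio(baseline, excess):
    """Relative hazard when an exposure adds `excess` to a baseline hazard."""
    return (baseline + excess) / baseline

# Hypothetical lung cancer death rates per 100,000 person-years
excess_from_pm25 = 2.0      # same absolute excess in both groups (assumed)
baseline_never = 10.0       # low baseline risk in never-smokers
baseline_smoker = 200.0     # much higher baseline risk in smokers

hr_never = hazard_ratio(baseline_never, excess_from_pm25)
hr_smoker = hazard_ratio(baseline_smoker, excess_from_pm25)
```

Here the identical absolute excess yields an HR of 1.2 against the low baseline but only 1.01 against the high one, which is the pattern the paragraph describes. If the effect of PM2.5 were instead multiplicative, the two HRs would be equal, so the illustration depends entirely on the additive assumption.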

The association between PM2.5 and mortality due to non-lung cancers is less clear. Although several cancer types (stomach, colorectal, liver, breast, cervical, and bladder cancers and Hodgkin lymphoma, NHL, and leukemia) were statistically significantly associated with PM2.5 exposure, none were statistically significant after adjusting for multiple comparisons. However, despite the conservative p value adjustment, stomach, liver, and breast cancer had Holm-adjusted p values of less than 0.1, suggestive of an association with PM2.5. Furthermore, the statistically significant association between non-lung cancers in aggregate and PM2.5 in both the full and never-smokers’ cohorts provides further suggestive evidence that some non-lung cancers are associated with PM2.5.
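To show how a step-down adjustment can move a nominally significant p value above 0.05 while leaving it below 0.1, the Python sketch below implements the standard Holm procedure. It is the plain Holm method, not the adaptive variant cited in the Methods, and the raw p values are hypothetical rather than taken from Table 2.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(p_values)
    # Indices of the p values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below
        # the adjusted value of a smaller raw p value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p values for four cancer types
raw = [0.001, 0.04, 0.03, 0.20]
adjusted = holm_adjust(raw)
```

For these inputs the adjusted values are approximately 0.004, 0.09, 0.09, and 0.20. The two raw p values of 0.03 and 0.04 lose significance at the 0.05 level yet remain below 0.1, mirroring the "suggestive" pattern described above.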

Other studies have reported PM2.5-mortality associations with stomach [18, 19, 31], colorectal [18, 19, 31], liver [18, 19, 20, 31], breast [18, 19, 21, 32, 33], cervical [18, 19], and bladder cancer [18, 19]. Additional studies have also reported PM2.5-incidence associations for stomach [11], liver [12, 34, 35], breast [13, 36, 37, 38, 39, 40], and bladder [14, 41]. The estimated hazard ratios, risk ratios, incidence rate ratios, and odds ratios (with their associated confidence intervals) for these cancers are compared in Fig. 1. Although there is substantial heterogeneity across study estimates, the results of this study add to the growing body of literature suggesting that PM2.5 exposure is associated with cancer mortality or incidence for lung and some non-lung cancers.

Fig. 1 Illustration of the comparison between the results of the current study and other similar studies that estimated the association between a 10 µg/m3 increase of PM2.5 and incidence of or mortality from various cancer types. Studies that examined cancer incidence are marked with a triangle, whereas those that examined cancer mortality are marked with a circle

The results are also consistent with existing literature on the relationship between smoking and cancer [1], with statistically significant associations after multiple testing adjustment for lung, oral and oropharyngeal, esophageal, stomach, colorectal, liver, pancreatic, laryngeal, cervical, kidney, bladder, and unspecified cancers. Except for Hodgkin and non-Hodgkin lymphoma, cancer types that were statistically significantly associated with PM2.5 in either cohort were also associated with smoking status. This study also provides moderate evidence supporting the formal recognition of prostate, breast, and unspecified cancers as caused by smoking [42]. Cigarette smoking and PM2.5 exposure may both be risk factors for various non-lung cancers, with cigarette smoking having a larger impact. Further research is needed to determine the relationship between PM2.5, smoking, and cancer-type-specific mortality.

A limitation of this study is the inability to directly measure exposure to ambient air pollution over a lifetime. With extensive follow-up and advanced ground-based monitoring and related modeling, this study used direct exposure estimates from 1999 to 2015. However, it does not directly account for exposure before this period. Although the results using back-cast estimates of PM2.5 exposure and the results including only individuals surveyed during or after 1999 are similar to those of the original model, the estimates of the HR may still be biased. Furthermore, census tract-level estimates of PM2.5 do not account for the full range of spatial variability at residential address. Due to subject mobility, however, it remains unclear what the optimal level of spatial averaging is. Another limitation is the inability to control for migration. The migration problem is further exacerbated by the long latency period of some cancer types. In future studies, cancer incidence data could be used to reduce the latency concern. Additionally, this study did not control for other pollutants such as NO2, SO2, or CO. Other studies have found associations between pollutants other than PM2.5 and incidence and mortality from various cancers [43, 44]. Future studies should control for these pollutants.

Another limitation of the study is the potential for residual confounding. The study was unable to control for several important variables such as secondhand smoke, HPV status, occupational exposure, reproductive factors (such as hormonal therapy, oral contraceptive use, or menopausal status), alcohol consumption, dietary patterns, and genetic variables that are associated with some cancer types. However, the estimates for most cancer types were not sensitive to adjustment for individual risk factors such as age, sex, race-ethnicity, education, income, BMI, geographic variables, and survey year, which suggests negligible risk of residual confounding. Furthermore, average air pollution was generally consistent across the factor levels for the individual risk factors, which suggests air pollution is unlikely to be correlated with any omitted variables.

A final limitation is the lack of follow-up and quantitative measurements in the smoking data. The lack of follow-up would likely bias the estimates for smoking downwards because the number of smokers is decreasing in America. Future studies should also include quantitative measurements for smoking such as packs per day or number of years smoking. Although these weaknesses may call the results of the smoking analysis into question, many of the cancer types that were associated with current smokers are also associated with former smokers, so the lack of follow-up and number of years smoked is less concerning. Furthermore, PM2.5-cancer associations were similar in the never-smokers’ cohort, which suggests little risk of bias.

This study has several important strengths. First, the study uses a cohort that is a representative sample of US adults with high-quality survey information. Second, the cohort is large and contains many deaths for most cancer types. Third, the analysis can control for individual risk factors for cancer such as smoking and BMI. Fourth, the air pollution estimates and most other analysis variables are publicly available.

Exposure to PM2.5 air pollution is a risk factor for lung cancer mortality and a possible risk factor for mortality from various other cancer types. The results from the current study and comparable studies suggest that PM2.5 may be associated with stomach, colorectal, liver, breast, cervical, and bladder cancer. Interestingly, all these cancers were associated with smoking status in the analysis. Although this exploratory study does not provide definitive conclusions, the strength of the research design and the consistency of results across modeling choices suggest further research is needed into the additional biological pathways by which cancer in humans may be affected by PM2.5 and smoking. The universal nature of pollution exposure, and its consequences, makes further study essential to public health.