Introduction

Individuals are exposed to multiple environmental chemicals (both natural and synthetic) via different environmental media such as air, water, and soil [1]. For instance, studies find that US pregnant women are exposed to multiple chemicals including polychlorinated biphenyls (PCBs), organochlorine pesticides, perfluoroalkyl and polyfluoroalkyl substances (PFAS), phenols, polybrominated diphenyl ethers (PBDEs), phthalates, polycyclic aromatic hydrocarbons (PAHs), and perchlorate [2]. Human biomonitoring, which measures the concentration of chemicals in body fluids (blood, urine, and breast milk) or tissues (hair, nails, fat, and bone), is often used to assess chemical burden because it provides an evaluation of internal doses that reflect exposures via multiple pathways [3]. Since 2000, the US Centers for Disease Control and Prevention (CDC) has been biomonitoring about 300 chemicals through the National Health and Nutrition Examination Survey (NHANES). These chemicals include metals, pesticides, PCBs, PBDEs, PFAS, volatile organic compounds (VOCs), tobacco smoke, PAH metabolites, and phthalates and their metabolites [4]. Certain classes are frequently measured simultaneously and in a single matrix (maternal urine or maternal serum): non-persistent phenols and phthalates are often examined in urine, while persistent chemicals such as PFAS, PBDEs, PCBs, and organochlorine pesticides are commonly measured in serum. Multiple chemical exposures can result in higher risks than exposures to individual pollutants. The National Academy of Sciences (NAS) concluded that “combinations of phthalates and of other antiandrogens generate combined effect at doses that when administered alone do not have significant effects” and recommended that risk assessment account for cumulative risk from chemicals that affect the same adverse health endpoint [5].

The concept of cumulative impacts, the focus of this review, refers to potential adverse human health effects resulting from combined exposures to multiple environmental and social stressors [6••, 7, 8]. “Cumulative risk” aims to quantify to the extent possible “combined risks from aggregate exposures to multiple (environmental) agents or stressors” [1]. Statistical methods used to characterize and model the combined and potential interactive effects of multiple environmental hazards and social stressor exposures are referred to as cumulative risk and impact modeling.

Recognizing the public health impacts of exposure to multiple environmental chemicals, an increasing number of studies have assessed the cumulative impact of exposures to chemical mixtures or multiple air pollutants simultaneously. Dose-addition-based methods including relative potency factors and toxic equivalency factors have been used to examine the cumulative risk of chemicals from a single class such as organophosphate pesticides [9], PCBs [10], and phthalates [11].
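The dose-addition idea behind relative potency factors can be sketched in a few lines. This is a minimal illustration rather than the procedure of any cited study: each chemical's dose is scaled by its potency relative to an index chemical, and the scaled doses are summed. The function name, chemical names, doses, and factors below are hypothetical placeholders.

```python
def index_equivalent_dose(doses, rpfs):
    """Sum of doses scaled to index-chemical equivalents (dose addition)."""
    missing = set(doses) - set(rpfs)
    if missing:
        raise KeyError(f"no relative potency factor for: {sorted(missing)}")
    return sum(dose * rpfs[name] for name, dose in doses.items())


# Hypothetical values for illustration only (doses in mg/kg-day, unitless factors).
doses = {"chem_A": 0.002, "chem_B": 0.010, "chem_C": 0.001}
rpfs = {"chem_A": 1.0, "chem_B": 0.1, "chem_C": 5.0}  # chem_A is the index chemical
total_equivalent_dose = index_equivalent_dose(doses, rpfs)
```

The sketch assumes the chemicals share a common mode of action, which is the premise of dose addition; it says nothing about interactions among chemicals.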

In addition to environmental chemical pollutants, non-chemical stressors, particularly psychological and social stressors (e.g., poverty, lack of social support, and chronic discrimination due to race/ethnicity), can independently influence health and have been considered in cumulative risk and cumulative impact studies [12]. In recent years, psychological stress and socioeconomic factors have been identified as critical non-chemical stressors that could increase the adverse health effects of chemical exposures. For example, biomarkers of chronic stress response, such as “allostatic load,” were found to amplify the risk of increased blood pressure associated with lead exposure in adults [13]. Urban children exposed to violence had higher risks of developing asthma in the presence of traffic-related air pollution [14]. Social stressors, measured by indicators such as poverty and race/ethnicity, have often been included as key effect modifiers in environmental health research to address disparities [15,16,17,18,19,20,21,22]. Social stressors such as educational attainment level and population density have also been examined [20]. To evaluate cumulative health risks from both chemical and non-chemical stressors, US EPA proposed a framework for cumulative risk assessment in 2003 [1] and subsequently provided a technical resource document in 2007 [23], acknowledging the challenges of incorporating non-chemical stressors in risk assessment. Although it is known that humans are exposed to multiple chemical and social stressors that are likely to cumulatively impact health, cumulative risk and impact modeling methods have not yet been fully developed to evaluate the joint exposures.

Overall, the main categories of established approaches used to evaluate aspects of cumulative impacts of multiple stressors are either quantitative or semi-quantitative methods, including biomonitoring, health risk assessment, ecological risk assessment, health impact assessment, burden of disease, and mapping of cumulative impacts [24•]. The majority of the established approaches involve elements of quantitative analysis, but these vary to a large degree [24•]. For example, when little or no mechanistic data are available, the hazard index was used to assess the cumulative non-cancer risks for chemicals that have an established chronic reference dose or reference concentration [25]. More established modeling methods have been applied in recent years to attempt to address cumulative impacts. Air dispersion and exposure models were employed to examine cumulative diesel particulate matter emission in Southern California for five traffic/mobile sources comparing four environmental goals including impact, efficiency, quality, and justice [26]. To account for non-chemical stressors in the process of exposure and dose estimates, the average daily dose model [27] was linked to multiple social indicators and applied to examine dose estimates on both the US nationwide census tract-level and community-wise local scale [28]. In addition, association rule mining [29], an unsupervised machine learning method, was also utilized to evaluate associations between social factors and environmental chemical concentrations relevant to cumulative impacts in the USA [30].
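The hazard index mentioned above can be illustrated with a short calculation. The sketch below assumes the conventional form, in which each chemical's exposure is divided by its chronic reference dose to give a hazard quotient and the quotients are summed; an index above 1 is commonly read as a screening-level flag. All chemical names and numbers are hypothetical.

```python
def hazard_index(exposures, reference_doses):
    """Sum of hazard quotients (exposure / reference dose) across chemicals."""
    quotients = {
        name: exposure / reference_doses[name]
        for name, exposure in exposures.items()
    }
    return sum(quotients.values()), quotients


# Hypothetical exposures and reference doses, both in mg/kg-day.
exposures = {"chem_A": 0.001, "chem_B": 0.03}
reference_doses = {"chem_A": 0.004, "chem_B": 0.1}
hi, hazard_quotients = hazard_index(exposures, reference_doses)
exceeds_screening_level = hi > 1.0
```

Here neither chemical alone nor the sum exceeds 1, but the same code shows how several sub-threshold quotients can add up to an index above 1.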

To our knowledge, no review has yet been performed to identify the statistical models used to evaluate the combined effects of multiple environmental chemical and social stressors. A review of cumulative risk and impact modeling techniques can fill this gap and provide a useful modeling reference for future cumulative risk and impact studies. This review identifies and evaluates the types of statistical models used to quantify the cumulative effects of multiple environmental and social stressors to provide modeling suggestions. The purpose of this review is not to provide a framework for model selection in designing a scientific study, but to summarize which modeling techniques have been considered after research questions, study design, and data were determined. Many factors including research questions, study design, and data availability are important in model selection. However, there are often multiple choices for statistical modeling, and there may be opportunities to bring new types of models into the field to evaluate cumulative risk and impacts. This review was conducted from this perspective.

Methods

Given the diversity of chemical and non-chemical factors involved in cumulative impacts and their various possible combinations and relevant health effects, different studies used a variety of statistical approaches to model and evaluate the health effects from multiple stressors. The two major categories of statistical models are supervised and unsupervised modeling methods. The former predefines response and explanatory variables and evaluates their statistical relationships, while the latter has no such predetermined condition but instead examines and identifies potential associations or hidden statistical structure among different input variables. Supervised methods include both regression models (e.g., Cox’s regression model [31]) and classification models (e.g., classification and regression trees [32]). Unsupervised approaches encompass cluster analysis [33] and association rule mining or frequent item set mining.

In this study, we review the statistical models used in studies whose primary objective was to analyze chemical and non-chemical stressors collectively. Exposure studies vary in how they interpret the concept of “environment,” for instance, characterizing exposures in the home, work, or neighborhood environments. Similar to the definition presented in a previous review [34], the universe of exogenous chemical exposures in this review refers to “those that are generally addressed by US EPA, and include manufactured chemicals and chemical byproducts (e.g., air pollution)” [34], except smoking. In this review, we did not evaluate studies that were specific to home or work environments. Also, no restriction was imposed upon our search based on the type of data used.

We also used search terms related to environmental justice to broadly capture articles that evaluated the health effects of multiple chemical and non-chemical stressors in a cumulative manner, because many environmental justice studies emphasize the combined effects of multiple stressors. However, environmental justice is not the main focus of this review.

We searched articles published from 01 January 2012 to 21 June 2017 in English and indexed in PubMed using the following four groups of search terms, and identified articles that met each of the four criteria:

  • #1 (cumulative, multiple, aggregated, joint, combined) AND (risk, impact, exposure)

  • #2 (Environmental) AND (justice, injustice, equality, inequality, equity, inequity, disparity)

  • #3 Environmental/Chemical exposure*

  • #4 Non-chemical stressors*

*Search terms for #3 and #4 were adopted from protocols developed by [34] that evaluated the combined effects of prenatal exposure to both chemicals and psychological stress on fetal growth.

Inclusion Criteria

  I. Original peer-reviewed research articles that evaluated both environmental and social stressors and analyzed their health effects (excluding home or work environment)

  II. Human subject studies

  III. Articles published between 2012/01/01 and 2017/06/21

  IV. Articles that included quantitative method information

One reviewer (HH) was responsible for screening and identifying relevant studies.

Results

HH identified 376 articles with full text availability and found 79 eligible articles based on initial title and abstract screening. After full-text review of these eligible articles, HH identified 31 relevant articles. The excluded 345 references consisted of (1) articles that did not involve both social stressors and environmental/chemical exposures (n = 241), (2) studies that did not analyze health effects of multiple stressors (n = 69), (3) non-original research articles (n = 24), (4) articles that did not use quantitative methods (n = 9), and (5) studies that were not based on human subjects (n = 2).
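The screening counts reported above can be checked with simple arithmetic. The short sketch below uses only the numbers given in this paragraph; the variable names are ours.

```python
identified = 376   # articles with full text availability
relevant = 31      # articles retained after full-text review

# Exclusion reasons and counts as reported in the text.
exclusions = {
    "not both social and environmental/chemical exposures": 241,
    "no health effects of multiple stressors analyzed": 69,
    "non-original research": 24,
    "no quantitative methods": 9,
    "not based on human subjects": 2,
}

excluded_total = sum(exclusions.values())
counts_reconcile = excluded_total == identified - relevant
```

The five exclusion categories sum to 345, which matches the difference between the articles identified and the articles retained.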

Supervised methods were divided into regression and classification (Fig. 1). Currently, most of the modeling techniques utilized to examine cumulative impact are supervised regression models. We considered commonly used regression methods such as multivariable linear/non-linear regression and logistic regression models as simple regression techniques. Other regression models, such as generalized linear model (GLM), multilevel model, and spatial regression model, were classified as complex regression techniques. None of the studies identified in this review used supervised classification models.

Fig. 1 Different types of existing statistical models. Most of the current models used to capture the combined effects of multiple chemical/non-chemical stressors in the field of cumulative impact studies are regression models (highlighted in gray shadows)

As shown in Table 1, among the 31 articles identified [35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70], 10 studies [35,36,37,38,39,40,41,42,43,44] used multivariable linear/non-linear regression models to evaluate the combined effects of multiple chemical and non-chemical stressors, and 7 studies [43, 45,46,47,48,49,50] used logistic regression models. In addition, we found 5 studies [51,52,53,54,55] that used hierarchical/multilevel regression models. All studies used supervised techniques, specifically regression models, but several of them also used unsupervised methods such as hierarchical cluster analysis, factor analysis, and principal component analysis (PCA), in addition to regression models.

Table 1 Summary of studies (n = 31) included from this literature review that quantitatively evaluate combined effects of both chemical and social stressors on health

Air pollutants were the environmental chemical exposure most often modeled: 20 out of the 31 studies evaluated air pollutants [36, 37, 39, 41, 42, 44, 46,47,48, 51, 52, 54, 57,58,59, 61,62,63,64,65], especially the criteria pollutants such as particulate matter (PM) and nitrogen dioxide (NO2). For instance, the joint effects of exposure to PM2.5 and O3 and socioeconomic status measures upon pregnancy outcomes including low birth weight, preterm birth, and small for gestational age were evaluated based on a linear or logistic-mixed regression model [51]. Other chemical exposures evaluated cumulatively include industrial cadmium [40], lead [38], zinc [55], bisphenol A (BPA) [49], and PAH [43]. Socioeconomic factors, especially race/ethnicity and income level, were among the most frequent non-chemical stressors modeled. As to the health endpoints evaluated, mortality rate and cancer risks followed by pregnancy outcomes were considered more frequently than others (7 and 5 out of 31) (Table 1).

We evaluated in more detail the statistical modeling technique and combination of exposures and outcomes identified in Table 1 to provide examples of model selection given different stressors focused (Table 2). We found that some complex regression-based modeling approaches have been used to evaluate the joint effects of multiple stressors. For example, the combined effects of exposure to arsenic contamination in drinking water and health intervention programs on child mortality from acute lower respiratory infections were modeled by a zero-inflated negative binomial regression [56]. Both simple regression and Bayesian sparse spatial multilevel models were utilized to evaluate the relationship between lead exposure and both gonorrhea and chlamydia, accounting for other non-chemical stressors such as index of concentrated disadvantage [53]. Negative binomial regression models were used to examine associations between mortality and environmental factors including air pollution and drinking water quality with consideration to socioeconomic deprivation [57].

Table 2 Sample studies included that quantitatively evaluated both chemical and social stressors on health

Discussion

Our review provides a summary of statistical modeling methods considered in studies to quantify potential combined effects of multiple environmental and social stressors. It was not our intent to provide a framework for model selection in designing a scientific study, which is beyond the scope of our review. The selection of a modeling technique involves consideration of many important factors, including the research question(s), study design, and data availability. The combination of these factors may lead an investigator to choose one method over another.

For example, to answer respiratory health inequality questions concerning relationships between respiratory health situations across different cities and their medical amenities, socioeconomic, and physical features (e.g., air pollution), Aschan-Leygonie et al. analyzed health data describing hospitalizations of chronic obstructive pulmonary disease and a large set of different indicator variables of both social and environmental stressors using linear correlations and multiple linear regression models in an ecological case study [35]. They found that socioeconomic features may be the major drivers for inequities of respiratory health status in urban units and concluded that better understanding of “differences among cities in their entirety” is essential to develop effective urban policies. Multiple linear regression models were an appropriate choice to answer the research questions of interest given their study design and data availability. However, if the intent of the study was not to understand risk factors for a specific type of disease or outcome, but instead to identify all the possible associations among the variables with different combination to provide guidance for further analysis, then the association rule mining method could have been more useful in that setting, assuming the modeling assumptions were met and data requirements satisfied.

We found that many more complex statistical models, supervised or unsupervised, have been used in other scientific domains but are less commonly applied in cumulative impact studies. A relevant example is the random forest model [71], a supervised classification tool widely employed in applications such as compound classification and quantitative structure-activity relationship (QSAR) modeling for predicting categorical biological activity [72], land-cover classification [73], and gene selection and classification [74], but not used to understand the joint effects of multiple stressors in the studies we identified. When multiple exposures including social stressors are considered, a random forest model can be useful for determining variable importance and potentially separating the more important variables from a larger set. Another example of a supervised technique is neural network ensembles [75], which serve as the foundation of deep learning [76] and have been employed extensively in numerous scientific disciplines [77,78,79,80,81] but have not been used in cumulative impact studies. Provided that there are known associations between a health outcome and multiple stressors, these types of models can be powerful in predicting occurrences of the health outcome of interest within the context of examining multiple exposures.

Model Comparison

Simple Regression vs. Complex Regression

The advantages of simple regression models include (1) straightforward model execution, (2) easy interpretation of model techniques, and (3) model outputs that are accessible and understandable to a wider audience, which can be conducive to risk communication and community engagement. For example, Vishnevetsky et al. evaluated the cumulative effects of low socioeconomic status and PAH air pollution exposure in children. They found that children with socioeconomic disadvantage, as measured by recurrent material hardship, and a high level of prenatal exposure to PAH had a full-scale intelligence quotient (IQ) score 5.81 points lower than those who experienced material hardship but had low PAH exposure. The same significant association was not observed within the low material hardship group [43]. The numeric form of the findings is informative and understandable to public audiences and can benefit future community-based risk assessment and communication.

One of the challenges of using simple regression models is that normal or multivariate normal distribution assumptions may not always hold for environmental exposures or social factors. Although in some cases a log transformation of the response variable can address this problem, it cannot account for data that follow other types of distributions, such as the negative binomial, Poisson, and gamma distributions.

Complex regression models have the following benefits: (1) higher level of flexibility regarding distribution assumptions made, (2) the ability to account for inherent data issues (e.g., spatial autocorrelation), and (3) potentially better predictive power. For instance, GLM allows flexibility regarding data distribution assumption which is the advantage of this kind of modeling [33], but similar to other supervised methods, it also requires specification of both response and explanatory variables.
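To make the distributional flexibility of a GLM concrete, the sketch below fits a one-predictor Poisson regression with a log link by Newton-Raphson, using only the standard library. It is a teaching sketch, not the fitting routine of any reviewed study; in practice an established statistics package would be used. The function name and the toy counts are hypothetical.

```python
import math


def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson (Fisher scoring)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: X^T (y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information: X^T diag(mu) X
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system information * delta = score
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Toy data: counts that double with each unit increase in a hypothetical exposure.
exposure = [0, 1, 2]
counts = [1, 2, 4]
b0, b1 = fit_poisson(exposure, counts)
rate_ratio = math.exp(b1)  # multiplicative change in expected count per unit exposure
```

Because the response is modeled as a count with a log link, the coefficient is read as a rate ratio rather than an additive change, which is the kind of distribution-specific interpretation a linear model on transformed data does not give directly.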

When modeling the joint effects of multiple environmental chemical and social stressors, multivariable linear regression or logistic regression models may not be appropriate, especially if the social stressors are based on place-based measurements, which can be spatially autocorrelated and thereby bias estimates. In these cases, more complex regression modeling methods such as spatial error regression models [82] can be useful, because they do not assume independent and identically distributed errors at the census tract level but instead allow the errors to follow a spatial autoregressive process. This type of model can account for residual spatial autocorrelation when units of observation are located proximally, and thus non-independently, in space. For example, simultaneous autoregressive models were used to examine the health impacts of NO2 and several community-level social stressors such as violent crime and physical disorder, crowding, and poor access to resources across New York communities, accounting for the spatial relationship between air pollution and social stressors [59].
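One common way to check whether residuals are spatially autocorrelated, before turning to a spatial regression model, is Moran's I. The sketch below is a minimal standard-library version that assumes binary contiguity weights (1 for neighboring units, 0 otherwise); the four-tract layout and residual values are hypothetical, and this diagnostic is offered as an illustration rather than as a step reported by the reviewed studies.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights; neighbors maps unit index -> neighbor indices."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0       # sum over i, j of w_ij * dev_i * dev_j
    weight_sum = 0    # sum of all weights w_ij
    for i, js in neighbors.items():
        for j in js:
            cross += dev[i] * dev[j]
            weight_sum += 1
    return (n / weight_sum) * cross / sum(d * d for d in dev)


# Four hypothetical census tracts arranged in a row: 0 - 1 - 2 - 3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

smooth_residuals = [1.0, 2.0, 3.0, 4.0]         # similar values sit next to each other
alternating_residuals = [1.0, -1.0, 1.0, -1.0]  # neighbors take opposite values

i_smooth = morans_i(smooth_residuals, neighbors)
i_alternating = morans_i(alternating_residuals, neighbors)
```

A clearly positive value, as for the smooth pattern, suggests that neighboring units have similar residuals and that the independence assumption of ordinary regression is questionable; a negative value indicates the opposite pattern.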

Simple regression models such as multivariable linear or logistic regression models are special cases of GLM. In this review, we distinguish between multivariable linear or logistic regression and other uses of GLM beyond linear/logistic regression. Multilevel/hierarchical modeling [83] is another example of a modeling approach; it permits examination of the effects of explanatory variables upon the response variable on the local vs. global scale, accounting for variance among variables at different levels, but it requires a sufficient sample size for unbiased estimation [84]. In this review, we found that multilevel models were a popular option for modeling cumulative effects of various stressors that have nested effects and were frequently considered in longitudinal studies. For example, associations between PM2.5 and O3, socioeconomic factors, and birth outcomes were modeled using North Carolina birth data from 2002 to 2006 and multilevel models in which census tract was specified as a random effect to account for neighborhood-level correlation [51].

Although unsupervised methods such as PCA were occasionally employed, coupled with supervised techniques (e.g., generalized estimating equations, zero-inflated negative binomial regression), more complex unsupervised machine learning methods have not yet been explored. In recent years, association rule mining was used to identify and prioritize relations between environmental stressors and negative human health effects [85] and to discern prevalent chemical combinations in the US population [86]. Provided that population-based health outcome information is available on a small geographic unit such as census tract and can be linked to social and environmental data, this unsupervised model can also be applied to evaluate the synergistic health impacts of multiple chemical and non-chemical stressors in the future.
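The quantities that association rule mining reports (support, confidence, and lift) can be computed directly for a single candidate rule. The sketch below does so for a hypothetical set of census tracts, each described by the set of flags that apply to it; the flag names and data are invented for illustration, and a full mining algorithm would additionally search over all candidate item sets.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    s_both = support(antecedent | consequent, transactions)
    s_ante = support(antecedent, transactions)
    s_cons = support(consequent, transactions)
    confidence = s_both / s_ante
    lift = confidence / s_cons
    return s_both, confidence, lift


# Hypothetical tract-level flags; each set is one census tract.
tracts = [
    {"high_poverty", "high_pm25", "high_asthma"},
    {"high_poverty", "high_pm25"},
    {"high_pm25"},
    {"high_poverty", "high_pm25", "high_asthma"},
    {"high_asthma"},
    set(),
]

rule_support, rule_confidence, rule_lift = rule_metrics(
    {"high_poverty", "high_pm25"}, {"high_asthma"}, tracts
)
```

A lift above 1 means the outcome flag co-occurs with the stressor combination more often than expected if they were independent; it describes an association in the data and does not by itself establish a causal or synergistic effect.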

It should be acknowledged that using complex data mining techniques is not necessarily always a better option. For example, more complex modeling methods may have higher data input requirement such as larger sample size to have enough statistical power for generating reliable results. However, with proper study design and data availability, outputs from certain data mining techniques can be useful to address certain research questions, provided proper assumptions are met.

Bayesian vs. Non-Bayesian

We also found that several studies adopted a Bayesian approach to examine combined effects of multiple stressors. The advantage of Bayesian methods is that they can incorporate qualitative information or a priori knowledge to improve model fitting and predictions. Incorporating feedback from local residents or experts is a critical component in local scale cumulative risk assessment, and Bayesian statistical models could play a key role in connecting qualitative information to quantitative calculation. However, both the amount of non-quantitative data needed to integrate into the model and the degree to which information will be applied are subjective, which may potentially introduce bias. Non-Bayesian approaches predominantly utilize quantitative information without qualitative inputs, and therefore could avoid subjective bias embedded in qualitative data. However, these approaches have less flexibility in integrating non-quantitative information such as expert opinions that could be potentially useful in situations where quantitative data alone are not sufficient for conclusive analysis.
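The way prior knowledge enters a Bayesian analysis can be shown with the simplest conjugate case, a beta prior on a proportion updated with binomial data. This is a deliberately small sketch and not one of the Bayesian models used in the reviewed studies; the prior parameters, which stand in for expert or community input, and the observed counts are hypothetical.

```python
def beta_binomial_update(prior_a, prior_b, events, trials):
    """Posterior Beta(a, b) parameters after observing events out of trials."""
    return prior_a + events, prior_b + (trials - events)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


events, trials = 3, 10  # hypothetical observed data

# Informative prior: belief that the proportion is around 0.2.
informed_a, informed_b = beta_binomial_update(2, 8, events, trials)
# Flat prior: no prior preference for any proportion.
flat_a, flat_b = beta_binomial_update(1, 1, events, trials)

informed_mean = beta_mean(informed_a, informed_b)
flat_mean = beta_mean(flat_a, flat_b)
```

With the same data, the two priors give different posterior means, which illustrates both the benefit described above (prior information sharpens estimates when data are sparse) and the concern (the result depends on a subjective choice of prior).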

Current and Emerging Exposures

Nearly two thirds of the studies identified (20 of 31) focused on air pollutants, largely because of data availability and in response to policies developed as part of implementation of the Clean Air Act and subsequent regulations. Importantly, spatial studies, such as those that use air pollution data, make it feasible to analyze place-based exposures to environmental and social stressors because these data can be accessed via publicly available data sources, such as the National Air Toxics Assessment database (https://www.epa.gov/national-air-toxics-assessment/2011-nata-assessment-results). This facilitates data analysis with a larger geographic and population scope, which provides sufficient power to observe signals for multiple types of groups, each of which may be relatively small. There were fewer studies that used biomonitoring data, in part because this type of data is less available, though more is being generated with investments from the research community and government. Thus, studies that characterize individual-level exposures to both multiple environmental chemicals (using targeted and non-targeted approaches) and social stressors (biomarkers of chronic stress response, perceptual, and place-based stressors) are also becoming more viable.

There exist many emerging exposures that warrant researchers’ and policy-makers’ attention such as heat exposure [87], multimedia screen light exposure [88], nano-material exposure [89], chemicals not bio-monitored previously [4, 90], and poor access to resources [59]. Analyses based on place-based measures allow researchers to take advantage of data sets that are likely to have wide variability in exposures to chemical and non-chemical stressors, and facilitate research with a wide geographic scope. However, it can narrow the scope of the kinds of exposures that one can analyze due to constraints of data availability. This makes it more challenging to evaluate these emerging exposures. Provided relevant data sets are available, future research can build upon the methods and findings from spatial studies and apply them to evaluating combined effects of some of these emerging exposures with known pollutants, to better characterize the effects of multiple chemical exposures that individuals experience [90] along with social stressors.

Limitations

There are several limitations of this review that warrant further consideration. First, this review focused mainly on the modeling aspects of research studies and less on the specific research questions evaluated, study designs, and data available. However, because of the many possible research settings that may call for distinct modeling methods, covering all combinations of research questions, study designs, and data, and then proposing modeling suggestions accordingly, is beyond the scope of this review. As a starting point towards promoting use of appropriate statistical methods in examining cumulative risk, we focus on the modeling perspective. Therefore, this manuscript reviews statistical models used to examine the combined effects of both environmental chemical and social stressors in recent studies. Second, although our intent was to focus on studies whose primary objective was to investigate cumulative exposure, our title and abstract screening might have missed studies that found no positive results regarding the combined effects of multiple environmental chemicals and social stressors. Such negative findings would be important in a systematic review and meta-analysis. However, it should be recognized that this review is not a systematic review and no quantitative synthesis was performed across different studies. We may therefore have excluded references containing other useful data mining techniques not mentioned in this review, but we estimate that the negative consequence is not substantial.

Regression models have been often applied to evaluate the potential adverse human health effects from combined exposures to multiple environmental chemicals and social stressors. With proper study design and appropriate modeling assumptions, additional data mining methods may be useful in the evaluation of cumulative health impacts of multiple chemical exposures and social stressors.

Conclusion

The importance of understanding the joint effects of environmental chemical and social stressors has been recognized. A growing literature evaluates the combined effects of multiple stressors on health, with the majority of studies using regression models. With increasing knowledge in exposure science and the advent of more quantitative tools in the era of “big data,” we recommend that additional data mining techniques be considered in appropriate research settings and potentially incorporated in the analytical procedure to better characterize chemical and non-chemical stressors for risk assessment, identify potential health risks, and provide public health protection, particularly to vulnerable and susceptible populations.