1 Introduction

The integration of laboratory in-vitro/in-vivo studies and epidemiologic data suggests that endocrine disrupting chemicals (EDCs) play a substantial role in the development of many diseases, including reproductive problems, endocrine-related cancers, neurodevelopmental disorders, immune-related diseases, and obesity-related diseases [1]. Epidemiological studies are a logical and necessary complement to in-vitro and in-vivo experimental studies of EDCs in characterizing the nature and magnitude of the risk of EDCs in humans. Indeed, observational epidemiologic studies are the only way to observe how actual exposure, integrated over various EDCs and over long periods of time, relates to disease genesis and progression in the fully complex human body.

At present, speculations about and interpretations of the harmfulness of EDCs in humans arise mainly from two types of epidemiological evidence: 1) increasing trends of suspected diseases in ecological studies of populations and 2) findings from traditional epidemiological studies of individuals. However, ecological findings are not regarded as direct evidence for the relation of EDCs and human disease, while the evidence from studies of individuals is often inconsistent among studies. Thus, a frequent criticism is that the link between EDCs and health in humans is naively presumed based on potential mechanisms, not solid evidence [2].

Generally, observational studies of individual people, such as case–control or cohort studies, are considered to be an essential part of the evidence in establishing causality between a risk factor and a disease. Although all human studies have limitations, studies of EDCs have particular complications compared to human studies of other risk factors. This complexity must be carefully considered when we review the literature reporting human epidemiologic studies of EDCs.

Although EDCs may be classified along several axes, such as different aspects of their functionality or biological disturbance, their classification into short vs long half-life is useful for highlighting methodological issues in human studies. Among EDCs, persistent organic pollutants (POPs) and heavy metals have long half-lives (several months to years), while common EDCs like bisphenol A (BPA) or phthalates have short half-lives (several hours to days). In this article, we discuss key issues about EDCs which should be considered in order to design optimal human studies. Addressing key issues at the study design stage can help to arrive at valid results. For some EDCs, however, it may not be possible to optimize human study design due to inherent limitations. Even in this case, understanding methodological issues can be useful for interpreting findings from human studies.

2 Two types of human studies of EDCs

EDCs are known to alter hormonal and homeostatic systems of living organisms [3]. Thus, in human studies, EDCs should be evaluated from at least two perspectives: “developmental effects” and “disturbance of homeostasis”. The endocrine system is primarily responsible for controlling a number of body functions that start from early developmental processes. Also, it plays an important role in the physiological response to environmental changes with the aim of keeping the organism within the biologic homeostatic space. This classification is useful for a practical approach to EDCs in human studies, even though development and homeostasis are not mutually exclusive (for example, developmental effects can have life-long influences over how the individual responds to various environments as adults).

From the viewpoint of “developmental effects”, exposure to EDCs during critical developmental/growth periods can increase risk for a variety of diseases. EDC exposure during this period can have direct effects on the offspring, as well as impacts much later in life [4]. In this situation, accurate assessment of exposure to certain EDCs during critical periods is important. It has recently been considered that many complex non-communicable diseases typically experienced in adulthood have their origins during development [4]. In this line of thought, such risk can be produced by a variety of environmental factors, including EDCs.

On the other hand, “disturbance of homeostasis” itself is not necessarily harmful because it may or may not result in adverse effects to the organism [5]. In this case, some endocrine disruptions are called “adaptive responses”, rather than “adverse effects”. However, adaptive responses may gradually change over the long term into adverse effects, unless causes for the adaptive responses are properly controlled. Thus, from the viewpoint of “disturbance of homeostasis”, whether the exposure to EDC is chronic or not would be more important, rather than whether the exposure is during a “critical period” or not.

Human research on “developmental effects” is best designed as a longitudinal study, starting from pregnancy, with accurate exposure assessment during critical periods. One example, successful from a research perspective, is diethylstilbestrol (DES). Post-pubertal cancer of the female reproductive tract was linked to in-utero exposure to DES [6]. Unlike prescribed drugs like DES, however, accurate assessment of exposure to many EDCs from environmental sources during critical periods would be extremely challenging in humans, for reasons discussed below. On the other hand, human studies focusing on “disturbance of homeostasis” may not necessarily need a cohort starting from pregnancy, but could be performed among children, adolescents, adults, or the elderly. However, long-term follow-up with solid outcome data remains necessary because intermediate biomarker-based outcomes can be the result of “adaptive responses”, not “adverse effects”.

3 Exposure assessment: reproducibility

Accurate exposure assessment is key to obtaining valid relative risk estimates in all human studies. In general, measurement error in the exposure variable substantially influences estimates of association between exposure and disease in human studies. Traditionally, exposure assessment in human studies has been performed largely based on questionnaires. However, assessment of EDC exposure by questionnaire is very crude; it is much more effective to measure chemicals or their metabolites in bio-specimens, like tissue, blood, or urine (termed markers of ‘internal dose’). Translation of assessment of the external environment into an internal dose usually has only a very limited value.

There are at least hundreds of chemicals with suspected endocrine activity and repeated measurements through the lifetime should be performed for the best design. Advanced analytic technology for accurate, rapid, and affordable exposure assessment, analogous to chips that can measure a huge array of chemicals at very low concentrations, is often discussed as an issue in urgent need of resolution [7]. However, we need to think about a more fundamental issue: how can we measure internal dose of EDCs reliably?

The answer differs depending on EDC half-life: the longer the EDC half-life, the greater the reliability of an exposure assessment. The measurement of EDCs like POPs may provide a reasonable marker of usual exposure during critical periods. However, EDCs with short half-lives are rapidly metabolized and eliminated through urine, so that there is high temporal variability in internal doses due to changing exposure throughout the day and across days, driven by various environmental factors such as the diet and other lifestyle choices of the person. Human studies of common short half-life EDCs like bisphenol A (BPA), phthalates, or polyaromatic hydrocarbons (PAHs) demonstrate a high within-person variation and a low reproducibility among repeated urine collections from the same person [8–10].

At present, 24-h urine collection is practically regarded as a gold standard for evaluating usual exposure to chemicals excreted primarily through urine, although many epidemiologic studies have used the first morning urine void because that method has substantial correlation with 24-h urine [11, 12]. However, even 24-h urine samples may not accurately estimate the usual exposure status of short half-life EDCs in humans because day to day variability of exposure to these EDCs is substantial [8–10]. It is therefore hard to estimate an internal dose that reflects long term exposure. The typically low reliability of exposure assessment of short half-life EDCs makes the interpretation of findings on these chemicals from human studies extremely difficult.
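The consequence of low reproducibility can be sketched with a small simulation. The example below assumes a classical measurement error model in which a single spot measurement equals the person's usual exposure plus independent day-to-day noise, and in which the outcome depends only on usual exposure. Under that assumption, the slope estimated from one spot sample is attenuated by roughly the intraclass correlation coefficient (ICC). The function names, sample size, and ICC values (0.2 and 0.9) are illustrative choices, not estimates taken from the cited studies.

```python
import random


def covariance(x, y):
    """Sample covariance of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def simulate_attenuation(icc, n=20000, true_slope=1.0, seed=0):
    """Return (estimated ICC, observed slope) for a single spot measurement.

    Hypothetical model: measured value = usual exposure + day-to-day noise.
    """
    rng = random.Random(seed)
    var_between = 1.0
    # Pick the within-person SD so that
    # var_between / (var_between + var_within) equals the requested ICC.
    sd_within = (var_between * (1 - icc) / icc) ** 0.5

    usual = [rng.gauss(0, var_between ** 0.5) for _ in range(n)]
    spot1 = [u + rng.gauss(0, sd_within) for u in usual]
    spot2 = [u + rng.gauss(0, sd_within) for u in usual]
    # The outcome depends on usual (long-term) exposure, not on the spot value.
    outcome = [true_slope * u + rng.gauss(0, 1.0) for u in usual]

    # The correlation between two repeat samples estimates the ICC here.
    icc_hat = covariance(spot1, spot2) / (
        covariance(spot1, spot1) * covariance(spot2, spot2)
    ) ** 0.5
    # Slope from regressing the outcome on one spot sample.
    slope_hat = covariance(spot1, outcome) / covariance(spot1, spot1)
    return icc_hat, slope_hat


for icc in (0.2, 0.9):
    icc_hat, slope_hat = simulate_attenuation(icc)
    print(f"ICC={icc}: estimated ICC={icc_hat:.2f}, observed slope={slope_hat:.2f}")
```

With a true slope of 1.0, the observed slope should fall close to the ICC in each scenario, so a marker with low reproducibility recovers only a small fraction of the underlying association.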

Even supposing that we could obtain exposure levels over days, weeks, and months and that this would lead to the development of an appropriate exposure pattern, exposure patterns may be population-specific. Such exposure patterns would ideally be replicated and validated in targeted study subjects across different populations. Alternatively, measuring a wide range of biological responses to EDC exposure might prove useful for EDCs with short half-lives; this would involve integration with diverse “omics” technologies, including proteomics or metabolomics [13]. However, relatively low reliability of biomarker assessment remains an issue in such high-throughput molecular “omics” techniques [14]. We note that, unlike the static DNA genome, both chemical exposure and chemical-related biological responses in the human body are dynamic. Therefore, whatever advanced technology is used, reliable exposure marker assessment is a high priority for successful human studies.

4 Mixtures

Similar to traditional toxicological studies, most epidemiological studies on EDCs have been performed focusing on one chemical in relation to one disease. In recent years, however, in-vitro and in-vivo animal studies have demonstrated cocktail effects of EDCs at levels at which the individual chemicals do not induce observable effects, which has been called the “something from nothing phenomenon” [15]. Experimental studies have been performed focusing on several selected EDCs with similar pathways, such as estrogenic EDCs, anti-androgenic EDCs, or thyroid disrupting chemicals [16]. Although this is definitely progress compared to single chemical experiments, such studies are still too simple and provide only fragmentary information compared to the actual human situation. Humans are simultaneously exposed to a plethora of diverse mixtures with widely different EDCs in addition to estrogenic, anti-androgenic, and thyroid-disrupting chemicals.

EDCs were first thought to exert actions primarily through direct interaction with nuclear hormone receptors, including acting as agonists or antagonists for estrogen, androgen, progesterone, and thyroid receptors. Today, basic scientific research shows that the mechanisms involve more pathways and more details than was originally recognized [3]. It is now accepted that EDCs can act via non-nuclear steroid membrane receptors as well as non-steroid receptors [3]. In addition, chemicals can act as EDCs by affecting the metabolism of endogenous hormones, such as hormone catabolism [17], even though these chemicals may not be direct agonists or antagonists of any receptor. Thus, what kind of synergistic, additive, or antagonistic actions of EDC mixtures may exist in humans seems to be an unsolvable dilemma.

Some researchers insist on the urgent development of biomarkers that can properly assess the impact of exposure to EDC mixtures in humans [18]. However, even though this kind of biomarker can theoretically be suggested, considering the complexity of the endocrine system, reliable biomarkers which work properly in humans may be difficult to identify. The exposome, the combined lifetime exposures from all environmental sources that reach the internal environment, has also been suggested as a way to address the complexity of human exposure to chemicals [19]. However, the exposome may not be a solution for EDC mixtures because other methodological issues concerning research about EDCs are still present even with the exposome approach.

An opposite problem occurs in the case of strongly lipophilic EDCs like POPs, for which the internal body burden of one chemical likely reflects the internal body burden of mixtures of chemicals which move together. That is, the serum concentration of the one compound which is the research focus is positively correlated with those of other compounds not considered in data analysis or not even measured. Thus, an association found for a single compound can actually mark a chemical mixture of POPs. One specific POP that is associated with disease in a given study may partly or largely reflect the influence of other POPs, even if the specific POP is not causally related to disease. In this sense, recent meta-analyses or systematic reviews on POPs which focused on individual POPs may be misleading [20, 21].
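The marker effect described above can be illustrated with a minimal simulation. It assumes two hypothetical compounds, A and B, whose serum concentrations both track a shared body burden; only B affects the outcome, and only A is the research focus. All variable names and variances are invented for illustration.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def simulate_marker(n=20000, seed=1):
    """Return (crude slope for A, slope for A adjusted for B)."""
    rng = random.Random(seed)
    shared = [rng.gauss(0, 1) for _ in range(n)]        # common body burden
    pop_a = [s + rng.gauss(0, 0.5) for s in shared]     # studied, not causal
    pop_b = [s + rng.gauss(0, 0.5) for s in shared]     # correlated, causal
    outcome = [b + rng.gauss(0, 1) for b in pop_b]      # depends on B only

    # Crude analysis: outcome regressed on compound A alone.
    crude = cov(pop_a, outcome) / cov(pop_a, pop_a)

    # Adjusted analysis: remove the part of A explained by B, then regress
    # the outcome on what is left (equivalent to the coefficient of A in a
    # two-predictor linear model).
    beta_ab = cov(pop_a, pop_b) / cov(pop_b, pop_b)
    resid_a = [a - beta_ab * b for a, b in zip(pop_a, pop_b)]
    adjusted = cov(resid_a, outcome) / cov(resid_a, resid_a)
    return crude, adjusted


crude, adjusted = simulate_marker()
print(f"crude slope for A: {crude:.2f}, adjusted for B: {adjusted:.2f}")
```

In this setup compound A shows a clear positive crude association even though it has no effect, and the association largely disappears once B is accounted for. In real studies the correlated compounds are often unmeasured, so this adjustment is not available.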

5 Non-monotonic dose response relation

One key feature of EDCs is the possibility of non-monotonic dose response relationships (NMDRs) [22]. Biochemical, animal, and human studies of EDCs have revealed a large number of NMDRs [23] even though some critics argued that the data were insufficient to conclude that NMDRs are real [24]. In fact, NMDRs may not be exclusive to EDCs and are observed for chemicals that do not act on the endocrine system [25, 26].

Even NMDRs observed in well-controlled in-vitro and in-vivo experimental studies of EDCs, with very simplified and unrealistic conditions, including focus on a single chemical, have led to a lot of debate among experts [2, 27]. We can easily imagine how complicated this issue becomes in observational human studies with exposure to chemical mixtures. If there are true NMDRs between dose of EDCs and adverse effects, the impact of NMDRs on interpretation of human studies is a much more serious design problem than are any other issues in the study of EDCs.

Unlike laboratory studies which can evaluate biological effects across a broad range of doses from 0 to highly toxic levels, the exposure range in a specific human population is limited and unique to that population. It is determined by many socio-economic, political, geographical, and cultural factors; e.g., it is influenced by the extent of industrialization, use of pesticides in agriculture, regulation of chemicals, and dietary patterns. Thus, human studies can observe different shapes of the dose–response relationship depending on the exposure range of study subjects.

Let us examine “an inverted U-shaped relationship”, which is one common NMDR that has been shown in studies of EDCs. The whole pattern of an inverted U-shaped relationship is observable only when the range of exposure covers the doses in which there is an inverted U-shaped relationship (Fig. 1, population A). Importantly, both the unexposed group and the high exposure group must have sufficient sample size to have resolution power in statistical analyses. However, this situation is almost impossible in the real world. In most situations, study subjects in one specific population have an exposure range which corresponds to a certain part of “an inverted U-shaped relationship”. Thus, positive, inverse, and null associations are all possible depending on the exposure distribution of the population under study (Fig. 1, populations B, C, and D). Importantly, as populations have an exposure range closer to zero exposure, associations become more strongly positive, while under this underlying NMDR shape, associations get closer to inverse as the exposure range shifts away from zero exposure. Notably, the situation we have described above is unrealistically simplified because only one chemical is present. The situation of chemical mixtures with various NMDR distributions, frankly speaking, may be beyond our imagination.

Fig. 1

Unlike experimental studies which can impose a range of exposure doses, exposure ranges in human populations are limited and population-specific. Under an inverted U-shaped risk association (one of the common non-monotonic dose response relationships (NMDRs)), only data from population A in which individual population members span a whole range of exposure will show the inverted U-shaped association. The shape of the dose–response curve differs depending on the different distributions of chemicals across populations. The thickened risk curves isolate sections of the risk curve which are likely to be seen in epidemiologic studies of populations with more limited exposure ranges. Data from population B will show a strong positive association, data from population C will show a null association, and data from population D will show an inverse association.
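The dependence of the observed association on the exposure range can be checked numerically. The sketch below assumes a hypothetical inverted U-shaped risk curve on a dose scale from 0 to 1 and fits a straight line within three restricted exposure ranges, labelled B, C, and D to echo the populations in Fig. 1. The curve and the range boundaries are illustrative assumptions only.

```python
def risk(dose):
    """Hypothetical inverted U-shaped risk curve on doses 0..1, peak at 0.5."""
    return 1.0 + 4.0 * dose * (1.0 - dose)


def linear_trend(lo, hi, n=101):
    """Least-squares slope of risk on dose over an evenly spaced range."""
    doses = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    risks = [risk(d) for d in doses]
    mean_d = sum(doses) / n
    mean_r = sum(risks) / n
    sxy = sum((d - mean_d) * (r - mean_r) for d, r in zip(doses, risks))
    sxx = sum((d - mean_d) ** 2 for d in doses)
    return sxy / sxx


# Assumed exposure ranges: B near zero, C around the peak, D far from zero.
populations = {"B": (0.0, 0.3), "C": (0.35, 0.65), "D": (0.7, 1.0)}
trends = {name: linear_trend(lo, hi) for name, (lo, hi) in populations.items()}

for name, slope in trends.items():
    if slope > 0.1:
        direction = "positive"
    elif slope < -0.1:
        direction = "inverse"
    else:
        direction = "null"
    print(f"population {name}: slope {slope:+.2f} ({direction})")
```

The same underlying curve yields a positive, a null, and an inverse linear trend, so three studies of one chemical could report apparently contradictory findings without any of them being wrong.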

At present, a first and common approach to evaluating possible harmful effects of a chemical in humans is to study workers who are exposed to high levels of the chemical and to compare them with a general population. If such a study fails to show any possible harm, researchers naively assume the chemical is safe in general populations because the general population is exposed to much lower levels of the chemical than occupationally exposed workers. However, under NMDRs, this approach is not valid. When NMDRs are suspected, the first evaluation should be done among general populations.

In fact, many human studies of POPs show more consistent results among general populations with background low dose exposure, compared to studies performed among people exposed to high levels of pollution or among occupationally exposed workers. This situation was observed with the outcomes of diabetes [28] and cancer [29]. Under the current paradigm, chemicals showing these findings are classified as presenting weak human evidence even though there is strong in-vitro or in-vivo evidence [29]. However, this pattern of epidemiologic findings is always possible under NMDRs.

6 No control group

Due to the ubiquity of many EDCs in the environment and food web, finding control groups in human studies poses an important methodological problem. It is often impossible to find negative controls, i.e. subjects who have not been or are not exposed to EDCs. If the dose–response relation is linear and there is a threshold dose below which there are zero effects, non-existence of a truly unexposed group is not an issue as long as there are subjects with concentrations in the very low risk area of the linear association or under the threshold (Fig. 2a). However, when there is a linear dose–response relation without a threshold, as mean exposure levels of the reference group increase, the underestimation of relative risk becomes greater (Fig. 2b). Under NMDRs, non-existence of an unexposed group seriously affects the estimation of relative risk (Fig. 2c).

Fig. 2

Illustration of the effect of lack of an unexposed group on relative risk estimation depending on the shape of dose–response relations. When there is a linear association with a threshold, if the researcher takes exposure level “low” as “reference group”, the relative risk is unbiased. When subjects with exposure levels “low” or “middle” are mixed in the “reference group”, the relative risk is underestimated. When there is a linear association without a threshold, as mean exposure levels of the reference group increase, the underestimation of relative risk becomes greater. Under a non-monotonic dose–response relation, non-existence of an unexposed group seriously affects the estimation of relative risk.
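The three scenarios can be worked through with simple arithmetic. The sketch below assumes three hypothetical dose–response functions (linear with a threshold, linear without a threshold, and inverted U-shaped) and compares the risk at one fixed "high" dose against reference groups located at zero, low, or middle exposure. For simplicity each reference group sits at a single dose rather than a mix of doses. All doses, baseline risks, and slopes are invented for illustration.

```python
def threshold_linear(dose, base=0.01, threshold=2.0, slope=0.01):
    """Risk stays at baseline below the threshold, then rises linearly."""
    return base + slope * max(0.0, dose - threshold)


def linear_no_threshold(dose, base=0.01, slope=0.01):
    """Risk rises linearly from the first unit of exposure."""
    return base + slope * dose


def inverted_u(dose, base=0.01, scale=0.0048, max_dose=5.0):
    """Risk rises, peaks at max_dose / 2, and returns to baseline."""
    return base + scale * dose * (max_dose - dose)


MODELS = {
    "threshold": threshold_linear,
    "linear": linear_no_threshold,
    "inverted_u": inverted_u,
}
REFERENCE_DOSES = {"unexposed": 0.0, "low": 1.0, "middle": 3.0}
HIGH_DOSE = 4.0


def relative_risks():
    """Relative risk of the high-dose group for each model and reference."""
    return {
        name: {
            ref: model(HIGH_DOSE) / model(ref_dose)
            for ref, ref_dose in REFERENCE_DOSES.items()
        }
        for name, model in MODELS.items()
    }


for name, by_ref in relative_risks().items():
    row = ", ".join(f"{ref}: {rr:.2f}" for ref, rr in by_ref.items())
    print(f"{name:>10} -> {row}")
```

With these assumed numbers, the threshold model gives a relative risk of about 3 against either an unexposed or a low reference and about 1.5 against a middle reference. The no-threshold model falls from about 5 to 2.5 to 1.25 as the reference dose rises. Under the inverted U, a true relative risk of about 2.9 becomes about 1.0 against the low reference and about 0.75 against the middle reference, so the direction of the estimate itself changes.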

To date, whether EDCs have a threshold or not is an issue of serious debate among laboratory researchers [5, 30]. Whatever is true, however, the possibility of NMDRs requires a stable number of study subjects in a reference category as close to unexposed as possible. To get the least biased relative risk estimates, even among general populations, populations with a very low dose range of exposure to the chemical should be selected, not populations with a relatively higher dose range (such populations are not likely to have a valid reference group which is close to unexposed).

7 Exposure assessment: critical life time

Ideally, the comprehensive measurement of different exposure patterns during a lifetime is needed, especially for human research on “developmental effects” of EDCs, because the endocrine system responds differently at different developmental stages and ages. However, this scenario is practically unrealistic. Researchers therefore suggest several snapshot measurements during critical life stages, such as fetal development, early childhood and the reproductive years [31]. Despite this compromise, precise estimates of the exposure together with the identification of the critical stages for a particular EDC or combined exposure to multiple EDCs are major challenges in humans.

In fact, the most critical period may differ depending on the health effects of EDCs; for example, the timing of organ development varies during the fetal period. Thus, in an ideal situation, suspected EDCs disturbing neurodevelopment need to be measured at different time points from suspected EDCs affecting homeostatic metabolic set-points. This issue still remains after birth because the endocrine system is dependent on the circadian rhythm and menstrual cycle [32] and hormone-related effects of EDCs likely differ according to the endogenous hormonal milieu [33].

8 Interactions with established risk factors

Health effects currently attributed to EDC exposure are often multifactorial. In epidemiological studies, adjustment for confounders is a key step to get less biased relative risk estimates. However, valid adjustment may be challenging when the exposure to EDCs is closely associated with many established risk factors. For example, food is directly related to the exposure to many EDCs. Fatty animal food is one of the main exposure sources of lipophilic EDCs. EDCs like BPA or phthalates leach from food and beverage containers. Pesticide residues in food and beverages also enter the human body. Furthermore, adverse effects of some EDCs tend to be observed among experimental animals consuming a high fat diet, but not a usual fat diet, suggesting biological interactions [34–36].

Another important risk factor is obesity. Many EDCs are lipophilic and accumulate in adipose tissue. However, these chemicals are continuously released from adipose tissue through normal lipid metabolism, and insulin resistance related to obesity increases the release of these chemicals into the circulation [28]. Also, weight loss and weight gain affect the release and restoration of these chemicals [37].

As both diet and obesity are related to the risk of many chronic diseases, how to handle these established risk factors is a challenge in human studies of EDCs. Both not adjusting (failure to remove bias) and adjusting (improper model which ignores interaction) can be problematic. At least, several models should be considered in interpreting findings in human studies on EDCs. Additionally, many other risk factors such as socio-economic status, exercise, cigarette smoking, and alcohol drinking can be directly or indirectly related to the exposure to EDCs.

A further complicated issue is that many plant foods contain hormonally active substances; for example, isoflavones such as genistein possess powerful estrogenic activity in screening assays. However, unlike man-made EDCs, intake of plant food is related to lower risk of many diseases, including some hormone-related cancers. Even though risk effects of whole food would be different from the effects of a specific constituent, phytoestrogens can affect the exposure assessment if a bioassay for estrogenic activity is used to estimate total estrogen burden in the human body.

9 Other mechanisms related to low dose EDC mixtures

Researchers should consider that low dose chemical mixtures can affect human health through other mechanisms not directly related to traditional endocrine disruption, even though those chemicals are correctly classified as EDCs. One example is POPs. Even though POPs have recently been related to several chronic diseases in human populations with substantial consistency [28, 38], interpretation of POPs as EDCs does not completely explain these findings because they include a variety of compounds with diverse endocrine disrupting properties as chemical mixtures. For example, DDT is known to be an estrogen agonist [39] while DDE, the main metabolite of DDT, has anti-androgenic properties [40]. PCBs consist of mixtures of congeners with estrogenic and anti-estrogenic effects [41, 42], and some PCBs affect thyroid hormone signaling [43]. POPs with dioxin activity can indirectly influence some estrogen-mediated endpoints [44].

Humans are simultaneously exposed to mixtures of POPs consisting of many chemicals with different endocrine properties. Even though one specific EDC which is a POP is consistently and significantly associated with disease over several human studies, the inference that a unique endocrine disrupting property of this compound causally explains the findings may not be true. In particular, if the distribution of other correlated compounds with similar or opposite EDC properties differs among populations (which is common in the real world), we need to consider the possibility of alternative mechanisms.

In fact, we have suggested that the continuous consumption of intracellular glutathione through conjugation and mitochondrial dysfunction due to the chronic exposure to low dose POPs mixtures can lead to a variety of human chronic diseases [28]. Even though it is extremely difficult to predict mixture effects of diverse EDCs in humans, the glutathione consumption during metabolism is a general pathway potentially affecting any chemical that is metabolized by glutathione conjugation. Thus, one POP compound used as a surrogate marker for POP mixtures can be consistently related with outcomes in human studies. Importantly, as glutathione depletion and mitochondrial dysfunction with low dose POPs can be compensated through hormetic effects of these chemicals which can be observed within the range of slightly increased doses, this mechanism can also follow NMDRs [45].

10 Conclusion

Bradford Hill’s criteria, including consistency of findings across studies and dose–response, have long been used to objectively evaluate whether associations observed in epidemiology can be interpreted as identifying causal mechanisms. Reviews of EDCs which concluded that there are significant adverse effects on human health and the environment [1] are often criticized for not adopting Bradford Hill’s approach [2]. However, as we discussed above and others have discussed [46], there are challenges to the use of these criteria for EDCs to establish causality, specifically that the nature of the risk relationships with EDCs and study designs may often lead to inconsistent findings and that NMDRs may be the true state of affairs.

In human studies, the most plausible scenario would be that harm due to EDC mixtures would be revealed as an increasing trend of related diseases at the population (ecologic) level. However, for all the reasons discussed above, it might be difficult to observe consistent associations with specific EDCs, especially those with short half-lives, in traditional epidemiological studies of individuals. Epidemiologists commonly interpret this phenomenon as bias, reflecting the “ecologic fallacy”. The ecologic fallacy is commonly taken to mean that results based on individual levels reflect true association, while results based on population levels reflect artifact. However, in the case of EDCs, results observed in ecologic designs may actually reflect the true association.

Considering all the complicated aspects of EDCs, we are respectful of the difficulty of evaluating the role of chemical mixtures such as EDCs in the development of various diseases in epidemiological studies like cohort studies, even if conditions like a large sample size, a long follow-up from pregnancy to adulthood, technological advancement, and state-of-the-art statistics are fulfilled. However, as there are uncertainties surrounding the effects of EDCs on human health and limitations of extrapolation from in-vitro and in-vivo experimental findings to the in vivo human situation, the conduct of epidemiological studies, despite their inherent challenges, remains an essential component of the evaluation of possible human effects of EDCs.

Therefore, several practical approaches can be proposed for future epidemiological studies on EDCs. First, compared to EDCs with short half-lives, which have low reliability as exposure markers, the evaluation of health effects of EDCs with long half-lives in human studies can provide more reliable results, because the measurement of an EDC with a long half-life is more likely to represent long term exposure. Second, even in this case, such EDCs should be evaluated among populations with a relatively lower dose range of exposure, so as to have a stable number of study subjects in a reference category as close to unexposed as possible. Third, as human studies of EDCs usually include simultaneous measurement of many chemicals, which may be repeatedly measured during follow-up periods, the information on exposure should be integrated under a predefined biological hypothesis. Otherwise, such studies can produce many chance findings which may not be reproducible in future studies. Fourth, replication of findings on EDCs in independent datasets may be necessary to validate an association finding, similar to genome-wide association studies (GWAS). It is well accepted that GWAS requires replication of findings in at least two independent datasets as part of validating an association. Compared to GWAS, which typically looks at millions of comparisons, the number of comparisons in studies of EDCs would be much lower. However, unlike genome assessment in GWAS, in which DNA variants are assessed with almost 100% accuracy, the low reliability of exposure assessment of EDCs increases the probability of chance findings. Thus, replication in independent datasets may be as important in studies of EDCs as in GWAS.
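The scale of the chance-finding problem raised in the third point can be gauged with a short calculation. The sketch below assumes that every exposure–outcome comparison is tested independently at the same significance level and that no true association exists; both are simplifying assumptions, since measured EDCs are typically correlated. The function names and the example counts are illustrative.

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


for n_tests in (1, 10, 50):
    print(
        f"{n_tests:>3} comparisons: "
        f"P(at least one chance finding) = {familywise_error(n_tests):.2f}, "
        f"expected chance findings = {expected_false_positives(n_tests):.1f}"
    )
```

Even a few dozen unplanned comparisons make at least one nominally significant result very likely under these assumptions, which is one reason to predefine the hypothesis and to replicate in independent datasets.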

Epidemiology of infectious diseases is different from epidemiology of chronic diseases. Similarly, design principles and interpretation for human studies of EDCs may not be identical to design principles and interpretation under the paradigm of epidemiological studies of traditional risk factors. Thus studies of EDCs may require some new epidemiologic concepts. As the evaluation of health effects of environmental exposure to EDCs is becoming more important, key methodological issues in human studies should be recognized by researchers, clinicians, and policy makers, to avoid incorrect inferences.