Study Types and Evaluation Principles

Although human studies, epidemiological studies in particular (see chapter “Epidemiological Methods in Regulatory Toxicology”), would be the gold standard for the risk assessment of compounds to which humans are exposed, those studies are almost always observational with retrospective elements and are confounded by other risk factors (e.g., personal, behavioral, and environmental characteristics, co-exposure to other agents) and by background exposure. Therefore, specialized statistical and epidemiological methods are required to analyze these data. It should be noted that the most valuable human data are often obtained from highly exposed populations (e.g., occupational cohorts) and do not cover the dose ranges relevant for regulatory practice; as such, they also require extrapolation to effects at low doses. In contrast to human data with their high variability and heterogeneity, data from studies in usually inbred strains of experimental animals exhibit very low heterogeneity and moderate variability. Furthermore, confounding can be efficiently controlled by prospective and randomized designs. Therefore, animal studies have been considered a gold standard for human risk assessment as well, even though two steps of extrapolation, from high to low doses and from animals to humans, are required.

Although statistical methods are general enough to be applied to both carcinogenic and noncarcinogenic data, the statistical methods for risk extrapolation must account for risk management principles, e.g., the biologically based paradigm that genotoxic and/or directly DNA-reactive carcinogens do not allow the assumption of a threshold exposure level below which no biological effect is possible. Even when the existence of a threshold can be assumed, as for noncarcinogenic compounds or carcinogens which do not directly react with DNA, estimating that threshold dose requires statistical methods and in most cases extrapolation methods as well, since that dose may also lie in a low-dose region.

A Road Map for Extrapolation

Risk extrapolation of both carcinogenic and noncarcinogenic compounds is preferably performed as a carefully planned investigation which accounts for a number of critical check points, listed in Table 1 as a sort of road map. Working through these points requires consideration of the resources available for the assessment (e.g., available scientists and their profiles of expertise, access to data, computational facilities including software) and of the time frame for delivering the low-dose extrapolation results. It should also be noted that this checklist may be applied iteratively to refine the assessment process.

Table 1 Road map and checklist for extrapolation

Choice of Risk Parameters

The critical effects which define the risk parameter for extrapolation should have been identified in an earlier step of risk assessment (“hazard identification”) as adverse effects which are potentially relevant for risk characterization and which can be assessed quantitatively for extrapolation from high to low doses. Methodological statistical considerations differentiate between three major classes of data types which express increasing statistical (not necessarily biological) content of information:

  • Quantal (e.g., the occurrence of a defined illness)

  • Categorical-ordinal (e.g., severity of allergies)

  • Quantitative-metric (e.g., concentration of a liver enzyme).

Carcinogenic effects seen in animal studies usually fall into the class of quantal data, since the occurrence of cancer (cancer incidence) and death from cancer (cancer mortality) are the relevant endpoints for human cancer risk assessment. Both are still considered as the most relevant indices for cancer risk assessment and to control cancer disease in a population. For time-to-tumor data, both the biological database and the statistical tools available are still not well developed. In contrast to the quantal data describing carcinogenic effects, the assessment of noncarcinogenic effects is much more diverse and needs special considerations for selecting the relevant adverse events and identifying the parameters which describe these effects best. On the other hand, the database for noncarcinogenic endpoints is often richer, and there are often quantitative data available which allow powerful dose–response analysis with smaller numbers of subjects. Data of the type categorical-ordinal are rarely analyzed for extrapolation purposes and require in general more specialized methods.

Choice of Risk Measures

Based on the critical effect which could be a disease incidence or the change of a quantitative marker of a health effect (e.g., beta-2-microglobulin, a biomarker of renal tubular effects), a quantitative risk measure R must be defined, which describes the risk as a mathematical function R(d) of the exposure dose, denoted d. In animal experiments the dose is usually expressed in units of mg/kg body weight administered per day. Alternatively one may formulate the risk measure also in terms of the concentration of the substance, e.g., in a target organ (e.g., blood, liver, kidney).

In the case of quantal data, R(d) expresses the probability of the occurrence of the critical effect in the subject of investigation exposed to dose d:

$$ R(d) = P\;(Effect\;|\;Dose = d). $$

The symbol P stands for probability (unfortunately, sometimes also denoted as risk). For many compounds one must assume that there exists background exposure, of either exogenous or endogenous origin, that adds to the total exposure (total exposure = background exposure + exposure through the administered dose d). Denoting the risk due to background by \( R_0 = R(0) \), one may distinguish between additional and extra risk:

  • Additional/added risk (above background): \( R_{\mathrm{Add}}^{*} = R(d) - R_0. \)

  • Extra risk (of the substance): \( R_{\mathrm{Excess}}^{*} = \frac{R(d) - R_0}{1 - R_0}. \)

Risk measures for quantitative-metric data where R(d) simply represents the effect size associated with the toxic compound can be defined accordingly as:

  • Additional effect: \( R_{\mathrm{Add}}^{*} = R(d) - R_0. \)

  • Relative effect (size): \( R_{\mathrm{Rel}}^{*} = \frac{R(d) - R_0}{R_0}. \)
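As a minimal numerical sketch of these risk measures, the definitions above translate directly into code. The function names and the incidences of 5 % (background) and 24 % (at dose d) are hypothetical and chosen only for illustration.

```python
def additional_risk(r_d, r_0):
    """Additional (added) risk above background: R(d) - R0."""
    return r_d - r_0


def extra_risk(r_d, r_0):
    """Extra risk: (R(d) - R0) / (1 - R0), defined for a background risk below 1."""
    if not 0.0 <= r_0 < 1.0:
        raise ValueError("background risk must lie in [0, 1)")
    return (r_d - r_0) / (1.0 - r_0)


def relative_effect(r_d, r_0):
    """Relative effect size for quantitative-metric data: (R(d) - R0) / R0."""
    if r_0 == 0:
        raise ValueError("relative effect is undefined for a zero background level")
    return (r_d - r_0) / r_0


# Hypothetical incidences: 5 % in controls, 24 % at dose d.
r0, rd = 0.05, 0.24
print("additional risk:", round(additional_risk(rd, r0), 4))
print("extra risk:     ", round(extra_risk(rd, r0), 4))
```

The extra risk rescales the additional risk to the fraction of subjects not already affected by background, so it is never smaller than the additional risk.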

In quantitative risk assessments of environmental contaminants, in particular when chronic inhalation exposure is assessed in epidemiological studies on cancer incidence or mortality, the unit risk (UR) has been used as an internationally agreed risk measure, defined as the excess risk above background when a constant concentration of the toxic compound of 1 μg/m³ exists in the inhaled air. Formally, this can be written:

$$ Unit\ risk = P\left( {C|constant\ exposure\ 1\,\upmu \mathrm{ g}/{{\mathrm{ m}}^3}} \right) - P\left( {C|no\ exposure} \right) $$

where C represents the occurrence of the observed disease, e.g., cancer. As for the additional risk, the first term on the right describes the probability of disease under exposure (1 μg/m³) and the second term the probability due to background, i.e., when the substance is absent. UR is then the excess lifetime cancer risk from continuous lifetime exposure to an agent at a concentration of 1 μg/m³ in air. A UR = 3 × 10⁻⁶ per μg/m³ means that three excess cancer cases are expected to develop per 1,000,000 people exposed to the unit dose (UD), i.e., daily exposure for a lifetime to 1 μg of the substance per m³ of air. The interpretation is analogous for drinking water, in units of 1 μg/L water, or for food, in units of 1 μg/kg food; see, e.g., http://www.epa.gov/risk_assessment/glossary.htm. In a specific situation, the UR is simply multiplied by the exposure concentration, say in μg/m³, to calculate a risk estimate (see, e.g., Becher and Steindorf 1993). UR is the preferred measure for comparing the carcinogenic potentials of different toxic compounds (see, e.g., Table 2, where some important airborne environmental carcinogens are compared, with polycyclic hydrocarbons showing a 1,000-fold higher carcinogenic potency than diesel soot particles). It should be noted that without additional specification, all these risk measures assume lifelong constant exposure to the substance, in the past often assuming a lifespan of 70 years.
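The multiplication of a unit risk by an exposure concentration can be sketched as follows. The function names and the second concentration (0.2 μg/m³) are hypothetical; the sketch assumes lifelong constant exposure and a risk that is linear in concentration, as implied by the unit risk concept.

```python
def excess_lifetime_risk(unit_risk, concentration):
    """Excess lifetime risk = UR (per ug/m3) x lifelong constant concentration (ug/m3)."""
    return unit_risk * concentration


def expected_excess_cases(unit_risk, concentration, population):
    """Expected number of excess cases in a population exposed for a lifetime."""
    return excess_lifetime_risk(unit_risk, concentration) * population


ur = 3e-6  # unit risk per ug/m3, the value used in the text
# Exposure at the unit dose of 1 ug/m3 in a population of one million:
print(expected_excess_cases(ur, 1.0, 1_000_000))  # about three excess cases
# Hypothetical lower concentration of 0.2 ug/m3:
print(excess_lifetime_risk(ur, 0.2))
```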

Table 2 Estimates for unit risks (UR) and unit doses (LAI 1992)

Dose Extrapolation

Extrapolating from an established dose–response relationship available for the dose range

$$ {{\mathrm{ D}}_{\mathrm{ Experimental}}}: {{\mathrm{ d}}_{\min }} < \mathrm{ d} < {{\mathrm{ d}}_{\max }} $$

to a lower dose range

$$ {{\mathrm{ D}}_{\mathrm{ Extrapolation}}}: {{\mathrm{ d}}_{\mathrm{ L}}} < \mathrm{ d} < {{\mathrm{ d}}_{\mathrm{ U}}},\ \mathrm{ where}\ {{\mathrm{ d}}_{\mathrm{ U}}} < {{\mathrm{ d}}_{\min }} $$

should distinguish between low-dose extrapolation with or without assuming a threshold dose. This distinction has guided risk assessment (WHO 1999), although the question of the existence of biological thresholds has hardly been unequivocally resolved for any compound. Interindividual differences in the response to both carcinogenic and noncarcinogenic substances are just one observation which questions the existence of universally applicable thresholds (“heterogeneity in the population” argument) (see also Rhomberg et al. (2011)). Nevertheless, the threshold concept has been introduced in regulatory toxicology as a pragmatic means and has been applied even though lower doses may show a biological effect that is considered irrelevant or that is indistinguishable from background in the presence of statistical variation, including measurement error. An overview of possible extrapolation scenarios for human or animal data, depending on the assumption made about the existence of threshold doses, is given in Table 3.

Table 3 Four possible scenarios for extrapolation

Risk Assessment Under the Threshold Dose Assumption

When the existence of a threshold is assumed, below which no biologically relevant effect of the compound can be expected, the aim of a regulatory approach may be to estimate that biological threshold, say D*, as closely and precisely as possible. Accounting for the uncertainty of that estimate, a sufficiently large safety margin represented by a safety factor (SF) would establish an intervention dose (ID) below which no biologically significant effects would be expected:

$$ \mathrm{ ID} = \mathrm{ D}^* / \mathrm{ SF}, $$

also referred to as the reference dose (RfD) (see WHO (1999)), defined as the maximum dose without significant or appreciable adverse effect on human health.

In a first step toward estimating D*, traditionally the highest experimental dose at which no adverse effect is observed has been determined using the dose–response data available. Practically, this is pursued through statistical hypothesis testing of each dose group against the control group, stepwise, starting with the lowest dose, until one finds the highest dose at which there is still no statistically significant difference in effect compared to control (significance usually defined by a p value < 0.05). Consequently, the next higher dose tested must show a statistically significant effect. The highest dose with no statistically significant effect is then denoted NOAEL (no observed adverse effect level); it serves as an estimate of the biological threshold D* and is used as the point of departure/reference point (PoD/RP; Table 1, step 5a). When no NOAEL can be identified in a dose–response data set (e.g., when all doses tested were statistically significantly different from the controls), the smallest dose that caused a statistically significant effect, denoted LOAEL (lowest observed adverse effect level), has been suggested to serve as PoD/RP. Since the LOAEL would in general overestimate D*, a higher safety factor (usually higher by a factor of ten) is used. It should be noted that the NOAEL may lie substantially above or below D* and that its use has therefore been criticized (EFSA 2009), predominantly for three reasons:

  • Strongly depending on the number of cases tested per dose group. The larger the number of examined subjects per dose, the higher is the statistical sensitivity (power) of the approach and thus the chance that a statistically significant effect is found at a dose. Conversely, the smaller the sample sizes chosen per dose group, the higher will be the NOAEL, possibly as high as the highest dose tested.

  • Depending on the sensitivity of the biological assay. The higher the sensitivity of the experimental determination of the biological effect, the smaller will be the NOAEL.

  • Strongly depending on the choice of doses and dose range. The selection of the doses in DExperimental is crucial for the identification and localization of the NOAEL. If doses are widely spread in relation to the true range where the dose–response curve increases, the NOAEL can be determined only very vaguely and can be far above or below D*.
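The stepwise NOAEL procedure and its dependence on group size can be illustrated with a small sketch for quantal data. The text does not prescribe a particular test; a one-sided Fisher exact test against the control group is assumed here, and all dose levels and counts are invented for illustration.

```python
from math import comb


def fisher_one_sided(k_ctrl, n_ctrl, k_dose, n_dose):
    """One-sided Fisher exact p value for a higher incidence in the dose group."""
    events = k_ctrl + k_dose
    total = n_ctrl + n_dose
    upper = min(events, n_dose)
    # Hypergeometric tail: probability of k_dose or more events in the dose group.
    tail = sum(comb(n_dose, x) * comb(n_ctrl, events - x)
               for x in range(k_dose, upper + 1))
    return tail / comb(total, events)


def noael_loael(doses, affected, sizes, alpha=0.05):
    """Step upward from the lowest dose; index 0 is the control group."""
    noael, loael = None, None
    for i in range(1, len(doses)):
        p = fisher_one_sided(affected[0], sizes[0], affected[i], sizes[i])
        if p < alpha:
            loael = doses[i]   # first dose that differs significantly from control
            break
        noael = doses[i]       # highest dose so far without a significant effect
    return noael, loael


doses = [0, 3, 10, 30, 100]  # hypothetical dose levels
small = noael_loael(doses, [0, 0, 1, 3, 6], [10] * 5)     # 10 animals per group
large = noael_loael(doses, [0, 1, 5, 15, 30], [50] * 5)   # 50 animals per group
print("n = 10 per group: NOAEL, LOAEL =", small)
print("n = 50 per group: NOAEL, LOAEL =", large)
```

With these invented counts, which have nearly the same incidence proportions in both designs, the NOAEL is 30 with ten animals per group but only 3 with fifty animals per group, which illustrates the first criticism listed above.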

Safety factors (SFs) are applied in the second step of the establishment of the PoD/RP, e.g., by dividing the NOAEL by SFs representing different types of uncertainty. Traditionally, two types of SFs have been used (cf. Edler et al. 2002) when extrapolating from animals to humans:

  • SFinterspec = 10 to take into account the interspecies variability between animals and humans. It allows for the possibility that the average exposed person is up to ten-fold more sensitive than the average exposed animal for which the NOAEL was determined (see the case TS in Table 3).

  • SFintraspec = 10 to take into account the interindividual variability. This is to ensure that a ten-fold more sensitive individual than that for which the PoD/RP value was derived will still be protected by the PoD/RP (see the case ES in Table 3).

For a refinement of these SFs accounting for both toxicokinetic and toxicodynamic data, if available, see, e.g., Dorne and Renwick (2005). It should be noted that even then these SFs are default factors not accounting for specific toxicokinetic and toxicodynamic knowledge of the toxic compound. A biologically based extrapolation would transform the dose–response relationship from animals to humans using toxicokinetic information by applying two physiologically based toxicokinetic (PBTK) models, one for the animal strain and another for humans permitting the calculation of concentrations in target organs. A precondition, however, is that sufficient biological information is available to construct both PBTK models.
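The default safety factor arithmetic described above can be sketched as follows. The function name, the NOAEL of 50 and the LOAEL of 150 mg/kg body weight per day are hypothetical, and the multiplicative combination of the individual factors is an assumption of this sketch.

```python
def intervention_dose(pod, sf_interspecies=10.0, sf_intraspecies=10.0, sf_loael=1.0):
    """Divide a point of departure by the product of the default safety factors.

    sf_loael is an additional factor (assumed 10 in the example below) for the
    case that a LOAEL has to be used instead of a NOAEL.
    """
    overall_sf = sf_interspecies * sf_intraspecies * sf_loael
    return pod / overall_sf


noael = 50.0   # hypothetical NOAEL, mg/kg body weight per day
loael = 150.0  # hypothetical LOAEL, mg/kg body weight per day
print(intervention_dose(noael))                 # NOAEL / (10 x 10)
print(intervention_dose(loael, sf_loael=10.0))  # LOAEL / (10 x 10 x 10)
```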

When the assessment is based on an animal experiment, the dose is converted from animals to humans using interspecies extrapolation (USEPA 2005; ECHA 2012). For oral exposures, allometric scaling is used, in which the administered doses are adjusted by body weight raised to the power of ¾.
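A minimal sketch of this allometric conversion is given below. It assumes that the total daily dose scales with body weight to the power of ¾, so that a dose expressed per kg body weight scales with the body weight ratio to the power of ¼. The function name and the body weights (0.25 kg for a rat, 70 kg for a human) are assumptions made for illustration.

```python
def human_equivalent_dose(animal_dose_mg_per_kg, bw_animal_kg, bw_human_kg=70.0):
    """Human-equivalent dose in mg/kg bw/day under bw^(3/4) scaling of the total dose.

    Total dose ~ bw^(3/4) implies that the dose per kg scales with bw^(-1/4).
    """
    return animal_dose_mg_per_kg * (bw_animal_kg / bw_human_kg) ** 0.25


rat_dose = 100.0  # hypothetical dose in the rat, mg/kg bw/day
hed = human_equivalent_dose(rat_dose, bw_animal_kg=0.25)
print(f"human-equivalent dose: {hed:.1f} mg/kg bw/day")
print(f"implied scaling factor: {rat_dose / hed:.2f}")
```

For these assumed body weights the conversion corresponds to dividing the rat dose by a factor of roughly four.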

Risk Assessment Without Threshold Dose Assumption

For compounds for which no threshold dose is assumed, there are two approaches (see Fig. 1a). First, one can try to extend the dose–response curve F(d) to the entire dose range, including the “zero dose,” i.e., where only background exposure may exert an effect. The dose interval D: 0 ≤ d ≤ dmax then serves as the basis of the dose–response assessment, and estimates of the risk could be made at any exposure level. However, this implies that four to six orders of magnitude, in terms of either response or dose, must be bridged by extrapolation. Although mathematical dose–response models are fit for this purpose, the biological database is not, and a dose–response relationship F(d) in the experimental range can provide only limited information on the relationship in the extrapolation range DExtrapolation. It was found that different mathematical models fitting the data in DExperimental equally well provided widely diverging risk estimates when extrapolated to the low-dose range of interest, differing by several orders of magnitude. When, e.g., the one-hit model, the multistage model, and the two empirical models derived from the Weibull distribution and the log-normal distribution all fit the data, one obtains increasingly differing risk results when going to lower doses, always in the same order of

$$ One \ Hit < Multistage < Weibull < Lognormal, $$

when excess risk is considered (see Krewski and van Ryzin 1981). This strong dependence of the risk estimates on the models selected and lack of biological justification for using a particular model has significantly compromised the use of these models for regulatory purposes.
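The model dependence of low-dose extrapolation can be made concrete with a small sketch. It compares a one-hit model with a Weibull model of shape 2, both calibrated to the same extra risk of 10 % at a dose of 1 (arbitrary units). The shape parameter, the calibration point and the function names are hypothetical choices for illustration, not values from the cited literature.

```python
import math


def extra_risk(dose, b, k=1.0):
    """Extra risk 1 - exp(-b * dose**k); k = 1 gives the one-hit model."""
    return -math.expm1(-b * dose ** k)


def calibrate_b(risk, dose, k=1.0):
    """Slope b such that the model returns the given extra risk at the given dose."""
    return -math.log1p(-risk) / dose ** k


def dose_for_extra_risk(risk, b, k=1.0):
    """Dose associated with a given extra risk (inverse of extra_risk)."""
    return (-math.log1p(-risk) / b) ** (1.0 / k)


# Both models agree at the edge of the (hypothetical) experimental range.
b_onehit = calibrate_b(0.10, 1.0, k=1.0)
b_weibull = calibrate_b(0.10, 1.0, k=2.0)

low_dose = 1e-3
print("extra risk at low dose, one-hit:", extra_risk(low_dose, b_onehit, 1.0))
print("extra risk at low dose, Weibull:", extra_risk(low_dose, b_weibull, 2.0))
print("dose for 1e-6 extra risk, one-hit:", dose_for_extra_risk(1e-6, b_onehit, 1.0))
print("dose for 1e-6 extra risk, Weibull:", dose_for_extra_risk(1e-6, b_weibull, 2.0))
```

Although both curves pass through the same point in the experimental range, for these assumed parameters the one-hit model returns the smaller dose for a given extra risk, and the two models differ by more than two orders of magnitude in the low-dose region.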

Fig. 1 (a) Dose–response curve F(d) in the observed range dmin < d < dmax and in the extrapolation range 0 < d < dmin. (b) Benchmark dose (BMD) approach restricted to a left-truncated dose range combining DExperimental: dmin < d < dmax and a limited extrapolation range DExtrapolation of the dose–response curve (the author thanks Annette Kopp-Schneider for providing the figure)

An alternative approach focuses on modeling dose–response for doses from DExperimental: dmin < d < dmax, allowing only a limited extrapolation to DExtrapolation: dL < d < dU, where dU < dmin and 0 < dL, using the data available. Modeling determines the dose associated with a predetermined but identifiable risk. The best-investigated approach of this kind is the benchmark dose (BMD) approach (EFSA 2009) described below.

The Limit Risk

A limit risk Rlimit is interpreted as lifetime risk or lifetime cancer risk (LCR), the probability that the exposure will cause cancer (incidence type of risk) or death from cancer (mortality type of risk) within an average lifetime.

A first version of the limit risk approach, the “virtually safe dose” (VSD) concept, stems from the second half of the last century. It arose in response to difficulties in complying with the US Food, Drug, and Cosmetic Act when, in the context of the Delaney Clause, food additives found to induce cancer at any dose level were banned. The VSD was defined as the dose associated with one additional tumor per one million (1,000,000) subjects through lifetime exposure, corresponding to a lifetime cancer risk (LCR) of 10⁻⁶, in the belief that such a low risk would be acceptable for a population of several millions.

It should be noted that in a population of 100 million people, on the order of 500,000 persons will be diagnosed with cancer every year (IARC 2008). An LCR of 10⁻⁵ would then result in 13 additional persons with cancer per year if the whole population is exposed during its whole lifetime, assuming an average lifetime of 75 years, whereas an LCR of 10⁻⁶ would represent 1.3 additional cancer cases per year in a population of 100 million (see SCCS 2012).
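The arithmetic behind these figures can be retraced in a few lines. The sketch assumes that the lifetime excess cases are spread evenly over the average lifetime; the function name is hypothetical.

```python
def additional_cases_per_year(lcr, population, lifetime_years=75.0):
    """Lifetime excess cases (LCR x population) spread evenly over the lifetime."""
    return lcr * population / lifetime_years


population = 100_000_000
for lcr in (1e-5, 1e-6):
    cases = additional_cases_per_year(lcr, population)
    print(f"LCR = {lcr:g}: about {cases:.1f} additional cases per year")
```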

In the context of a risk management decision, it should be noted that the WHO and the US EPA as well as the US OSHA recommended an LCR of 10⁻⁵ for carcinogenic compounds. ECHA (2012) states that “based on experiences, cancer risk levels of 10⁻⁵ and 10⁻⁶ could be seen as indicative tolerable risk levels when setting DMELs (derived minimal effect levels) for workers and the general population, respectively.” Higher risks of up to 1/1,000 have been accepted in the regulation of the working environment. The measurable risk in a test group of animals is generally not below 1/20, at best 1/50.

The most extensively used model for calculating an LCR has been the so-called linearized multistage (LMS) model (USEPA 1986). Based on the multistage mutation model of Armitage and Doll, this model is in essence a linear approximation of the dose–response curve. In practice it has provided robust risk assessments and limit values, and it has become the basis of the slope factor approach used by the USEPA as a convenient descriptor of cancer potency (see http://www.epa.gov/iris/carcino.htm). The LMS model is also a member of the set of models recommended for the BMD.

Benchmark Dose

The benchmark dose (BMD) approach is a general method of fitting dose–response models, applicable to any dose–response data and based on four main steps:

  • Specification of type of dose–response data

  • Specification of the BMR

  • Selection of candidate dose–response model(s)

  • Identification of acceptable models

The BMD approach aims at determining a PoD/RP on an empirically and objectively verifiable basis and is applicable for all four scenarios described in Table 3. The BMD was introduced into regulatory practice by the US EPA (USEPA 1999) as the lower confidence limit of the dose at which the response above background does not exceed a previously defined level, the benchmark response (BMR) (Fig. 1b). The benchmark dose (BMD) is the dose level, derived from the dose–response data, that is associated with a specific change in response defined by the BMR. The approach has the following properties:

  • The BMD approach uses all available dose–response data from a study and fits a set of mathematical models. It accounts for the statistical variability of the dose–response data by calculating the confidence interval of the BMD ranging from the lower bound (the BMDL) to the upper bound (the BMDU). The lower one-sided confidence bound BMDL (BMDL10 when setting BMR = 10 %) accounts for the statistical uncertainty in the data (with the statistical certainty level of 95 %) and is used as PoD/RP. The BMD approach has been increasingly used and recommended (EFSA 2009).

  • The BMR should be set equal to a low but measurable response level, reflecting an effect that is negligible or non-adverse. Choosing the BMR too low would normally result in an extrapolation outside the range of the observed data and could induce severe model dependence of the BMDL. Such a low BMR could let different models return drastically different BMD and BMDL values, reducing confidence in the modeling, characterized as a situation where “the risk assessment would be driven by the models fitted to the data and not by the data.” A BMR = 10 % of extra risk over background has been set as a default level (EFSA 2009) when analyzing quantal data such as tumor incidence in animal experiments. For continuous data a BMR = 5 % of change related to background was proposed as default.

  • When different models are fitted to the data and when some models fit equally well but result in different BMDs and BMDLs, selecting the BMDL of the best-fitting model is likely to underestimate the uncertainty in the BMD approach, while selecting the model with the lowest BMDL generally results in an overestimate of the risk. A stepwise and decision tree-based procedure has been proposed by Davis et al. (2010) and iterated by USEPA (2012) which differs from the EFSA approach in that it uses an adaptive approach to find the best-fitting model in contrast to the EFSA approach which is based on finding all models which are compatible with the dose–response data, i.e., those with an acceptable fit, once the data have been selected.

  • Recommended models are for quantal data usually:

    • Probit

    • Log-probit

    • Logistic

    • Log-logistic

    • Weibull

    • Multistage including the LMS

    • Quantal-linear

    • Gamma multihit

    and for continuous data, the

    • Exponential family

    • Hill family

    where each family contains a set of hierarchically nested models allowing for the determination of a best-fitting model.

  • The BMD approach should always be accompanied by appropriate reporting not only of the results finally obtained but also of all relevant information that allows other risk assessors to judge and eventually repeat the analysis.

It should be noted that the outcome of a BMD analysis depends on the criteria used to decide on the acceptability and on the significance level of a goodness-of-fit test chosen. The BMDL depends on the study design, in particular, on the sample size, but much less than the NOAEL.
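To make the steps of a BMD analysis concrete, the following sketch fits a single candidate model, the quantal-linear model P(d) = g + (1 − g)(1 − exp(−b·d)), to quantal data by maximum likelihood and derives a BMD and BMDL for a BMR of 10 % extra risk. It is a simplified illustration under stated assumptions, not the procedure of any particular software: the data are invented, only one model is fitted, no goodness-of-fit check is made, and the one-sided 95 % lower bound is obtained from a profile likelihood (log-likelihood drop of 2.706/2), which is one common choice assumed here.

```python
import math


def log_lik(s, b, doses, affected, sizes):
    """Binomial log-likelihood for P(d) = 1 - s*exp(-b*d), with s = 1 - background."""
    total = 0.0
    for d, k, n in zip(doses, affected, sizes):
        q = s * math.exp(-b * d)                 # probability of no effect at dose d
        q = min(max(q, 1e-12), 1.0 - 1e-12)      # keep the logarithms finite
        total += k * math.log(1.0 - q) + (n - k) * math.log(q)
    return total


def argmax(f, lo, hi, iters=100):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def profile_ll(b, data):
    """Log-likelihood maximized over the background for a fixed slope b."""
    s = argmax(lambda s: log_lik(s, b, *data), 1e-6, 1.0)
    return log_lik(s, b, *data)


def bmd_bmdl(doses, affected, sizes, bmr=0.10):
    """Return (BMD, BMDL) for extra risk = bmr under the quantal-linear model."""
    data = (doses, affected, sizes)
    b_max = 10.0 / max(doses)                    # assumed search limit for the slope
    b_hat = argmax(lambda b: profile_ll(b, data), 1e-9, b_max)
    # One-sided 95 % bound: log-likelihood drop of 2.706 / 2 (chi-square, 1 df).
    target = profile_ll(b_hat, data) - 2.706 / 2.0
    lo, hi = b_hat, b_max
    for _ in range(60):                          # bisection for the upper slope bound
        mid = 0.5 * (lo + hi)
        if profile_ll(mid, data) > target:
            lo = mid
        else:
            hi = mid
    c = -math.log(1.0 - bmr)                     # solves 1 - exp(-b*d) = bmr for b*d
    return c / b_hat, c / (0.5 * (lo + hi))


# Hypothetical tumor incidences in four groups of 50 animals.
bmd, bmdl = bmd_bmdl([0, 10, 30, 100], [2, 6, 14, 35], [50] * 4)
print(f"BMD10 = {bmd:.1f}, BMDL10 = {bmdl:.1f}")
```

The steeper the slope compatible with the data, the lower the dose that reaches the BMR, so the upper confidence bound on the slope yields the lower bound on the dose. A full analysis would repeat this for all candidate models and apply the acceptance criteria mentioned above.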

Other PoDs

Depending on the dose–response data available, two other methods compete with the BMD approach in practice:

T25: Defined as the chronic dose which will give tumors at a specific tissue site in 25 % of the animals after correction for spontaneous incidence and within the standard lifetime of the species (Dybing et al. 1997), the T25 has values that are likely to be within the range of the experimental data. A human-equivalent T25 (HT25), adjusted for the body weights (bw) of humans and animals, is obtained as

$$ \mathrm{HT25} = \mathrm{T25}/\left( \mathrm{bw}_{\mathrm{human}}/\mathrm{bw}_{\mathrm{animal}} \right)^{0.25} $$

and an LCR can be calculated as

$$ \mathrm{LCR} = \mathrm{exposure\ dose}/\left( \mathrm{HT25}/0.25 \right) $$

The T25 can be – and has been – applied even when besides the control group, only one dose group was available.
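The two T25 formulas can be combined in a short sketch. All numerical values (T25, body weights, exposure dose) are hypothetical, and the function names are chosen for illustration only.

```python
def ht25(t25, bw_animal_kg, bw_human_kg=70.0):
    """Human-equivalent T25: T25 / (bw_human / bw_animal)^0.25."""
    return t25 / (bw_human_kg / bw_animal_kg) ** 0.25


def lcr_from_t25(exposure_dose, ht25_value):
    """Lifetime cancer risk by linear extrapolation: exposure dose / (HT25 / 0.25)."""
    return exposure_dose / (ht25_value / 0.25)


t25_rat = 10.0    # hypothetical T25 in the rat, mg/kg bw/day
exposure = 0.001  # hypothetical human exposure, mg/kg bw/day
h = ht25(t25_rat, bw_animal_kg=0.5)
print(f"HT25 = {h:.2f} mg/kg bw/day")
print(f"LCR  = {lcr_from_t25(exposure, h):.2e}")
```

The division by HT25/0.25 expresses a straight line from the origin to the point where the risk is 25 %, so an exposure equal to the HT25 returns a risk of 0.25.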

TD50: The TD50 value was introduced primarily for ranking of carcinogens in the Carcinogenic Potency Database (CPDB) (see http://potency.berkeley.edu/). It characterizes the dose which, if administered chronically for the standard lifespan of the species, will halve the probability of remaining tumor-free throughout that period; for details see Sawyer et al. (1984). The determination of the TD50 value is complicated by intercurrent deaths due to causes other than tumorigenesis and by the non-observability of the time of tumor onset. The TD50 has been used as PoD when the toxic substance was administered chronically for the standard lifespan of the species, but it is not recommended for low-dose extrapolation.

Margin of Exposure (MoE)

Risk assessment of compounds that are both genotoxic and carcinogenic presents particular difficulties, since the effects of such compounds are normally regarded as being without a threshold and no safe level can therefore be defined. Therefore, low-dose extrapolation has been found inappropriate for genotoxic carcinogenic compounds, and pragmatic risk management approaches have been applied, such as the ALARA (As Low As Reasonably Achievable) principle and the TTC (Threshold of Toxicological Concern) concept, which establishes exposure thresholds for chemicals present in food depending on chemical structure. However, such approaches cannot inform risk managers about the urgency and extent of the risk reduction measures needed.

More recently the margin of exposure (MoE) approach has been applied by both the European Food Safety Authority (EFSA) and the Joint FAO/WHO Expert Committee on Food Additives (JECFA), not to bridge the gap between the PoD/RP and human exposure but to describe the extent of that gap (Barlow et al. 2006). The MoE is numerically defined as the ratio of the point of departure (PoD/RP) of the critical effect to the theoretical, predicted, or estimated exposure level (WHO 2009). Therefore, the BMD approach provides a practical tool when it defines a PoD/RP.

The magnitude of the MoE gives an indication of the level of concern without extrapolation to the substantially lower exposure levels usually encountered in human situations: the larger the MoE, the smaller the potential risk posed by exposure to the compound under consideration. The MoE should, however, not be used for a numerical quantification of risk but must be considered a practical approach for the formulation of advice to risk management; as a consequence, extrapolation using the MoE has not been recommended to derive a risk estimate or a level of actual risk in the exposed population (Barlow et al. 2006). The EFSA Scientific Committee considered that a MoE of 10,000 or more, based on animal cancer bioassay data, would be of low concern (EFSA 2005). A MoE higher than 10,000 based on a BMDL10 can, in cases of lifelong exposure, be associated with an LCR lower than 3.5 × 10⁻⁵ if based on a male rat experiment and lower than 7 × 10⁻⁵ if based on a male mouse experiment, using linear extrapolation (ECHA 2012; USEPA 2005).
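The following sketch shows the MoE as a simple ratio and one way in which the LCR bounds quoted above can be retraced. It assumes linear extrapolation from the BMDL10 (10 % extra risk) through the origin and an allometric conversion of the animal BMDL10 to humans by factors of about 3.5 for the rat and 7 for the mouse; these factors, the function names, and the example BMDL10 and exposure are assumptions of the sketch and are not stated in the text.

```python
def margin_of_exposure(pod, exposure):
    """MoE = point of departure / estimated human exposure (same dose units)."""
    return pod / exposure


def lcr_bound_from_moe(moe, bmr=0.10, allometric_factor=1.0):
    """Upper LCR under linear extrapolation from an animal BMDL.

    The human-equivalent BMDL is taken as BMDL / allometric_factor, so that
    LCR = bmr * exposure / (BMDL / allometric_factor) = bmr * allometric_factor / MoE.
    """
    return bmr * allometric_factor / moe


bmdl10 = 5.0       # hypothetical animal BMDL10, mg/kg bw/day
exposure = 0.0005  # hypothetical human exposure, mg/kg bw/day
moe = margin_of_exposure(bmdl10, exposure)
print(f"MoE = {moe:,.0f}")
print(f"LCR bound (rat, assumed factor 3.5): {lcr_bound_from_moe(moe, allometric_factor=3.5):.1e}")
print(f"LCR bound (mouse, assumed factor 7): {lcr_bound_from_moe(moe, allometric_factor=7.0):.1e}")
```

Such a calculation only relates the MoE to the linear extrapolation figures cited above; as noted in the text, the MoE itself is not meant to quantify the actual risk in the exposed population.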