Background

Regardless of the topic in question, variability and uncertainty must be taken into account when modeling and assessing health risks (Mekel and Fehr 2000; US EPA 2011). “Variability” refers to the (statistical) distribution of the studied phenomena, while “uncertainty” refers to parameters, factors, and models for which knowledge is lacking or incomplete. This chapter expands upon the concepts of uncertainty and variability, describes methods of probabilistic estimation and sensitivity analysis, and provides an overview of suitable software.

Variability refers to real heterogeneity with respect to space, time, or persons and represents a feature of the system studied. Subdividing sources of variability according to space, time, and population provides a useful means of understanding them. Examples of temporal variability include seasonal food consumption patterns or patterns of activities that vary on a weekly basis. Spatial variability is observed in environmental pollution, both on a small scale and across wide areas. Examples of inter-individual variability concern behavioral and personal features (Table 1).

Table 1 Sources of variability (based on US EPA 2011)

In practice, variability can be taken into account by subdividing the studied system into a number of subgroups which are then analyzed separately. In research design and classical statistics, this is called “stratification.” Variability cannot be reduced by additional studies; these can serve solely to characterize the degree of variability more precisely. Because variability persists, political-administrative decisions are needed on the desired level of safety in environmental policy.

Uncertainty, in contrast, is a feature of the researcher. It results from incomplete or lacking knowledge of aspects of the studied system. Uncertainty, just like variability, contributes to variation in analytical results. Three types are distinguished: scenario, parameter, and model uncertainty. Scenario uncertainty concerns, e.g., an exposure pathway which was overlooked. Parameter uncertainty can result from samples lacking representativeness. Model uncertainty concerns the quality of the model, e.g., the inclusion or exclusion of a relevant model parameter. In principle, uncertainty can be reduced by additional research (Table 2).

Table 2 Sources of uncertainty (based on US EPA 2011)

Both phenomena, variability and uncertainty, are relevant to each step of risk assessment. Distinguishing between sources of variability and uncertainty is important in two respects. Firstly, it matters for interpreting the results: when assessing toxicity, for instance, it is important to know how much variability exists within the population in question. Additionally, the reliability of this toxicity assessment matters: How sure are we that the toxicity and its variability were estimated correctly?

Secondly, the two phenomena have different consequences: While variability affects the assessment’s precision and its generalizability, uncertainty can lead to incorrect statements.

Variability and uncertainty of variables often occur together. If certain aspects of variability are unknown and stratification is therefore not possible, this lack of knowledge contributes to the uncertainty of the analysis. The quantification of soil ingestion from mouthing behavior of small children can serve as an example: It is well known that daily soil ingestion differs greatly between children. The study design and methods of most recent studies still leave many questions open. For instance: To what extent was soil ingestion determined correctly? How large is the variance between children? Which type of statistical distribution best describes the variability? How do seasonal factors influence these values?

Methods for Quantifying Variability and Uncertainty in Risk Assessment

Point Estimates

In traditional risk assessment, single values or point estimates are commonly used to represent the model input variables. To describe typical conditions, measures of central tendency, i.e., mean or median, are used for model variables whose variability can be described empirically. Such an estimate is referred to as the “typical case.” To account adequately for variability and uncertainty, especially with respect to sufficient health protection, assumptions are mostly conservative or “unfavorable.” So far, upper percentiles such as the 90th or 95th percentile of a variable or, if such measures were not available, the worst conceivable assumptions have been used for exposure assessment. This results in the so-called worst-case approach. Worst-case assumptions usually combine variability and uncertainty of the model variables. The problem is that worst-case estimates often do not describe realistic exposure situations.
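The contrast between a typical-case and a worst-case point estimate can be sketched in a few lines. The following is only an illustration: it assumes a dietary exposure model of the form food consumption × concentration / body weight, and all input values are hypothetical, not survey data. Note that the "unfavorable" value for body weight is a low percentile, because body weight appears in the denominator.

```python
def exposure(ir, c, bw):
    """Daily dietary exposure in µg per kg body weight per day.

    ir: food consumption [kg/day], c: concentration in food [µg/kg],
    bw: body weight [kg]. The model form is an assumption for illustration.
    """
    return ir * c / bw

# Hypothetical summary statistics (illustrative only)
typical = {"ir": 1.0, "c": 5.0, "bw": 10.0}      # medians of each input
unfavorable = {"ir": 1.5, "c": 12.0, "bw": 8.0}  # 95th pct. of ir and c, 5th pct. of bw

typical_case = exposure(**typical)
worst_case = exposure(**unfavorable)

print(f"typical case: {typical_case:.2f} µg/kg bw-day")
print(f"worst case:   {worst_case:.2f} µg/kg bw-day")
print(f"ratio:        {worst_case / typical_case:.1f}")
```

Because every input is set to an unfavorable value at the same time, the worst-case estimate describes a combination that may occur only very rarely in the population, if at all.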

Probabilistic Estimates

Probabilistic assessments make use of the entire distribution of all or several model variables (Cullen and Frey 1999). Simulated values are randomly drawn from these distributions according to their statistical parameters and then combined with other randomly drawn values according to the model’s algorithms. An example of this principle using the “nutrition” pathway in probabilistic exposure assessment is illustrated in Fig. 1. From the distribution of each of the three input variables, one simulation value is drawn at random, e.g., 1.14 kg/day for food consumption, 4.9 μg/kg for pollutant concentration, and 9.7 kg for body weight. According to the model equation (food consumption × concentration / body weight), the resulting exposure is 0.58 μg/kg body weight-day. In a Monte Carlo simulation, this procedure is repeated many times. The results of these simulations can, in turn, be displayed as a distribution. This distribution then represents the result of the exposure assessment and can be described by its statistical parameters such as mean, standard deviation, and percentiles.

Fig. 1 Exemplification of a Monte Carlo simulation for the example of dietary exposure of children (IR = food consumption [kg/day], C = concentration in food [μg/kg], BW = body weight [kg])

By using entire distributions for estimation, each possible value of a variable, including the “tails” of the distribution, is combined with other model variables according to its respective probability. This results in better insights into the population’s exposure and more meaningful information regarding the spread and confidence interval of the calculated exposure or risk. Additionally, probabilistic methods make it possible to include all available information in the assessment, as opposed to an arbitrary selection of percentiles.
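The Monte Carlo procedure described for the dietary pathway can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the model equation IR × C / BW, the choice of lognormal distributions, and all distribution parameters are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics

def exposure(ir, c, bw):
    """Exposure [µg/kg bw-day] from IR [kg/day], C [µg/kg], BW [kg]."""
    return ir * c / bw

def simulate(n=10_000, seed=1):
    """Draw n random input combinations and return the resulting exposures."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    results = []
    for _ in range(n):
        # Hypothetical input distributions (assumptions, not survey data)
        ir = rng.lognormvariate(0.0, 0.3)             # median 1 kg/day
        c = rng.lognormvariate(1.6, 0.5)              # median about 5 µg/kg
        bw = rng.lognormvariate(math.log(10), 0.15)   # median 10 kg
        results.append(exposure(ir, c, bw))
    return results

results = simulate()

# Describe the output distribution by its statistical parameters
mean = statistics.fmean(results)
sd = statistics.stdev(results)
pct = statistics.quantiles(results, n=100)  # 99 cut points: pct[k-1] is the k-th percentile

print(f"mean   {mean:.2f}")
print(f"sd     {sd:.2f}")
print(f"median {pct[49]:.2f}")
print(f"95th   {pct[94]:.2f}")
```

The upper percentiles of `results` can then be compared with a worst-case point estimate built from the same inputs, which shows how far into the tail of the exposure distribution such an estimate lies.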

Sensitivity Analysis

Sensitivity analysis isolates the model variables that contribute most to the spread of the results. If the distributions of the input variables identified as influential rely on sound data, the estimate can be considered sound. Body weight, for instance, could have a strong influence on the final results. If the probability distribution applied for body weight is based on a representative population sample, the calculated variation can be considered reliable. If, in contrast, the influential input variables rely on a relatively weak data basis, the results are correspondingly unreliable. Such findings can also point to a need for further research on that variable. Conversely, variables which rely on a weak data basis but do not significantly affect the final results will not necessarily require an effort to improve the data basis.

The simplest type of sensitivity analysis is the what-if analysis: The value of each input variable in turn is modified (e.g., in steps of 10 %) while the other variables are kept constant, and the resulting change in the final result is recorded. Comparing these changes across input variables identifies the most sensitive variables.
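A what-if analysis of this kind can be sketched as follows. The example assumes the dietary exposure model IR × C / BW and uses the single set of simulation values from the Monte Carlo example as a baseline; the function name `what_if` and the 10 % step are illustrative choices.

```python
def exposure(ir, c, bw):
    """Exposure [µg/kg bw-day] from IR [kg/day], C [µg/kg], BW [kg]."""
    return ir * c / bw

# Baseline values taken from the worked example in the text
baseline = {"ir": 1.14, "c": 4.9, "bw": 9.7}

def what_if(model, baseline, step=0.10):
    """Raise one input at a time by `step` and return the relative output change."""
    base = model(**baseline)
    changes = {}
    for name in baseline:
        varied = dict(baseline)                  # copy, so other inputs stay constant
        varied[name] = baseline[name] * (1 + step)
        changes[name] = (model(**varied) - base) / base
    return changes

changes = what_if(exposure, baseline)
for name, rel in sorted(changes.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: {rel:+.1%}")
```

In this purely multiplicative model, a 10 % increase in food consumption or concentration raises exposure by 10 %, and a 10 % increase in body weight lowers it by about 9 %. The inputs therefore look almost equally sensitive, because a what-if analysis does not take into account how much each input actually varies.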

Meaningful sensitivity analysis requires data on variation that usually cannot be obtained from point estimates, but is readily available from probability distributions. Sensitivity analysis is not meaningful when using worst-case point estimates, because the maximum value is used for several input variables (e.g., 100 % resorption). The combination of probabilistic estimates and sensitivity analysis provides information about the reliability of the estimates and the possible consequences for risk management.
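One common way to combine probabilistic estimates with sensitivity analysis is to compute the rank correlation between each simulated input and the simulated output. The sketch below uses Spearman rank correlation as one possible measure; the text does not prescribe a specific one. The model equation IR × C / BW and all distribution parameters are hypothetical assumptions.

```python
import math
import random

def ranks(values):
    """Return the rank (1 = smallest) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

rng = random.Random(7)
n = 5000
# Hypothetical input distributions (assumptions, not survey data)
inputs = {
    "ir": [rng.lognormvariate(0.0, 0.3) for _ in range(n)],
    "c": [rng.lognormvariate(1.6, 0.5) for _ in range(n)],
    "bw": [rng.lognormvariate(math.log(10), 0.15) for _ in range(n)],
}
output = [ir * c / bw for ir, c, bw in zip(inputs["ir"], inputs["c"], inputs["bw"])]

sensitivity = {name: spearman(vals, output) for name, vals in inputs.items()}
for name, rho in sorted(sensitivity.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: {rho:+.2f}")
```

With these assumed distributions, the concentration has the widest spread and therefore the strongest correlation with the output, while body weight contributes least and is negatively correlated because it is the divisor.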

Application Potential in Dose–Response Assessment

Research and development in the area of probabilistic modeling have so far focused on exposure assessment (Mekel et al. 2007). Only in recent years have efforts been made to investigate the potential of probabilistic methods in dose–response assessment, as an alternative or addition to the so-called uncertainty factors that have traditionally been used when extrapolating data from animal studies to humans. In the Netherlands, these methods are applied in parallel to traditional, deterministic risk assessment of new and existing chemicals and pesticides (Vermeire et al. 2001). Similar developments can be observed in other countries, but often have not yet become part of regulatory practice.

Software for Probabilistic Exposure and Risk Assessment

Faster computers have enabled the application of computationally intensive probabilistic modeling in recent years. Commercial software tools for conducting probabilistic simulations are available. These tools are not specifically designed for use in areas like toxicology or environmental health, but are used in a variety of disciplines where risk and decision analysis is an issue, in particular in economics and finance.

For performing a probabilistic exposure and risk assessment, the two most popular commercial systems are @Risk (www.palisade.com) and Crystal Ball (www.oracle.com). Both systems work directly as add-ins for spreadsheet software like Excel. @Risk is now available in seven different languages.

Both systems work in similar ways: Both require (i) a user-defined model to be implemented in a spreadsheet, and (ii) the specification of the probability distributions for the model input variables. Differences exist in usability, e.g., in terms of clarity, provision of (partly) automatic functions, graphs, etc. Both systems offer a large number of options for performing probabilistic analysis, but require considerable training. Standard statistical packages like SAS or SPSS can be used for probabilistic assessment, too, but all simulation steps need to be programmed. Again, this requires extensive knowledge of the statistical packages.