The first issue in any attempt to conduct a public health risk assessment for chemical exposure problems relates to answering the seemingly straightforward question: ‘does a chemical hazard exist?’ Thus, all environmental and public health risk management programs designed for chemical exposure situations usually start with a hazard identification and accounting; this initial process sets out to determine whether or not the substance in question possesses potentially hazardous and/or toxic properties. This chapter discusses the principal activities involved in acquiring and manipulating the pertinent chemical hazard information needed to answer this question; ultimately, this generally helps in developing effective environmental and public health risk management decisions and programs for chemical exposure problems.

1 Chemical Hazard Identification: Sources of Chemical Hazards

The chemical hazard identification component of a public health risk assessment involves first establishing the presence of a chemical stressor that could potentially cause adverse human health effects. This process usually includes a review of the major sources of chemical hazards that could potentially contribute to a given chemical exposure and possible risk situation. Indeed, chemical hazards affecting public health risks typically originate from a variety of sources (Box 8.1)—albeit their relative contributions to actual human exposures are not always so obvious. Needless to say, there is a corresponding variability in the range and types of hazards and risks that may be anticipated from different chemical exposure problems.

Oftentimes, qualitative information on the potential sources and likely consequences of the chemical hazards is all that is required during this early stage (i.e., the hazard identification phase) of the risk assessment process. To add a greater level of sophistication to the hazard identification process, however, quantitative techniques may be incorporated—to help determine, for instance, the likelihood of an actual exposure situation occurring. The quantitative methods may include the use of mathematical modeling and/or decision analysis techniques to determine chemical fate and behavior attributes following human exposure to a chemical, vis-à-vis the likely receptor response upon exposure to the chemical of potential concern. For instance, physicochemical and structural data can be used to predict a chemical’s physical hazard, reactivity, and pharmacokinetics—including attributes such as absorption by different exposure routes, distribution within the receptor, bioavailability, and the likely metabolites associated with the subject chemical; such properties are therefore quite critical to the chemical characterization process. Ultimately, this initial evaluation of a chemical exposure problem should provide great insight into the nature and types of chemicals involved, the populations potentially at risk, and possibly some qualitative ideas about the magnitude of the anticipated risk.

Box 8.1 Examples of major sources of chemical hazards potentially resulting in public health problems

  • Consumer products (including foods, drinks, cosmetics, medicines, etc.)

  • Urban air pollution (including automobile exhausts, factory chimney stacks, etc.)

  • Contaminated drinking water

  • Industrial manufacturing and processing facilities

  • Commercial service facilities (such as fuel stations, auto repair shops, dry cleaners, etc.)

  • Landfills, waste tailings and waste piles

  • Contaminated lands

  • Wastewater lagoons

  • Septic systems

  • Hazardous materials stockpiles

  • Hazardous materials storage tanks and containers

  • Pipelines for hazardous materials

  • Spills from loading and unloading of hazardous materials

  • Spillage from hazardous materials transport accidents

  • Pesticide, herbicide, and fertilizer applications

  • Contaminated urban runoff

  • Mining and mine drainage

  • Waste treatment system and incinerator emissions

2 Data Collection and Evaluation Considerations

The process involved in a public health risk assessment for chemical exposure problems will usually include a well-thought-out plan for the collection and analysis of a variety of chemical hazard and receptor exposure data. Ideally, and to facilitate this process, project-specific ‘work-plans’ can be designed to specify the administrative and logistic requirements of the general activities to be undertaken – as discussed in Chap. 6, and excerpted below. A typical data collection work-plan that is used to guide the investigation of chemical exposure problems may include, at a minimum, a sampling and analysis plan together with a quality assurance/quality control plan. The general nature and structure of such work-plans, as well as further details on the appropriate technical standards for sample collection and sample handling procedures, can be found elsewhere in the literature (e.g., Asante-Duah 1998; ASTM 1997b; Boulding 1994; CCME 1993; CDHS 1990; Keith 1988, 1991; Lave and Upton 1987; Petts et al. 1997; USEPA 1989a, b).

In general, all sampling and analysis should be conducted in a manner that maintains sample integrity and encompasses adequate quality assurance and control. Also, the specific samples collected should be representative of the target materials that are the source of, and/or ‘sink’ for, the chemical exposure problem. Furthermore, regardless of their intended use, samples collected for analysis at a remote location should generally be kept on ice prior to and during transport/shipment to a certified laboratory for analysis; completed chain-of-custody records should also accompany the samples to the laboratory.

Indeed, sampling and analysis can become a very important part of the decision-making process involved in the management of chemical exposure problems. Yet, sampling and analysis could also become one of the most expensive and time-consuming aspects of such public health risk management programs. Of even greater concern is the fact that errors in sample collection, sample handling, or laboratory analysis can invalidate the hazard accounting and exposure characterization efforts, and/or add to the overall project costs. All samples that are intended for use in human exposure and risk characterization programs must therefore be collected, handled, and analyzed properly—in accordance with all applicable/relevant methods and protocols. To ultimately produce data of sound integrity and reliability, it is important to give special attention to several issues pertaining to the sampling objective and approach; sample collection methods; chain-of-custody documentation; sample preservation techniques; sample shipment methods; and sample holding times. Chapter 6 contains a convenient checklist of the issues that should be verified when planning such sampling activities.

Overall, highly effective sampling and laboratory procedures are required during the chemical hazard determination process; this is to help minimize uncertainties associated with the data collection and evaluation aspects of the risk assessment. Ultimately, several chemical-specific parameters (such as chemical toxicity or potency, media concentration, ambient levels, frequency of detection, mobility, persistence, bioaccumulation/bioconcentration potential, synergistic or antagonistic effects, potentiation or neutralizing effects, etc.) as well as various receptor information are further used to screen and help select the specific target chemicals that will become the focus of a detailed risk assessment.

2.1 Data Collection and Analysis Strategies

A variety of data collection and analysis protocols exist in the literature (e.g., Boulding 1994; Byrnes 1994; CCME 1993, 1994; Csuros 1994; Garrett 1988; Hadley and Sedman 1990; Keith 1992; Millette and Hays 1994; O’Shay and Hoddinott 1994; Schulin et al. 1993; Thompson 1992; USEPA 1982, 1985, 1992a, b, c, d, e; Wilson 1995) that may be adapted for the investigation of human exposure to chemical constituents found in consumer products and in human environments. Regardless of the processes involved, however, it is important to recognize that most chemical sampling and analysis procedures offer numerous opportunities for sample contamination and/or cross-contamination from a variety of sources (Keith 1988). To address and account for possible errors arising from ‘foreign’ sources, quality control (QC) samples are typically included in the sampling and analytical schemes. The QC samples are analytical ‘control’ samples that are analyzed in the same manner as the ‘field’ samples—and these are subsequently used in the assessment of any cross-contamination that may have been introduced into a sample along its life cycle from the field (i.e., point of collection) to the laboratory (i.e., place of analysis).

Invariably, QC samples become an essential component of all carefully executed sampling and analysis programs. This is because firm conclusions cannot be drawn from the investigation unless adequate controls have been included as part of the sampling and analytical protocols (Keith 1988). To prevent or minimize the inclusion of ‘foreign’ constituents in the characterization of chemical exposures and/or in a risk assessment, therefore, the concentrations of the chemicals detected in ‘control’ samples must be compared with concentrations of the same chemicals detected in the ‘field’ samples. In such an appraisal, the QC samples can indeed become a very important reference datum for the overall evaluation of the chemical sampling data.

In general, well-designed sampling and analytical protocols are necessary to facilitate credible data collection and analysis programs. Sampling protocols are written descriptions of the detailed procedures to be followed in collecting, packaging, labeling, preserving, transporting, storing, and tracking samples. The selection of appropriate analytical methods is also an integral part of the process of developing sampling plans—since this can strongly affect the acceptability of a sampling protocol. For example, the sensitivity of an analytical method could directly influence the amount of sample needed in order to measure analytes at pre-specified minimum detection (or quantitation) limits. The analytical method may also affect the selection of storage containers and preservation techniques (Keith 1988; Holmes et al. 1993). In any case, the devices that are used to collect, store, preserve, and transport samples must not alter the sample in any manner. In this regard, it is noteworthy that special procedures may be needed to preserve samples during the period between collection and analysis.

Finally, the development and implementation of an overall good quality assurance/quality control (QA/QC) project plan for a sampling and analysis activity is critical to obtaining reliable analytical results. The soundness of the QA/QC program has a particularly direct bearing on the integrity of the sampling as well as the laboratory work. Thus, the general process for developing an adequate QA/QC program, as discussed in Chap. 6 of this book and elsewhere in the literature (e.g., CCME 1994; USEPA 1987, 1992a, b, c, d, e), should be followed religiously. Also, it must be recognized that the more specific a sampling protocol is, the less chance there will be for errors or erroneous assumptions.

2.2 Reporting of ‘Censored’ Laboratory Data

Oftentimes, in a given set of laboratory samples, certain chemicals will be reliably quantified in some (but not all) of the samples that were collected for analysis. Data sets may therefore contain observations that are below the instrument or method detection limit, or indeed its corresponding quantitation limit; such data are often referred to as ‘censored data’ (or ‘non-detects’ [NDs]). In general, the NDs do not necessarily mean that a chemical is not present at any level (i.e., completely absent)—but simply that any amount of such chemical potentially present was probably below the level that could be detected or reliably quantified using the particular analytical method. In other words, either the chemical was truly absent from the sampled location or matrix at the time the sample was collected—or the chemical was indeed present, but only at a concentration below the quantitation limits of the analytical method employed in the sample analysis.

In fact, every laboratory analytical technique has detection and quantitation limits below which only ‘less than’ values may be reported; the reporting of such values provides a degree of quantification for the censored data. In such situations, a decision has to be made as to how to treat such NDs and associated ‘proxy’ concentrations. The appropriate procedure depends on the general pattern of detection for the chemical in the overall investigation activities (Asante-Duah 1998; HRI 1995). In any case, it is customary to assign non-zero values to all sampling data reported as NDs. This is important because, even at or near their detection limits, certain chemical constituents may be of considerable importance or concern in the characterization of a chemical exposure problem. However, uncertainty about the actual values below the detection or quantitation limit can also bias or preclude an effectual execution of subsequent statistical analyses. Indeed, censored data create significant uncertainties in the data analysis required of the chemical exposure characterization process; such data should therefore be handled in an appropriate manner—for instance, as elaborated in the example methods of approach provided below.

2.2.1 Derivation and Use of ‘Proxy’ Concentrations

‘Proxy’ concentrations are usually employed when a chemical is not detected in a specific sampled medium per se. A variety of approaches are offered in the literature for deriving and using proxy values in environmental data analyses, including the following relatively simple ones (Asante-Duah 1998; HRI 1995; USEPA 1989a, 1992a, b, c, d, e):

  • Set the sample concentration to zero. This assumes that if a chemical was not detected, then it is not present—i.e., the ‘residual concentration’ is zero. This calls for a very strong assumption, namely that the chemical is truly absent from the sampled media, which can rarely be justified. Thus, it represents the least conservative (i.e., least health-protective) option.

  • Drop the sample with the non-detect for the particular chemical from further analysis. This will have the same effect on the data analysis as assigning a concentration that is the average of concentrations found in samples where the chemical was detected.

  • Set the proxy sample concentration to the sample quantitation limit (SQL). For NDs, setting the sample concentration to a proxy concentration equal to the SQL (which is a quantifiable number used in practice to define the analytical detection limit) makes the fewest assumptions and tends to be conservative, since the SQL represents an upper bound on the concentration of a ND. This option does indeed offer the most conservative (i.e., most health-protective) approach to chemical hazard accounting and exposure estimation. The approach recognizes that the true distribution of concentrations represented by the NDs is unknown.

  • Set the proxy sample concentration to one-half the SQL. For NDs, setting the sample concentration to a proxy concentration equal to one-half the SQL assumes that, regardless of the distribution of concentrations above the SQL, the distribution of concentrations below the SQL is symmetrical. [It is noteworthy that, when the subject data are highly skewed, the use of the SQL divided by the square root of two (i.e., SQL/√2) is recommended instead of one-half the SQL.]

In general, in a ‘worst-case’ approach, all NDs are assigned the value of the SQL – which is the lowest level at which a chemical may be accurately and reproducibly quantitated; this approach biases the mean upward. On the other hand, assigning a value of zero to all NDs biases the mean downward. The degree to which the results are biased will depend on the relative number of detects and non-detects in the data set, as well as the difference between the reporting limit and the measured values above it. Oftentimes, common practice is to utilize the sample-specific quantitation limit for the chemical reported as ND. In fact, the goal in adopting such an approach is to avoid underestimating exposures to potentially sensitive or highly exposed groups such as infants and children, while at the same time attempting to approximate actual ‘residual levels’ as closely as possible. Ultimately, recognizing that the assumptions in these methods of approach may, in some cases, either overestimate or underestimate exposures, the use of sensitivity analysis to determine the impact of using different assumptions (e.g., ND = 0 vs. ND = SQL/2 vs. ND = SQL/√2, etc.) is encouraged.
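As an illustration of such a sensitivity analysis, the following minimal sketch (in Python, using hypothetical detection results and an assumed SQL rather than any data from this chapter) simply recomputes the sample mean under each of the common substitution choices:

```python
# Minimal sketch of a non-detect substitution sensitivity check (hypothetical data).
import numpy as np

detects = [0.8, 1.2, 2.5, 0.6, 1.9]   # detected concentrations (ug/L), hypothetical
n_nondetects = 4                      # number of samples reported as ND
sql = 0.5                             # sample quantitation limit (ug/L), hypothetical

# Candidate proxy values for the non-detects
proxies = {
    "ND = 0":           0.0,
    "ND = SQL/2":       sql / 2.0,
    "ND = SQL/sqrt(2)": sql / np.sqrt(2.0),
    "ND = SQL":         sql,
}

for label, proxy in proxies.items():
    data = np.array(detects + [proxy] * n_nondetects)
    print(f"{label:17s} -> mean = {data.mean():.3f} ug/L")

# A wide spread among these means indicates that the exposure estimate is sensitive
# to the treatment of censored data, which argues for re-sampling or for
# distributional methods that extrapolate below the reporting limit.
```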

Other approaches to the derivation of proxy concentrations may involve the use of ‘distributional’ methods; unlike the simple substitution methods shown above, distributional methods make use of the data above the reporting limit in order to extrapolate below it (USEPA 1992a, b, c, d, e). Indeed, even more robust methods may be utilized for handling censored data sets. In any event, selecting the appropriate method for any given situation or problem scenario generally requires consideration of the degree of censoring, the goals of the assessment, and the degree of accuracy required.

Finally, it is noteworthy that, notwithstanding the options available from the above procedures of deriving and/or using ‘proxy’ concentrations, re-sampling and further laboratory analysis should always be viewed as the preferred approach to resolving uncertainties that surround ND results obtained from sampled media. Thence, if the initially reported data represent a problem in sample collection or analytical methods rather than a true failure to detect a chemical of potential concern, then the problem could be rectified (e.g., by the use of more sensitive analytical protocols) before critical decisions are made based on the earlier results.

3 Statistical Evaluation of Chemical Sampling/Concentration Data

Once the decision is made to undertake a public health risk assessment, the available chemical exposure data have to be carefully examined/appraised—in order to, among other things, arrive at a list of chemicals of potential concern (CoPCs); the CoPCs represent the target chemicals of focus in the risk assessment process. In general, the target chemicals of significant interest or concern to chemical exposure problems may be selected for further detailed evaluation on the basis of several specific and miscellaneous important considerations—such as shown in Box 8.2. The use of such selection criteria should generally compel an analyst to continue with the exposure and risk characterization process only if the chemicals represent potential threats to public health. For such chemicals, general summary statistics would commonly be compiled; meanwhile, it is worth mentioning here that, where applicable, data for samples and their duplicates are typically averaged before summary statistics are calculated—such that a sample and its duplicate are ultimately treated as one sample for the purpose of calculating summary statistics (including maximum detection and frequency of detection). Where constituents are not detected in both a sample and its duplicate, the resulting values are the average of the sample-specific quantitation limits (SSQLs). Where both the sample and the duplicate contain detected constituents, the resulting values are the average of the detected results. Where a constituent in one of the pair is reported as not detected and the constituent is detected in the other, the detected concentration is conservatively used to represent the value of interest. On the whole, the following summary statistics are typically generated as part of the key statistical parameters of interest (a computational sketch follows the list below):

  • Frequency of detection—reported as a ratio between the number of samples reported as detected for a specific constituent and the total number of samples analyzed.

  • Maximum detected concentration—for each constituent/receptor/medium combination, after duplicates have been averaged.

  • Mean detected concentration—typically the arithmetic mean concentration for each constituent/receptor/medium combination, after duplicates have been averaged, based on detected results only.

  • Minimum detected concentration—for each constituent/area/medium combination, after duplicates have been averaged.
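To make the above duplicate-averaging rules and summary statistics concrete, the following minimal sketch (illustrative only; the values and the helper name resolve_pair are hypothetical, not taken from this chapter) shows one way they might be coded:

```python
# Minimal sketch of duplicate averaging and summary statistics for one constituent.
import numpy as np

def resolve_pair(sample, duplicate):
    """Combine a sample/duplicate pair into a single value per the rules above.
    Each argument is a (value, detected_flag) tuple; ND results carry the SSQL."""
    (v1, d1), (v2, d2) = sample, duplicate
    if d1 and d2:              # both detected: average of the detected results
        return (v1 + v2) / 2.0, True
    if not d1 and not d2:      # both ND: average of the SSQLs, still treated as ND
        return (v1 + v2) / 2.0, False
    return (v1 if d1 else v2), True   # one detected: use the detected concentration

# Hypothetical (value, detected) pairs, already matched to their duplicates
pairs = [((1.2, True), (1.0, True)),
         ((0.5, False), (0.5, False)),    # ND reported at the SSQL
         ((0.8, True), (0.5, False))]

resolved = [resolve_pair(s, d) for s, d in pairs]
values = np.array([v for v, _ in resolved])
detected = np.array([d for _, d in resolved])

print(f"Frequency of detection: {detected.sum()}/{len(detected)}")
print(f"Maximum detected concentration: {values[detected].max()}")
print(f"Mean detected concentration: {values[detected].mean():.3f}")
print(f"Minimum detected concentration: {values[detected].min()}")
```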

Next, the proper exposure point concentration (EPC) for the target populations potentially at risk from the CoPCs would be determined; an EPC is the concentration of the CoPC in the target material or product at the point of contact with the human receptor.

Box 8.2 Typical important considerations in the screening for chemicals of potential concern for public health risk assessments

  • Status as a known human carcinogen versus probable or possible carcinogen

  • Status as a known human developmental and reproductive toxin

  • Degree of mobility, persistence, and bioaccumulation

  • Nature of possible transformation products of the chemical

  • Inherent toxicity/potency of chemical

  • Concentration-toxicity score—reflecting concentration levels in combination with degree of toxicity (For exposure to multiple chemicals, the chemical score is represented by a risk factor, calculated as the product of the chemical concentration and toxicity value; the ratio of the risk factor for each chemical to the total risk factor approximates the relative risk for each chemical—giving a basis for inclusion or exclusion as a CoPC)

  • Frequency of detection in target material or product (Chemicals that are infrequently detected may be artifacts in the data due to sampling, analytical, or other problems, and therefore may not be truly associated with the consumer product or target material under investigation)

  • Status and condition as an essential element—i.e., defined as an essential human nutrient, and toxic only at elevated doses (For example, Ca or Na generally does not pose a significant risk to public health, but As or Cr may pose a significantly greater risk to human health)

The EPC determination process typically will consist of an appropriate statistical evaluation of the exposure sampling data—especially when large data sets are involved. Statistical procedures used for the evaluation of the chemical exposure data can indeed significantly affect the conclusions of a given exposure characterization and risk assessment program. Consequently, appropriate statistical methods (e.g., in relation to the choice of proper averaging techniques) should be utilized in the evaluation of chemical sampling data. Meanwhile, it is noteworthy that over the years, extensive technical literature has been put forward regarding the ‘best’ probability distribution to utilize in different scientific applications—and such resources should be consulted for appropriate guidance on the statistical tools of choice.

3.1 Parametric Versus Nonparametric Statistics

There are several statistical techniques available for analyzing data that are not necessarily dependent on the assumption that the data follow any particular statistical distribution. These distribution-free methods are referred to as nonparametric statistical tests—and they have fewer and less stringent assumptions. Conversely, several assumptions have to be met before one can use a parametric test. At any rate, whenever the set of requisite assumptions is met, it is always preferable to use a parametric test—because it tends to be more powerful than the nonparametric test. However, to reduce the number of underlying assumptions required (such as in a hypothesis testing about the presence of specific trends in a data set), nonparametric tests are typically employed.

Nonparametric techniques are generally selected when the sample sizes are small and the statistical assumptions of normality and homogeneity of variance are tenuous. Indeed, nonparametric tests are usually adopted for use in environmental impact assessments because the statistical characteristics of the often messy environmental data make it difficult, or even unwise, to use many of the available parametric methods. It is noteworthy, however, that the nonparametric tests tend to ignore the magnitude of the observations in favor of the relative values or ranks of the data. Consequently, as Hipel (1988) notes, a given nonparametric test with few underlying assumptions that is designed, for instance, to test for the presence of a trend may only provide a ‘yes’ or ‘no’ answer as to whether or not a trend may indeed be present in the data. The output from the nonparametric test may not give an indication of the type or magnitude of the trend. To have a more powerful test about what might be occurring, many assumptions must be made—and as more assumptions are formulated, a nonparametric test begins to look more like a parametric test. It is also noteworthy that the use of parametric statistics requires additional detailed evaluation steps—with the process of choosing an appropriate statistical distribution being an important initial step.
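To illustrate the distinction, the following minimal sketch (hypothetical monitoring data; Kendall's tau against time is used here as a simple stand-in for the Mann-Kendall trend test) contrasts the 'yes/no' answer of a nonparametric test with the trend magnitude estimated by a parametric regression:

```python
# Minimal sketch contrasting a nonparametric and a parametric trend test.
import numpy as np
from scipy import stats

months = np.arange(1, 13)                        # sampling times (hypothetical)
conc = np.array([0.05, 0.07, 0.06, 0.09, 0.08,   # concentrations (ug/L), hypothetical
                 0.11, 0.10, 0.14, 0.13, 0.16, 0.15, 0.19])

# Nonparametric: Kendall's tau against time essentially answers only whether a
# monotonic trend is present ('yes'/'no' via the p-value), not how large it is.
tau, p_np = stats.kendalltau(months, conc)
print(f"Kendall tau = {tau:.2f}, p = {p_np:.4f} -> trend present? {p_np < 0.05}")

# Parametric: ordinary least-squares regression additionally estimates the trend
# magnitude (slope), but assumes approximately normal residuals.
fit = stats.linregress(months, conc)
print(f"OLS slope = {fit.slope:.4f} ug/L per month, p = {fit.pvalue:.4f}")
```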

3.1.1 Choice of Statistical Distribution

Of the many statistical distributions available, the Gaussian (or normal) distribution has been widely utilized to describe environmental data; however, there is considerable support for the use of the lognormal distribution in describing such data. Consequently, chemical concentration data for environmental samples have often been described by the lognormal distribution, rather than by a normal distribution (Gilbert 1987; Leidel and Busch 1985; Rappaport and Selvin 1987; Saltzman 1997). Basically, the use of lognormal statistics for the data set X1, X2, X3, …, Xn requires that the logarithmic transforms of these data (i.e., ln[X1], ln[X2], ln[X3], …, ln[Xn]) can be expected to be normally distributed.

In general, the statistical parameters used to describe the different distributions can differ significantly; for instance, the central tendency for the normal distribution is measured by the arithmetic mean, whereas the central tendency for the lognormal distribution is defined by the geometric mean. In the end, the use of a normal distribution, rather than lognormal statistics, to describe environmental chemical concentration data will often result in significant overestimation, and may be overly conservative—albeit some investigators have argued otherwise (e.g., Parkhurst 1998). In fact, Parkhurst (1998) argues that geometric means are biased low and do not quite represent components of mass balances properly, whereas arithmetic means are unbiased, easier to calculate and understand, scientifically more meaningful for concentration data, and more protective of public health. Even so, this same investigator (Parkhurst 1998) still concedes to the non-universality of this school of thought—and these types of arguments and counter-arguments only go to reinforce the fact that no one particular parameter or distribution may be appropriate for every situation. Consequently, care must be exercised in the choice of statistical methods for the data manipulation exercises carried out during the hazard accounting process—and indeed in regard to other aspects of a risk assessment.

3.1.2 Goodness-of-Fit Testing

Recognizing that the statistical procedures used in the evaluation of chemical exposure data should generally reflect the character of the underlying distribution of the data set, it is preferable that the appropriateness of any distribution assumed or used for a given data set be checked prior to its application. This verification check can be accomplished by using a variety of goodness-of-fit methods.

Goodness-of-fit tests are formal statistical tests of the hypothesis that a specific set of sampled observations is an independent sample from the assumed distribution. The more common general tests include the Chi-square test and the Kolmogorov-Smirnov test; common goodness-of-fit tests specific for normality and log-normality include the Shapiro-Wilk test and D’Agostino’s test (see, e.g., D’Agostino and Stephens 1986; Gilbert 1987; Miller and Freund 1985; Sachs 1984). At any rate, it is worth mentioning here that goodness-of-fit tests tend to have notoriously low power—and indeed are generally best for rejecting poor distribution fits, rather than for identifying good fits. In general, if the data cannot be fitted well enough to a theoretical distribution, then perhaps an empirical distribution function or other statistical methods of approach (such as bootstrapping techniques) should be considered.

Another way to determine the specific probability distribution that adequately models the underlying population of a data set is to test the probability of a sample being drawn from a population with a particular probability distribution; one such test is the W-test (Shapiro and Wilk 1965). The W-test is particularly important in assessing whether a sample is from a population with a normal probability distribution; it can also be used to assess whether a sample belongs to a population with a lognormal distribution (i.e., after the data have undergone a natural logarithm transformation). It is noteworthy that the W-test, as originally developed by Shapiro and Wilk, is limited to small data sets (of 3 to 50 samples); however, a modification of the W-test that allows for its use with larger data sets (up to about 5000 data points) is also available in the formulation subsequently developed by Royston (Royston 1995).
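As a brief illustration of such a check, the minimal sketch below (hypothetical concentration values only) applies the W-test to a data set in both its raw and log-transformed forms:

```python
# Minimal sketch of a Shapiro-Wilk (W-test) check for normality vs. lognormality.
import numpy as np
from scipy import stats

conc = np.array([0.04, 0.06, 0.05, 1.20, 0.90, 0.07,   # hypothetical values (ug/L)
                 0.08, 0.05, 0.11, 0.06, 0.09, 0.10])

w_raw, p_raw = stats.shapiro(conc)           # H0: the raw data are normal
w_log, p_log = stats.shapiro(np.log(conc))   # H0: the log-transformed data are normal
                                             # (i.e., the raw data are lognormal)
print(f"Raw data: W = {w_raw:.3f}, p = {p_raw:.3f}")
print(f"Log data: W = {w_log:.3f}, p = {p_log:.3f}")

# A small p-value rejects the tested distribution; because goodness-of-fit tests
# have low power, a large p-value means 'not rejected' rather than 'proven good'.
```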

3.2 Statistical Evaluation of ‘Non-detect’ Values

During the analysis of environmental sampling data that contains some NDs, a fraction of the SQL is usually assumed (as a proxy or estimated concentration) for non-detectable levels—instead of assuming a value of zero, or neglecting such values. This procedure is typically used, provided there is at least one detected value from the analytical results, and/or if there is reason to believe that the chemical is possibly present in the sample at a concentration below the SQL. The approach conservatively assumes that some level of the chemical could be present (even though a ND has been recorded) and arbitrarily sets that level at the ‘appropriate’ percentage of the SQL.

In general, the favored approach in the calculation of the applicable statistical values during the evaluation of data containing NDs involves the use of a value of one-half of the SQL. This approach assumes that the samples are equally likely to have any value between the detection limit and zero, and can be described by a normal distribution. However, when the sample values above the ND level are log-normally distributed, it generally may be assumed that the ND values are also log-normally distributed; the best estimate of the ND values for a log-normally distributed data set is the reported SQL divided by the square root of two (i.e., \( \frac{SQL}{\sqrt{2}}=\frac{SQL}{1.414} \)) (CDHS 1990; USEPA 1989a). Also, in some situations, the SQL value itself may be used if there is strong enough reason to believe that the chemical concentration is closer to this value, rather than to a fraction of the SQL. If it becomes apparent that serious biases could result from the use of any of the preceding methods of approach, more sophisticated analytical and evaluation methods may be warranted.

3.3 Selection of Statistical Averaging Techniques

Reasonable discretion should generally be exercised in the selection of an averaging technique during the statistical analysis of environmental sampling data—viz., chemical concentration data in particular. This is because, among other things, the selection of specific methods of approach to determine the average of a set of environmental sampling data can have profound effects on the resulting concentration—especially for data sets coming from sampling results that are not normally distributed. For example, when dealing with log-normally distributed data, geometric means are often used as a measure of central tendency – in order to ensure that a few very high (or low) values on record do not exert excessive influence on the characterization of the distribution. However, if high concentrations do indeed represent ‘hotspots’ in a spatial or temporal distribution of the data set, then using the geometric mean could inappropriately discount the contribution of these high chemical concentrations present in the environmental samples. This is particularly significant if, for instance, the spatial pattern indicates that areas of high concentration for a chemical release are in close proximity to compliance boundaries or near exposure locations for sensitive populations (such as children and the elderly).

The geometric mean has indeed been extensively and consistently used as an averaging parameter in the past. Its principal advantage is in minimizing the effects of ‘outlier’ values (i.e., a few values that are much higher or lower than the general range of sample values). Its corresponding disadvantage is that discounting these values may be inappropriate when they represent true variations in concentrations from one part of an impacted area or group to another (such as a ‘hot-spot’ vs. a ‘cold-spot’ vs. a ‘normal-spot’ region). As a measure of central tendency, the geometric mean is most appropriate if sample data are lognormally distributed, and without an obvious spatial pattern.

The arithmetic mean—commonly used when referring to an ‘average’—is more sensitive to a small number of extreme values or a single ‘outlier’ than the geometric mean. Its corresponding advantage is that true high concentrations will not be inappropriately discounted. When faced with limited sampling data, however, it may not provide a conservative enough estimate of environmental chemical impacts.

In fact, none of the above measures, in themselves, may be appropriate in the face of limited and variable sampling data. Contemporary applications tend to favor the use of an upper confidence limit (UCL) on the average concentration. Even so, if the computed UCL exceeds the maximum detected value amongst a data pool, then the latter is used as the source term or EPC. Finally, it has to be cautioned that in situations where there is a discernible spatial pattern to chemical concentration data, standard approaches to data aggregation and analysis may usually be inadequate, or even inappropriate.

3.3.1 Illustrative Example Computations Demonstrating the Potential Effects of Variant Statistical Averaging Techniques

To demonstrate the possible effects of the choice of statistical distributions and/or averaging techniques on the analysis of environmental data, consider a case involving the estimation of the mean, standard deviation, and confidence limits from monthly laboratory analysis data for groundwater concentrations obtained from a potential drinking water well. The goal here is to compare the selected statistical parameters based on the assumption that this data is normally distributed versus an alternative assumption that the data is lognormally distributed. To accomplish this task, the several statistical manipulations enumerated below are carried out on the ‘raw’ and log-transformed data for the concentrations of benzene in the groundwater samples shown in Table 8.1.

Table 8.1 Environmental sampling data used to illustrate the effects of statistical averaging techniques on exposure point concentration predictions
  1. Statistical Manipulation of the ‘Raw’ Data. Calculate the following statistical parameters for the ‘raw’ data: mean, standard deviation, and 95% confidence limits. [See standard statistics textbooks for details of the applicable procedures.] The arithmetic mean, standard deviation, and 95% confidence limits (95% CL) for a set of n values are defined, respectively, as follows:

    $$ {X}_m=\frac{\sum_{i=1}^n{X}_i}{n} $$
    (8.1)
    $$ {SD}_x=\sqrt{\frac{\sum_{i=1}^n{\left({X}_i-{X}_m\right)}^2}{n-1}} $$
    (8.2)
    $$ {CL}_x={X}_m\pm \frac{ts}{\sqrt{n}} $$
    (8.3)

    where: Xm = arithmetic mean of ‘raw’ data; SDx = standard deviation for ‘raw’ data; CLx = 95% confidence interval (95% CI) of ‘raw’ data; t is the value of the Student t-distribution [as expounded in standard statistical books] for the desired confidence level (e.g., 95% CL, which is equivalent to a level of significance of α = 5%) and degrees of freedom, (n–1); and s is an estimate of the standard deviation from the mean (Xm). Thus,

    Xm = 0.213 μg/L

    SDx = 0.379 μg/L

    CLx = 0.213 ± 0.241 (i.e., −0.028 ≤ CIx ≤ 0.454) and UCLx = 0.454 μg/L

    where: UCLx = 95% upper confidence level (95% UCL) of ‘raw’ data.

    Note that the computation of the 95% confidence limits for the untransformed data produces a confidence interval of 0.213 ± (0.109 × t) = 0.213 ± 0.241 [where t = 2.20, obtained from the Student t-distribution for (n–1) = 12–1 = 11 degrees of freedom] – which therefore indicates a non-zero probability for a negative concentration value; indeed, such a value may very well be considered meaningless in practical terms, thereby revealing some of the shortcomings of this type of computational approach.

  2. Statistical Manipulation of the Log-transformed Data. Calculate the following statistical parameters for the log-transformed data: mean, standard deviation, and 95% confidence limits. [See standard statistics textbooks for details of the applicable procedures.] The geometric mean, standard deviation, and 95% confidence limits (95% CL) for a set of n values are defined, respectively, as follows:

    $$ {X}_{gm}=\operatorname{antilog}\left\{\frac{\sum_{i=1}^n \ln {X}_i}{n}\right\} $$
    (8.4)
    $$ {SD}_x=\sqrt{\frac{\sum_{i=1}^n{\left({X}_i-{X}_{gm}\right)}^2}{n-1}} $$
    (8.5)
    $$ {CL}_x={X}_{gm}\pm \frac{ts}{\sqrt{n}} $$
    (8.6)

    where: Xgm = geometric mean for the ‘raw’ data; SDx = standard deviation of ‘raw’ data (assuming lognormal distribution); CLx = 95% confidence interval (95% CI) for the ‘raw’ data (assuming lognormal distribution); t is the value of the Student t-distribution [as expounded in standard statistical books] for the desired confidence level and degrees of freedom, (n–1); and s is an estimate of the standard deviation of the mean (Xgm). Thus,

    Ya-mean = −2.445

    SDy = 1.154

    CLy = −2.445 ± 0.733 (i.e., a confidence interval from −3.178 to −1.712)

    where: Ya-mean = arithmetic mean of the log-transformed data; SDy = standard deviation of the log-transformed data; and CLy = 95% confidence interval (95% CI) of the log-transformed data. In this case, computation of the 95% confidence limits for the log-transformed data yields a confidence interval of −2.445 ± (0.333 × t) = −2.445 ± 0.733 [where t = 2.20, obtained from the Student t-distribution for (n–1) = 12–1 = 11 degrees of freedom].

    Now, transforming the average of the logarithmic Y values back into arithmetic values yields a geometric mean value of Xgm = e^(−2.445) = 0.087. Furthermore, transforming the confidence limits of the log-transformed values back into the arithmetic realm yields a 95% confidence interval of 0.042 μg/L to 0.180 μg/L; recognize that these consist of positive concentration values only. Hence,

    Xgm = 0.087 μg/L

    SDx = 3.171 μg/L

    0.042 ≤ CIx ≤ 0.180 μg/L

    UCLx = 0.180 μg/L

    where: UCLx = 95% upper confidence level (95% UCL) for the ‘raw’ data (assuming lognormal distribution).

In consideration of the above, it is obvious that the arithmetic mean, Xm = 0.213 μg/L, is substantially larger than the geometric mean of Xgm = 0.087 μg/L. This may be attributed to the two relatively higher sample concentration values in the data set (namely, sampling events #4 and #5 in Table 8.1)—which consequently tend to strongly bias the arithmetic mean; on the other hand, the logarithmic transform acts to suppress the extreme values. A similar observation can be made for the 95% upper confidence level (UCL) of the normally- and lognormally-distributed data sets. In any event, irrespective of the type of underlying distribution, the 95% UCL is generally a preferred statistical parameter to use in the evaluation of environmental data, rather than the statistical mean values.

The results from the above example analysis illustrate the potential effects that could result from the choice of one distribution type over another, as well as the implications of selecting specific statistical parameters in the evaluation of environmental sampling data. In general, the use of arithmetic or geometric mean values for the estimation of average concentrations would tend to bias the EPC or other related estimates; the 95% UCL characteristically offers a better value to use—albeit it may not necessarily be a panacea in all situations.
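For readers who prefer to verify such computations numerically, the following minimal sketch repeats the two calculations above on a set of 12 hypothetical monthly benzene concentrations (placeholders only; the actual Table 8.1 values are not reproduced here):

```python
# Minimal sketch contrasting 'raw' (normal-theory) and log-transformed (lognormal-
# theory) statistics, per Eqs. (8.1)-(8.6); the data below are hypothetical.
import numpy as np
from scipy import stats

x = np.array([0.05, 0.08, 0.06, 1.10, 0.70, 0.04,   # hypothetical benzene data (ug/L)
              0.07, 0.05, 0.10, 0.06, 0.09, 0.16])
n = len(x)
t = stats.t.ppf(0.975, df=n - 1)   # Student t-value for a two-sided 95% CL

# (1) Statistics on the 'raw' data
xm, sd = x.mean(), x.std(ddof=1)
print(f"Arithmetic mean = {xm:.3f} ug/L, 95% UCL = {xm + t * sd / np.sqrt(n):.3f} ug/L")

# (2) Statistics on the log-transformed data, back-transformed to concentration units
y = np.log(x)
ym, sdy = y.mean(), y.std(ddof=1)
lo, hi = ym - t * sdy / np.sqrt(n), ym + t * sdy / np.sqrt(n)
print(f"Geometric mean = {np.exp(ym):.3f} ug/L, "
      f"95% CI = {np.exp(lo):.3f} to {np.exp(hi):.3f} ug/L")
```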

4 Estimating Chemical Exposure Point Concentrations from Limited Data

In the absence of adequate and/or appropriate field sampling data, a variety of mathematical algorithms and models are often employed to support the determination of chemical exposure concentrations in human exposure media or consumer products. Such forms of chemical exposure models are typically designed to serve a variety of purposes, but most importantly tend to offer the following key benefits (Asante-Duah 1998; Schnoor 1996):

  • To gain better understanding of the fate and behavior of chemicals existing in, or to be introduced into, the human living and work environments.

  • To determine the temporal and spatial distributions of chemical exposure concentrations at potential receptor contact sites and/or locations.

  • To predict future consequences of exposure under various chemical contacting or loading conditions, exposure scenarios, or risk management action alternatives.

  • To perform sensitivity analyses, by varying specific parameters, and then using models to explore the ramifications of such actions (as reflected by changes in the model outputs).

The results from the modeling are generally used to estimate the consequential exposures and risks to potential receptors associated with a given chemical exposure problem.

One of the major benefits associated with the use of mathematical models in public health risk management programs relates to the fact that environmental concentrations useful for exposure assessment and risk characterization can be estimated for several locations and time-periods of interest. Indeed, since field data are often limited and/or insufficient to facilitate an accurate and complete characterization of chemical exposure problems, models can be particularly useful for studying spatial and temporal variability, together with potential uncertainties. In addition, sensitivity analyses can be conducted by varying specific exposure parameters—and then using models to explore any ramifications reflected by changes in the model outputs.

In the end, the effective use of models in public health risk assessment and risk management programs depends greatly on the selection of the models most suitable for the stated purpose. The type of model selected will characteristically be dependent on the overall goal of the assessment, the complexity of the problem, the type of CoPCs, the nature of the impacted and threatened media being evaluated in the specific investigation, and the type of corrective actions contemplated. General guidance for the effective selection of models used in chemical exposure characterization and risk management decisions is provided elsewhere in the literature (e.g., Asante-Duah 1998; CCME 1994; CDHS 1990; Clark 1996; Cowherd et al. 1985; DOE 1987; NRC 1989a, b; Schnoor 1996; USEPA 1987, 1988a, b; Yong et al. 1992; Zirschy and Harris 1986)—with some excerpts presented in Chap. 6 of this title. It is noteworthy that, in several typical environmental assessment situations, a ‘ballpark’ or ‘order-of-magnitude’ (i.e., rough approximation) estimate of the chemical behavior and fate is usually all that is required for most analyses—in which case simple analytical models will usually suffice. Some relatively simple example models and equations that are often employed in the estimation of chemical concentrations in air, soil, water, and food products are provided below for illustrative purposes; a consolidated computational sketch follows the list.

  • Screening Level Estimation of Chemical Volatilization into Shower Air. A classic scenario that is often encountered in human health risk assessments relates to the volatilization of contaminants from contaminated water into shower air during a bathing/showering activity. A common screening model that may be used to derive the contaminant concentration in air from the measured concentration in domestic water is a simple box model of volatilization. In this case, the air concentration is derived from the volatile emission rate by treating the shower as a fixed volume with perfect mixing and no outside air exchange, so that the air concentration increases linearly with time.

    On the whole, the following equation can be used to determine the average air concentration in the bathroom during a shower activity (generally for chemicals with a Henry’s Law constant of ≥ 2 × 10−7 atm-m3/mol only) (HRI 1995):

    $$ Csha=\frac{\left[\mathrm{Cw}\times f\times \mathrm{Fw}\kern0.5em \times \kern0.5em \mathrm{t}\right]}{2\times \left[\mathrm{V}\times 1000\kern0.5em \upmu \mathrm{g}/\mathrm{mg}\right]} $$
    (8.7)

    where Csha is the average air concentration in the bathroom during a shower activity (mg/m3); Cw is the concentration of contaminant in the tap water (μg/L); ƒ is the fraction of contaminant volatilized (unitless); Fw is the water flow rate in the shower (L/hour); t is the duration of the shower activity (hours); and V is the bathroom volume (m3). Similarly, the following equation can be used to determine the average air concentration in the bathroom after a shower activity (generally for chemicals with a Henry’s Law constant of ≥ 2 × 10−7 atm-m3/mol only) (HRI 1995):

    $$ Csha2=\frac{\left[\mathrm{Cw}\times f\times \mathrm{Fw}\kern0.5em \times \kern0.5em \mathrm{t}\right]}{\left[\mathrm{V}\times 1000\kern0.5em \upmu \mathrm{g}/\mathrm{mg}\right]} $$
    (8.8)

    It is noteworthy that, water temperature is a key variable that affects stripping efficiencies and the mass transfer coefficients for the various sources of chemical releases into the shower air.

    In the above simplified representations, the models assume that: there is no air exchange in the shower—which assumption tends to overestimate contaminant concentration in bathroom air; there is perfect mixing within the bathroom (i.e., the contaminant concentration is equally dispersed throughout the volume of the bathroom)—which assumption tends to underestimate contaminant concentration in shower air; the emission rate from water is independent of instantaneous air concentration; and the contaminant concentration in the bathroom air is determined by the amount of contaminants emitted into the box (i.e., [Cw × ƒ × Fw × t]) divided by the volume of the bathroom (V) (HRI 1995).

  • Estimation of Household Air Contamination due to Volatilization from Domestic Water Supply. Contaminated water present inside a home can result in the volatilization of chemicals into residential indoor air—e.g., via shower stalls, bathtubs, washing machines, and dishwashers. Under such scenarios, chemical concentrations in household indoor air due to contaminated domestic water may be estimated for volatile chemicals (generally for chemicals with a Henry’s Law constant of ≥ 2 × 10−7 atm-m3/mol only), in accordance with the following relationship (HRI 1995):

    $$ Cha\kern0.5em =\kern0.5em \frac{\left[{C}_w\times \mathrm{WFH}\times f\right]}{\left[\mathrm{HV}\times \mathrm{ER}\times \mathrm{MC}\times 1000\kern0.5em \upmu \mathrm{g}/\mathrm{mg}\right]} $$
    (8.9)

    where: Cha is the chemical concentration in air (mg/m3); Cw is the concentration of contaminant in the tap water (μg/L); WFH is the water flow through the house (L/day); ƒ is the fraction of contaminant volatilized (unitless); HV is the house volume (m3/house); ER is the air exchange rate (house/day); and MC is the mixing coefficient (unitless). It is noteworthy that, water temperature is a key variable that affects stripping efficiencies and the mass transfer coefficients for the various sources of chemical releases into the indoor air.

  • Contaminant Bioconcentration in Meat and Dairy Products. In many cases, the tendency of certain chemicals to become concentrated in animal tissues relative to their concentrations in the ambient environment can be attributed to the fact that the chemicals are lipophilic (i.e., they are more soluble in fat than in water). Consequently, these chemicals tend to accumulate in the fatty portion of animal tissue. In general, the bioconcentration of chemicals in meat is dependent primarily on the partitioning of chemical compounds into fat deposits (HRI 1995). Consequently,

    $$ Cx\kern0.5em =\kern0.5em BCF\times F\times Cw $$
    (8.10)

    where: Cx is the chemical concentration in animal tissue or dairy product; BCF is the chemical-specific bioconcentration factor for tissue fat—indicating the tendency of the chemical to accumulate in fat; F is the fat content of the tissue or dairy product; and Cw is the chemical concentration in water fed to the animal (HRI 1995; USEPA 1986a, b, c, d, e, f). Overall, the concentration of such bioaccumulative chemicals in animal tissue (or other animal products for that matter) may be seen as a reflection of the chemical’s inherent bioconcentration capacity—as represented by the BCF.

  • Estimation of Contaminant Concentrations in Fish Tissues/Products. Fish tissue contaminant concentrations may be predicted from water concentrations using chemical-specific BCFs, which predict the accumulation of contaminants in the lipids of the fish. In this case, the average chemical concentration in fish, based on the concentration in water and a BCF is estimated in accordance with the following relationship (HRI 1995):

    $$ Cf\kern0.5em =\kern0.5em Cw\times BCF\times 1000 $$
    (8.11)

    where Cf is the concentration in fish (μg/kg), Cw is the concentration in water (mg/L), and BCF is the bioconcentration factor. In situations where fish tissue concentrations are predicted from sediment concentrations, a two-step process is used; first, sediment concentration is used to calculate water concentrations, and then the water concentrations are used to predict fish tissue concentrations—with the former being carried out in accordance with the following equation:

    $$ Cw = \frac{\mathrm{Csediment}}{\left[{K}_{\mathrm{oc}}\times \mathrm{OC}\times \mathrm{DN}\right]} $$
    (8.12)

    where: Cw is the concentration of the chemical in water; Csediment is the concentration of the chemical in sediment; Koc is the chemical-specific organic carbon partition coefficient; OC is the organic carbon content of the sediment; and DN is the sediment density (relative to water density).
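To tie the foregoing screening relationships together, the following minimal sketch transcribes Eqs. (8.7) through (8.12) into simple functions (the function names and all input values are hypothetical placeholders chosen purely for illustration, not recommended defaults):

```python
# Minimal sketch of the screening relationships in Eqs. (8.7)-(8.12).
# All parameter values used below are hypothetical placeholders.

def shower_air_during(cw, f, fw, t, v):
    """Eq. (8.7): average bathroom air concentration during a shower (mg/m3).
    cw: tap-water conc. (ug/L); f: fraction volatilized (-); fw: shower flow (L/h);
    t: shower duration (h); v: bathroom volume (m3)."""
    return (cw * f * fw * t) / (2.0 * v * 1000.0)   # 1000 ug per mg

def shower_air_after(cw, f, fw, t, v):
    """Eq. (8.8): average bathroom air concentration after the shower (mg/m3)."""
    return (cw * f * fw * t) / (v * 1000.0)

def household_air(cw, wfh, f, hv, er, mc):
    """Eq. (8.9): household indoor air concentration from domestic water use (mg/m3).
    wfh: water flow through the house (L/day); hv: house volume (m3);
    er: air exchange rate (per day); mc: mixing coefficient (-)."""
    return (cw * wfh * f) / (hv * er * mc * 1000.0)

def tissue_conc(bcf, fat_fraction, cw):
    """Eq. (8.10): chemical concentration in animal tissue or dairy product."""
    return bcf * fat_fraction * cw

def fish_from_water(cw, bcf):
    """Eq. (8.11): fish tissue concentration (ug/kg) from water concentration (mg/L)."""
    return cw * bcf * 1000.0

def water_from_sediment(c_sed, koc, oc, dn):
    """Eq. (8.12): water concentration back-calculated from sediment concentration."""
    return c_sed / (koc * oc * dn)

if __name__ == "__main__":
    # Hypothetical screening inputs
    print(shower_air_during(cw=5.0, f=0.6, fw=600.0, t=0.25, v=10.0))
    print(household_air(cw=5.0, wfh=800.0, f=0.5, hv=400.0, er=10.0, mc=0.5))
    print(fish_from_water(water_from_sediment(c_sed=2.0, koc=500.0, oc=0.02, dn=1.5),
                          bcf=50.0))
```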

Models can indeed be used for several purposes in the study of chemical exposure and risk characterization problems. In general, the models usually simulate the response of a simplified version of a more complex system. As such, the modeling results are imperfect. Nonetheless, when used in a technically responsible manner, models can provide a very useful basis for making technically sound decisions about a chemical exposure problem. In point of fact, models are particularly useful where several alternative scenarios are to be compared. In such comparative analyses, all the alternatives are contrasted on a similar basis; thus, whereas the numerical results of any single alternative may not be exact, the comparative result showing that one alternative is superior to the others will usually be valid.

5 Determination of the Level of a Chemical Hazard

In order to make an accurate determination of the level of hazard potentially posed by a chemical, it is very important that the appropriate set of exposure data is collected during the hazard identification and accounting processes. It is also imperative to use appropriate data evaluation tools in the processes involved; several of the available statistical methods and procedures finding widespread use in chemical exposure and risk characterization programs can be found in subject matter books on statistics (e.g., Berthouex and Brown 1994; Cressie 1994; Freund and Walpole 1987; Gibbons 1994; Gilbert 1987; Hipel 1988; Miller and Freund 1985; Ott 1995; Sachs 1984; Sharp 1979; Wonnacott and Wonnacott 1972; Zirschy and Harris 1986). In the final analysis, the process/approach used to estimate a potential receptor’s EPC will comprise the following key elements (a computational sketch follows the list):

  • Determining the distribution of the chemical exposure/sampling data, and fitting the appropriate distribution to the data set (e.g., normal, lognormal, etc.);

  • Developing the basic statistics for the exposure/sampling data—to include calculation of the relevant statistical parameters, such as the upper 95% confidence limit (UCL95); and

  • Calculating the EPC—usually defined as the lower of the UCL and the maximum exposure/sampling data value, and conceptually represented as follows: EPC = min [UCL95, Max-Value].
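A minimal sketch of this EPC decision rule is given below (hypothetical data; a simple t-statistic UCL on the arithmetic mean is used here purely for illustration, computed as in the worked example of Sect. 3.3.1):

```python
# Minimal sketch of the EPC decision rule: EPC = min(UCL95, maximum observed value).
import numpy as np
from scipy import stats

conc = np.array([0.05, 0.08, 0.06, 1.10, 0.70, 0.04,   # hypothetical sampling data
                 0.07, 0.05, 0.10, 0.06, 0.09, 0.16])
n = len(conc)

# Step 1: check the distributional assumption (here, normality of the raw data)
_, p_norm = stats.shapiro(conc)

# Step 2: compute the 95% UCL on the arithmetic mean (t-statistic approach)
ucl95 = conc.mean() + stats.t.ppf(0.975, df=n - 1) * conc.std(ddof=1) / np.sqrt(n)

# Step 3: the EPC is the lower of the UCL95 and the maximum observed value
epc = min(ucl95, conc.max())
print(f"Shapiro-Wilk p = {p_norm:.3f}, UCL95 = {ucl95:.3f}, EPC = {epc:.3f}")
```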

Ultimately, the so-derived EPC (that may indeed be significantly different from any field-measured chemical concentrations) represents the ‘true’ or reasonable exposure level at the potential receptor location of interest—and this value is used in the calculation of the chemical intake/dose for the populations potentially at risk.