1 Core Messages

  • In vivo assays for skin sensitization are valuable assets, but must not be regarded as 100% accurate.

  • Guinea pig methods have been superseded by the murine local lymph node assay (LLNA).

  • Regulatory classification as a skin sensitizing chemical means that the substance is regarded as significantly active.

  • A substance which is not classified as a skin sensitizer may still possess a limited potential to cause this effect.

  • Potency categories are new to regulatory sensitization, but the LLNA EC3 value provides a useful guide to the relative potency of a skin sensitizer.

2 Introduction

Given the nature of this handbook, it is neither necessary nor appropriate to delve deeply into matters of the history of in vivo predictive assays for skin sensitization or of the fine details of their conduct. For such matters, the reader is referred to the existing literature (Buehler 1965; Magnusson and Kligman 1970; Wahlberg and Boman 1985; Kimber and Basketter 1992; Kligman and Basketter 1995; Basketter and Gerberick 1996; Gerberick et al. 2000; McGarry 2007). Consequently, what is presented is a short overview of each of the three currently used methods, with aspects relating to their interpretation and how the data they produce is translated into comments on a safety data sheet. This requires some knowledge of regulatory toxicology, which is also covered in the text. Finally, guidance is given on how the in vivo data can give information on the relative potency of a sensitizing chemical, since, although often not available on the safety data sheet, it is the key piece of information necessary for risk assessment/management of sensitizers, as well as for the investigative identification of culprits associated with occupational allergic contact dermatitis.

3 Classification of Skin Sensitizers

It is essential to grasp the core principle that regulatory classification of skin sensitizing chemicals is intended to identify those substances that have the intrinsic potential to represent a significant hazard to human health. It does not aim to identify all skin sensitizers, that is, including those that are only very weakly sensitizing. Of course, a weak sensitizer for which there is extensive skin exposure at relatively high concentration, particularly where the skin is inflamed, perhaps by wet work and irritant chemicals, may in fact present a much greater potential risk than that presented by a strong sensitizer used at low concentration. That is not a matter addressed by regulatory toxicology (although it may of course be addressed by a good toxicologist!). At the time of writing, the most commonly used regulations applied to the classification of skin sensitizers are embodied in the Globally Harmonized System (GHS) and in the European regulations known as REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals) (Commission of the European Communities Regulation 2006; ECHA 2017; United Nations 2009). From these regulations, the key points regarding classification can be distilled, and these are summarized in Table 1. The tests themselves and classification thresholds are discussed in more detail below.

Table 1 Skin sensitization classification

What does classification mean? The reality is that it can mean a number of things: that a regulatory authority has carefully reviewed all the data and determined the substance is sufficiently sensitizing to merit classification and the label that goes with it (see below); or that the manufacturer has data that shows the substance to be a skin sensitizer; or that someone is taking a precautionary view in the absence of any real evidence.

What is the decision process and consequence? Normally, it is the wording on a safety data sheet, typically of the form: “May cause sensitization by skin contact.” This is used together with the appropriate warning symbol indicated by the Globally Harmonized System (GHS) (Fig. 1):

Fig. 1 Overview of regulatory classification (ECHA 2017)

However, there are many occasions where nonstandard wording still appears to be in use, e.g., “sensitizing to the skin.” When nonstandard wording has been used, it is also reasonable to suspect that the globally accepted regulatory criteria might not have been deployed.

The value and limitations of the above are fully discussed in the chapter on manufacturer’s safety data sheets.

4 Predictive In Vivo Assays

The guinea pig assays that remain in regulatory toxicology are fully described in the guideline of the Organization for Economic Cooperation and Development (OECD 1992). Both use the approach of treatment with test substance under occlusion and at an irritant level to try to induce skin sensitization, followed by challenge, again under occlusion, on flank skin at the maximum nonirritant concentration. The purpose of the challenge phase is to assess the extent to which skin sensitization has been induced. Concurrent sham treated controls are also used at challenge because irritant effects, which cannot be distinguished from allergic reactions, must be excluded. An outline of the guinea pig methods is given in Fig. 2. The LLNA differs from these methods in that it assesses the induction of skin sensitization directly and uses the mouse as the test system. Full details of the method are contained in the OECD Guideline 429 (OECD 2010), and a diagram of the method is presented in Fig. 3.

Fig. 2 Outline of guinea pig sensitization tests

Fig. 3 Outline of the local lymph node assay

5 The Buehler Test (BT)

Chronologically, this is the oldest of the currently accepted assays, having been developed by Ed Buehler in the Procter & Gamble laboratories in the early 1960s (Buehler 1965). The standard method requires an induction phase in which a 6 h occluded patch containing a mildly irritant concentration of the test chemical is applied to the shoulder region of 20 guinea pigs once a week for 3 weeks. The quality of the occlusion is key to the success of the assay, such that in its original description, the patch and the guinea pig were firmly constrained for the duration of the exposure. Two weeks later, a challenge patch containing the test chemical at its maximum nonirritant concentration is applied to the shaved flank for 6 h. In parallel, ten control guinea pigs are also challenged. If the substance is a skin sensitizer, then skin reactions should be apparent in some or all of the test guinea pigs, while any question of potential irritant responses at challenge is addressed by the control guinea pig challenge. If 15% or more of the test guinea pigs are positive, then the substance is regarded as a classifiable skin sensitizer.

A collation of Buehler test data was published some years ago, giving an indication of the sensitivity of the procedure (Basketter and Gerberick 1996). Despite this, the general perception remains that this method is not as sensitive as other assays, notably the maximization test, such that negative Buehler test data is often viewed with concern by regulatory authorities.

6 The Guinea Pig Maximization Test (GPMT)

Partly in response to early concern about the inadequate sensitivity of the Buehler and other, older, guinea pig tests for skin sensitization, Bertil Magnusson and Albert Kligman set about developing a more sensitive procedure, with the details of the method and all the workup being published over three decades ago (Magnusson and Kligman 1970). These efforts gave rise to the guinea pig maximization test, often known as the Magnusson and Kligman test. For this protocol, the induction phase consists of a series of six intradermal injections, in 20 test guinea pigs, involving a moderately irritating dose of the test chemical, the vehicle, and Freund’s Complete Adjuvant (FCA), the last of which provides a major nonspecific stimulus to the immune system and a localized focus of inflammation to boost the response. One week later, a 48 h occluded patch of a moderately irritating concentration of the test substance is applied over the shaved neck injection sites. Two weeks later, the guinea pigs are challenged with the maximum nonirritant concentration of the test chemical by an occlusive patch applied to the shaved flank. In parallel, ten sham treated controls are also challenged in the same manner. Skin reactions are scored at 48–96 h postapplication. If the substance is a skin sensitizer, then skin reactions should be apparent in some or all of the test guinea pigs, while any question of potential irritant responses at challenge is addressed by the control guinea pig challenge. If 30% or more of the test guinea pigs are positive, then the substance is regarded as a classifiable skin sensitizer. Note that some regulatory authorities will now accept a GPMT carried out with ten test and five control animals, particularly if the result is positive.
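The classification thresholds quoted for the two guinea pig methods (15% positive animals in the Buehler test, 30% in the GPMT) can be expressed as a simple decision rule. The following is a minimal sketch only: the function name and the assay labels "BT" and "GPMT" used as dictionary keys are illustrative choices, and the sketch does not model the expert comparison against the sham treated controls, which in practice determines whether an individual reaction is counted as positive.

```python
# Percentage of positive test animals at or above which the substance is
# regarded as a classifiable skin sensitizer (values taken from the text).
THRESHOLD_PERCENT = {"BT": 15, "GPMT": 30}


def is_classifiable(assay: str, positive_animals: int, test_group_size: int) -> bool:
    """Return True if the response rate meets the threshold for the assay.

    Hypothetical helper: `assay` must be "BT" or "GPMT". Reactions are
    assumed to have already been judged allergic rather than irritant.
    """
    if assay not in THRESHOLD_PERCENT:
        raise ValueError(f"unknown assay: {assay!r}")
    if test_group_size <= 0:
        raise ValueError("test group size must be positive")
    if not 0 <= positive_animals <= test_group_size:
        raise ValueError("positive animals must lie between 0 and the group size")
    # Integer arithmetic avoids floating point rounding exactly at the threshold.
    return positive_animals * 100 >= THRESHOLD_PERCENT[assay] * test_group_size


if __name__ == "__main__":
    print(is_classifiable("BT", 3, 20))    # 3 of 20 is 15%
    print(is_classifiable("GPMT", 5, 20))  # 5 of 20 is 25%
```

With the standard group of 20 test animals, the rule is met by three positive animals in the Buehler test and by six in the GPMT; with the reduced GPMT group of ten, three positive animals suffice.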

While there is a modest collection of GPMT results in the original publication (Magnusson and Kligman 1970), a database collated from literature evidence presented results for approximately 300 chemicals (Wahlberg and Boman 1985), and this was added to by the publication of a company database of a similar size some 9 years later (Cronin and Basketter 1994). Since that time, the only significant additional data compilation has been undertaken in respect of the LLNA (see below), but this does contain a little additional GPMT data to provide points of comparison between the assays (NICEATM 2010).

7 Rechallenge in Guinea Pig Testing

Uniquely to the guinea pig tests, if there are any difficulties of interpretation that arise at the first challenge at week 5 or 6, then it is possible a week or two later to undertake a rechallenge of the test animals together with a new batch of control guinea pigs. Details of this process have been published (Kligman and Basketter 1995; Frankild et al. 1996). The primary point is that where there is a suspicion that skin irritancy has occurred at the challenge phase, the rechallenge should clarify the situation: a sensitized guinea pig should give a consistent allergic skin response to challenge, whereas irritant skin responses will appear randomly in the test group. In the experience of this author, however, rechallenge is used only infrequently, and the matter often remains one of expert judgment. Worked examples with some discussion have been published (Basketter 2008).

8 The Local Lymph Node Assay (LLNA)

The LLNA arose as an attempt to modernize the predictive identification of skin sensitizing chemicals by making greater use of the explosion of immunological knowledge that occurred in the last quarter of the twentieth century. By this time, the mouse was the surrogate mammal of choice for immunological investigation; furthermore, it was known that the induction of skin sensitization involved cellular migration from skin to the draining lymph nodes, where, in the presence of a sensitizer, T lymphocytes would be stimulated into cell division, a process that could readily be measured via the incorporation of radioactive nucleotides. From this knowledge was born the LLNA (Kimber and Basketter 1992). The assay is shown in diagrammatic form in Fig. 3.

In brief, the LLNA is conducted as follows: groups of four CBA/Ca mice (7–12 weeks of age) are treated on the dorsum of both ears with 25 μL of test material, or with an equal volume of the vehicle alone. Treatment is performed once daily for three consecutive days. Five days following the initiation of exposure, all mice are injected via the tail vein with 250 μL of phosphate buffered saline (PBS) containing 20 μCi of tritiated thymidine (2 Ci mmol−1). Mice are sacrificed 5 h later and the draining lymph nodes excised and pooled for each experimental group. A single cell suspension of lymph node cells is prepared by mechanical disaggregation. The lymph node cell suspension is washed twice in an excess of PBS and then precipitated with 5% trichloroacetic acid (TCA) at 4 °C for 18 h. Pellets are resuspended in TCA and the incorporation of tritiated thymidine measured by β-scintillation counting. A substance is regarded as a skin sensitizer if, at any test concentration, the proliferation in treated lymph nodes is at least threefold that in the concurrent vehicle treated controls.
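The threefold criterion can be illustrated with a short calculation. In this sketch, the ratio of proliferation in a treated group to that in the vehicle control is called the stimulation index, a term commonly used for the LLNA but not introduced in the text above. The function names and the use of disintegrations per minute (dpm) as the scintillation readout are assumptions made for illustration.

```python
from typing import Mapping


def stimulation_index(treated_dpm: float, vehicle_dpm: float) -> float:
    """Ratio of thymidine incorporation in a treated group to the vehicle control.

    Hypothetical helper: both values are pooled scintillation counts for one
    experimental group, here assumed to be expressed in dpm.
    """
    if vehicle_dpm <= 0:
        raise ValueError("vehicle control count must be positive")
    return treated_dpm / vehicle_dpm


def is_llna_positive(dpm_by_concentration: Mapping[float, float], vehicle_dpm: float) -> bool:
    """True if any test concentration gives at least threefold proliferation."""
    return any(
        stimulation_index(dpm, vehicle_dpm) >= 3.0
        for dpm in dpm_by_concentration.values()
    )


if __name__ == "__main__":
    # Invented example counts keyed by test concentration (%), not real data.
    counts = {2.5: 450.0, 5.0: 600.0, 10.0: 900.0}
    print(is_llna_positive(counts, vehicle_dpm=300.0))
```

Because the criterion applies at any test concentration, a single group reaching the threefold level is enough for a positive result, even when the lower concentrations do not.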

Compared to the guinea pig tests, LLNA results have been published in a more systematic form (Gerberick et al. 2005; NICEATM 2010; Kern et al. 2010). These sources document results for >400 well-defined chemicals. Also, in contrast to both guinea pig methods, the LLNA has actually successfully passed an independent validation process (Gerberick et al. 2000; Dean et al. 2001).

In addition to the standard assay, several variants that avoid the use of the radioactive endpoint have been introduced (OECD 2010a, b).

9 In Vivo Tests: How Accurate Are They?

The only formalized and independent evaluations of the accuracy of predictive tests for skin sensitization have been undertaken with the LLNA (Dean et al. 2001; ICCVAM 2011). Based on the evaluation of >200 substances, the LLNA was judged to be about 85–90% predictive. Importantly, this means that at least one in ten of the results is potentially incorrect. Unfortunately, it is difficult to determine which ones may be false positive or negative (there is probably an equal number of both types). There is an extensive literature on this topic surrounding the LLNA (e.g., Basketter et al. 1998, 2006, 2009a, b; Kreiling et al. 2008), but much less for the guinea pig assays (Kligman and Basketter 1995; Basketter and Kimber 2010). The ICCVAM evaluation of the LLNA also incorporated a limited assessment of the GPMT and BT, which suggested their accuracy was similar to that of the LLNA. Of course, any commentary on the accuracy of in vivo tests implies that there is a gold standard against which to judge the animal data, and such a definitive dataset does not exist, although in recent years there have been efforts in this direction, with more than 200 substances being categorized according to their relative human skin sensitizing potency (Basketter et al. 2014; Api et al. 2017). This latter material also addresses the question of whether such a dataset should comprise all human skin sensitizers or only those of sufficient intrinsic potency to be classified. In general, the ICCVAM review took the latter of these positions, as have more recent offerings of (limited) validation datasets (Casati et al. 2009). In vitro alternatives have been designed and validated according to the same principle.

Figure 4 displays the central issues surrounding the subject of regulatory classification of skin sensitization against the background of a continuous biological spectrum of response.

Fig. 4 Overview of the relationship between sensitization potency and regulatory classification thresholds

10 Using the In Vivo Data to Assess Skin Sensitization Risk

The assessment of risk in an occupational setting requires the integration of data on the potency of a skin sensitizer, the exposure that occurs (expressed in μg/cm²), and factors impacting the susceptibility of the exposed group. Much has already been published on the use of LLNA data to enable quantitative risk assessment of skin sensitization, but it should be recognized that the primary focus of the work was the assessment of a single sensitizer being incorporated into cosmetics and household products (Felter et al. 2002, 2003; Api et al. 2008; Basketter 2010). However, whatever the end use of the sensitizing chemical, the key piece of information to be derived from the in vivo assays concerns intrinsic potency. This topic was reviewed by an expert EU committee a few years ago and more recently by the World Health Organization (Basketter et al. 2005; van Loveren et al. 2008). All express the concern that sensitization potency is extremely difficult to derive from the guinea pig tests, but agree with the recently published review paper that the EC3 value in the LLNA, the concentration necessary to produce a threefold stimulation of proliferation, is a useful predictor of the relative human potency of a sensitizer (Basketter et al. 1999, 2007). This perspective is based heavily on a comparison of this mouse threshold with threshold results derived from human predictive tests. Figure 5 presents an updated comparison graph, which contains >100 data points. The human thresholds derive from historical human repeated insult patch test (HRIPT) data. Examination of the axes shows that thresholds, and therefore potency, can vary by some five orders of magnitude. What is also evident is that there is a reasonably good relationship, but that, just as with basic hazard identification, it is far from perfect. Nevertheless, in the absence of any other information on the potency of a skin sensitizer, the LLNA EC3 value represents a good starting point for risk assessment/management purposes.

Fig. 5 Graphical comparison of mouse and human predictive test thresholds
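Since the EC3 value is the concentration giving a threefold stimulation of proliferation, it has to be estimated from the dose response data of an LLNA. One common way of doing this is linear interpolation between the two test concentrations that bracket the threefold level. The text above does not specify the estimation method, so the following is a minimal sketch under that assumption; the function name and the example values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def ec3_by_linear_interpolation(
    points: Sequence[Tuple[float, float]]
) -> Optional[float]:
    """Estimate EC3 (%) from (concentration %, stimulation ratio) pairs.

    Assumes linear interpolation between the adjacent pair of test
    concentrations whose responses lie below and at or above threefold.
    Returns None when no such bracketing pair exists, for example when all
    responses are below threefold or all are already at or above it.
    """
    ordered = sorted(points)  # sort by concentration
    for (c_low, r_low), (c_high, r_high) in zip(ordered, ordered[1:]):
        if r_low < 3.0 <= r_high:
            fraction = (3.0 - r_low) / (r_high - r_low)
            return c_low + fraction * (c_high - c_low)
    return None


if __name__ == "__main__":
    # Invented dose response data, not taken from any published study.
    data = [(2.5, 1.5), (5.0, 2.0), (10.0, 4.0)]
    print(ec3_by_linear_interpolation(data))
```

In the invented example, the threefold level lies halfway between the responses at 5% and 10%, so the interpolated EC3 is 7.5%. A lower EC3 indicates a more potent sensitizer, since less material is needed to reach the threefold response.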

Recently, GHS classification criteria have been updated to take modest account of the relative potency of skin sensitizers. Accordingly, where the LLNA EC3 value is 2% or lower, the substance is regarded as a stronger sensitizer and this is linked to lower limits at which product labeling is required (0.1% rather than 1.0% content of the sensitizer). Where the LLNA EC3 value is >2%, the substance is classified as a weaker skin sensitizer and the standard classification and labeling criteria apply. New guidance from the European Chemicals Agency also details the use of human data for this purpose, but these criteria have not yet been widely applied (ECHA 2015). Note that whether the chemical is stronger or weaker, as a substance it is labeled identically.
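The potency rule just described maps an EC3 value to a category and an associated product labeling limit. The sketch below encodes only the figures given in the text (the 2% EC3 boundary and the 0.1% and 1.0% content limits). The names are illustrative, and treating the labeling limit as inclusive (content at or above the limit requires a label) is an assumption.

```python
from typing import NamedTuple


class PotencyCategory(NamedTuple):
    label: str                     # "stronger" or "weaker", as worded in the text
    labeling_limit_percent: float  # sensitizer content linked to product labeling


def potency_category(ec3_percent: float) -> PotencyCategory:
    """Map an LLNA EC3 value (%) to the potency category described in the text."""
    if ec3_percent <= 0:
        raise ValueError("EC3 must be a positive concentration")
    if ec3_percent <= 2.0:
        return PotencyCategory("stronger", 0.1)
    return PotencyCategory("weaker", 1.0)


def product_needs_label(ec3_percent: float, content_percent: float) -> bool:
    """Assumes labeling applies when content is at or above the category limit."""
    return content_percent >= potency_category(ec3_percent).labeling_limit_percent


if __name__ == "__main__":
    print(potency_category(0.5))
    print(product_needs_label(ec3_percent=7.5, content_percent=0.2))
```

The practical consequence shows up for products containing between 0.1% and 1.0% of a sensitizer: in this sketch, such a product requires labeling only if the sensitizer falls in the stronger category.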

11 Summary

In vivo tests for the predictive identification of chemicals that possess the intrinsic property of skin sensitization have been available for decades and, broadly speaking, are a valuable and accurate tool. The earlier guinea pig tests proved successful for hazard identification, but have been superseded by the murine LLNA, which not only detects hazard but can also assess the relative potency of a skin sensitizer. For risk assessment and risk management purposes, basic hazard identification, which is the information commonly available on a safety data sheet, is of very limited value. What actually matters is the potency of an identified skin sensitizer, and this is best determined via the LLNA EC3 value. However, it is essential to keep in mind that these predictive tools are not perfect, such that expert advice is often necessary to translate the data into meaningful decisions that will optimize the protection of human health by minimizing the risk of the acquisition of occupational allergic contact dermatitis (Basketter et al. 2015).