1 Introduction

Traditionally, toxicology has relied on animal models, which require extrapolation from observations in animals to humans. On the other hand, animal models provide the complexity of a whole organism, which is not offered by cell or tissue cultures. Effects on the skin and eye are, however, local and may not involve complex interactions between different organs of the body. Therefore, human-derived in vitro models using cell or tissue cultures could avoid interspecies differences and yet be sufficient to address local effects. Even for local effects, a single in vitro model may be limited to one or a few potential effects or biological events leading to adverse effects on human skin; this limitation can be addressed by using testing batteries of several in vitro models. Accordingly, regulatory use of in vitro models usually involves the combination of several models to assess the hazard potential of a test substance. This requires a battery of testing methods and a data interpretation procedure (DIP) to combine the outcomes of these tests. A pre-defined set of testing methods together with the fitting DIP is called a defined approach (DA).

The present chapter provides an overview of regulatorily accepted test methods based on human-derived in vitro models and defined approaches. The validation of testing methods is a prerequisite of their regulatory acceptance. Adopted test methods are available at the OECD’s webpage (see Footnote 1). OECD-adopted test methods are usually taken up in the Annex to the EU Test Methods Regulation (Regulation (EC) No 440/2008 n.d.). Formal adoption of new methods or changes in existing methods by the OECD test guidelines and subsequently in the Annex to the EU Test Methods Regulation may take some time. However, “the latest version of an adopted test guideline should always be used when generating new data, independently of whether it is published by EU or OECD” (ECHA Endpoint Specific Guidance document. ECHA 2017a). In this chapter, we refer to OECD test guideline methods.

2 Validation and Regulatory Acceptance

Since the early 2000s, several regulations have included non-animal test methods (e.g., the European Chemicals Regulation REACH (Registration, Evaluation, Authorization and Restriction of Chemicals); Regulation (EC) No 1907/2006 n.d.) or even rely completely on in vitro (and other non-animal) testing methods, such as the EU Cosmetics Regulation (Regulation (EC) No 1223/2009 n.d.).

The validation of an in vitro method and its adoption in an OECD test guideline do not inevitably lead to its global regulatory acceptance and use, much less the complete replacement of the respective in vivo study. The regulatory acceptance of (in vitro) test data depends on regional authorities’ regulations and may also be sector specific (e.g., different for pharmaceuticals, chemicals, cosmetics, and agrochemical formulations). The differences in regional regulatory needs to address skin sensitization have been exemplified in Daniel et al. (2018), and hurdles in regulatory acceptance of in vitro skin irritation and sensitization methods and use have been described by Sauer et al. (2016) and Eskes (2019).

The so-called mutual acceptance of data (MAD) avoids unnecessary repetition of tests for individual countries. Instead, all OECD member countries accept a study that was performed according to an OECD test guideline and under good laboratory practice (GLP). At the time of writing this chapter (January 2020), the MAD applies to individual test methods only, as there are no adopted guidelines for defined approaches yet (see Sect. 3.4.4 Defined Approaches: Combination of In Vitro Methods to Assess Skin Sensitization). The lack of mutual acceptance for defined approaches hampers the full regulatory acceptance of data obtained with human-derived in vitro models and hence the replacement of in vivo studies (Sauer et al. 2016); the OECD is currently working on validating defined approaches and implementing them into its test guidelines.

The principles of the modular approach of validation have been described and evolved in several publications (Hartung et al. 2004, OECD guidance document no. 34 (OECD 2005), Zuang et al. 2015). A central part of method validation is the assessment of the method’s reliability (i.e., to determine the test’s intra- and interlaboratory variability and transferability) and its relevance (i.e., analyzing the test’s predictive capacity as well as understanding its applicability domain) (Fig. 1).

Fig. 1 Method development: From an experimental method to a test guideline via standardization, validation, and regulatory adoption. F/RAND, fair, reasonable, and nondiscriminatory (ICH 2010; FDA 2011); GIVIMP, good in vitro method practice (OECD 2018d); SOP, standard operating procedure

The intra- (i.e., within a certain lab) and interlaboratory (i.e., between different labs) reproducibility is typically determined biostatistically using the data generated in ring trial studies with at least three participating labs, which are blinded to the test substances’ identities. The assessment of the predictive capacity of a testing method or defined approach requires the testing of substances with well-known reference data (see also subchapter “Reference Data and Validation Sets”). The predictivity of a novel method is assessed by comparing the results obtained with this method to the reference data. The predictivity is described by the true positive rate (sensitivity), the true negative rate (specificity), and the overall accuracy, which are calculated according to Table 1 (Cooper et al. 1979).

Table 1 Calculation of a test method’s predictive performance (confusion matrix)
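
As an illustration of the three performance measures named above, the following minimal sketch tallies a confusion matrix from paired results and derives sensitivity, specificity, and accuracy using their standard definitions. The class and function names (`ConfusionMatrix`, `tally`) and the example data are hypothetical and are not taken from any test guideline.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts of new-method results versus reference results."""
    true_positive: int
    false_negative: int
    true_negative: int
    false_positive: int

    def sensitivity(self) -> float:
        # True positive rate: reference positives that the new method also finds positive.
        return self.true_positive / (self.true_positive + self.false_negative)

    def specificity(self) -> float:
        # True negative rate: reference negatives that the new method also finds negative.
        return self.true_negative / (self.true_negative + self.false_positive)

    def accuracy(self) -> float:
        # Share of all substances for which both results agree.
        total = (self.true_positive + self.false_negative
                 + self.true_negative + self.false_positive)
        return (self.true_positive + self.true_negative) / total


def tally(predictions: Sequence[bool], references: Sequence[bool]) -> ConfusionMatrix:
    """Build the confusion matrix from paired results (True = positive)."""
    if len(predictions) != len(references):
        raise ValueError("each prediction needs exactly one reference result")
    pairs = list(zip(predictions, references))
    return ConfusionMatrix(
        true_positive=sum(1 for p, r in pairs if p and r),
        false_negative=sum(1 for p, r in pairs if not p and r),
        true_negative=sum(1 for p, r in pairs if not p and not r),
        false_positive=sum(1 for p, r in pairs if p and not r),
    )


# Hypothetical example: five substances, invented results.
matrix = tally(
    predictions=[True, True, False, False, True],
    references=[True, False, False, True, True],
)
```

A validation set without any reference positives (or negatives) would make the corresponding rate undefined; the sketch does not guard against that case.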

It should be mentioned that this comparison of new in vitro models with traditional in vivo models is questionable: the identification of hazard properties of a test substance and the classification and labelling criteria were defined according to the animal models, and it is arduous to try to accurately reproduce, with in vitro models, results defined by the parameters of an in vivo method. Moreover, the reproducibility of in vivo test results is limited, even though the test methods are highly standardized: the reproducibility of the murine local lymph node assay, a refinement in vivo study (OECD test guideline 429, OECD 2010), was found to be 89% (based on 296 test substances), and that of seriously eye-damaging findings in the Draize rabbit eye irritation test was 73% (based on 46 test substances) (Luechtefeld et al. 2016a, b). As early as 1971, Weil and Scala reported on the intra- and interlaboratory variability of rabbit eye and skin irritation tests in 25 different laboratories and concluded that “The all-or-none, irritant or nonirritant, eye or skin rating of the reference samples was determined quite differently in different laboratories” (Weil and Scala 1971). In other words, it is a forlorn task to exactly reproduce the results of imperfect in vivo animal methods with in vitro models. Instead, we should strive for human relevance and address the disturbance of relevant physiological processes in humans.

3 Regulatory-Accepted Human-Derived In Vitro Models

3.1 Skin Irritation and Corrosion

3.1.1 Testing Methods: Reconstructed Human Epidermis (RhE) Used in OECD Test Guidelines No. 431 and 439

The first regulatory-accepted non-animal method using a human-derived model is the in vitro skin corrosion test utilizing a reconstructed human epidermis (RhE) model. Typically, RhE models are generated from non-transformed human epidermal keratinocytes forming a multilayered, highly differentiated model of the human epidermis. They consist of organized basal, spinous, and granular layers and a multilayered stratum corneum containing intercellular lamellar lipid layers arranged in patterns analogous to those found in vivo, resulting in biochemical and physiological properties similar to those of human epidermis.

The skin corrosion test assay was first adopted by OECD in 2004 as OECD test guideline no. 431 (OECD 2019a). The corresponding skin irritation test was first adopted in 2010 (and revised several times since then) as OECD test guideline no. 439 (OECD 2019b). Skin irritation and corrosion tests using RhE are based on the experience that skin irritant and corrosive substances induce localized trauma as the underlying mechanism of skin irritation in vivo. The RhE-based tests are designed to predict a skin corrosion or irritation potential of a test substance after exposure on a RhE. Testing according to both OECD test guidelines can be conducted with several commercially available tissues (with similar but distinct exposure protocols and prediction models for each of the different models and irritation and corrosion endpoints).

After application of the test material to the stratum corneum surface of the reconstructed tissue, the induced cytotoxicity is measured by a colorimetric assay. Cytotoxicity is expressed as the reduction of mitochondrial dehydrogenase activity measured by the amount of reduced tetrazolium dye. After isopropanol extraction of the formazan from the tissues, the optical density of the extract is determined spectrophotometrically and compared to negative control values (see Footnote 2) to express relative tissue viability. Test substances reducing viability below certain cutoffs are then identified as skin corrosive or irritant according to the prediction models described in OECD test guidelines no. 431 and 439, respectively.
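
The readout described above can be sketched as a short calculation. This is a minimal illustration only: the function names, the optional blank correction, and the inclusive comparison are assumptions, and the cutoff is passed in as a parameter because the actual values are defined by the prediction models of the respective test guidelines.

```python
from statistics import mean
from typing import Sequence


def relative_viability(
    od_test: Sequence[float],
    od_negative_control: Sequence[float],
    od_blank: float = 0.0,
) -> float:
    """Relative tissue viability in percent of the negative control.

    od_test / od_negative_control: optical densities of replicate tissues.
    od_blank: optical density of the extraction solvent alone (assumed correction).
    """
    corrected_test = mean(od_test) - od_blank
    corrected_control = mean(od_negative_control) - od_blank
    if corrected_control <= 0:
        raise ValueError("negative control must exceed the blank")
    return 100.0 * corrected_test / corrected_control


def at_or_below_cutoff(viability_percent: float, cutoff_percent: float) -> bool:
    """True if viability falls to or below the cutoff.

    Treating the boundary as inclusive is an assumption of this sketch.
    """
    return viability_percent <= cutoff_percent


# Hypothetical optical densities for three replicate tissues each.
viability = relative_viability([0.5, 0.5, 0.5], [2.0, 2.0, 2.0])
```

With these invented readings the test tissues retain a quarter of the control signal; whether that counts as a positive result depends entirely on the cutoff supplied by the applicable prediction model.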

The prediction models of the skin corrosion test according to OECD test guideline no. 431 were initially developed and adopted to distinguish substances corrosive to the skin from those that are not. In the EU, evidence of toxicological effects (at the time of writing this chapter, January 2020, mostly results of animal studies) triggers classification (and labelling) of substances (Regulation (EC) No 1272/2008 n.d.). The classification criteria were agreed at the UN level in the so-called Globally Harmonized System of Classification and Labelling of Chemicals, GHS (United Nations 2007). When toxicological data on a substance meet the classification criteria, the hazards of the substance are identified by assigning a certain hazard category; for example, a substance is classified in skin corrosion category 1 if “Destruction of skin tissue, namely, visible necrosis through the epidermis and into the dermis, in at least one tested animal after exposure ≤ 4 h” is observed in rabbits tested according to OECD test guideline no. 404 (of note, the classification criteria are defined based on results of the animal studies). Category 1 can be further subdivided into subcategories 1A, 1B, and 1C (see Footnote 3). This subcategorization of skin corrosion was initially not addressed by OECD test guideline no. 431, which only distinguished corrosives from non-corrosives. Since 2015, OECD test guideline no. 431 has supported the subcategorization into skin corrosives 1A and a combined 1B/C. Of note, an overprediction rate of approximately 30% has been reported for substances identified as UN GHS subcategory 1A that actually belong to subcategories 1B or 1C (OECD 2019a). In case subcategorization of the corrosive classes is needed, and in particular where UN GHS subcategories 1B and 1C have to be differentiated, the biomembrane-based Corrositex assay (OECD test guideline no. 435, OECD 2015a) can be conducted (as this assay does not use a human-derived model, it is not discussed further here).

OECD test guideline no. 439 provides a prediction model to identify substances nonirritant to the skin. In case the test is positive, additional testing is required to provide information whether a substance should be classified as skin irritant (UN GHS category 2) or skin corrosive (UN GHS category 1).

3.1.2 Combination of Methods to Assess Skin Irritation and Skin Corrosion

As can be concluded from the predicted UN GHS categories from OECD test guidelines nos. 431 and 439 and summarized in Fig. 2, in many cases a combination of assays (OECD guidance document no. 203, OECD 2014) is needed to cover the full irritation scale that was covered by the in vivo skin irritation test (OECD test guideline no. 404, OECD 2015b).

Fig. 2 Combination of in vitro methods to assess skin irritation/corrosion. Depending on the expected effects (severely irritating or nonirritating), the first test conducted is chosen. Cat, category; OECD, OECD test guideline; SCT, skin corrosion test; SIT, skin irritation test
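
The way the two test outcomes complement each other can be sketched as a small decision function, based only on what the text above states: the skin corrosion test identifies corrosives, and the skin irritation test identifies nonirritants. The names `SkinCategory` and `combine_skin_results` are hypothetical, and letting a corrosive result take precedence over a contradictory irritation result is an assumption of this sketch, not a rule quoted from the guidance document.

```python
from enum import Enum
from typing import Optional


class SkinCategory(Enum):
    CORROSIVE = "UN GHS category 1"
    IRRITANT = "UN GHS category 2"
    NONIRRITANT = "nonirritant"
    UNDETERMINED = "further testing needed"


def combine_skin_results(
    sct_corrosive: Optional[bool],
    sit_positive: Optional[bool],
) -> SkinCategory:
    """Combine skin corrosion test (SCT) and skin irritation test (SIT) outcomes.

    None means the respective test has not been conducted.
    """
    if sct_corrosive is True:
        # Assumption: a corrosive finding decides the outcome on its own.
        return SkinCategory.CORROSIVE
    if sit_positive is False:
        return SkinCategory.NONIRRITANT
    if sct_corrosive is False and sit_positive is True:
        # Positive in the SIT but not corrosive: irritant by combination.
        return SkinCategory.IRRITANT
    return SkinCategory.UNDETERMINED


# A positive SIT alone cannot separate category 2 from category 1.
needs_second_test = combine_skin_results(sct_corrosive=None, sit_positive=True)
```

Running the expected "first" test and calling the function with the other argument left as `None` shows directly when a second test is required.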

3.2 Phototoxicity

Substances applied to the skin may be converted into active substances by sunlight irradiation, causing phototoxic (irritating) effects. Standardized and internationally harmonized in vitro methods and a tiered testing strategy are available to test for these effects (Kolle et al. 2018). The 3T3 Neutral Red Uptake (NRU) phototoxicity test method (OECD test guideline no. 432, OECD 2019i) uses a mouse fibroblast line (BALB/3T3). Human-derived methods using RhE were developed and successfully pre-validated for phototoxicity assessment (Liebsch et al. 1999) and were added to the OECD work plan in 2019. Both models were found to be overpredictive (Jirova et al. 2007), and today testing is usually performed according to a tiered testing strategy that considers the light absorption and photoreactivity (formation of reactive oxygen species, ROS; OECD test guideline no. 495, OECD 2019j) of the test substance, as well as its distribution to the human skin (ICH 2013).

3.3 Eye Irritation

3.3.1 Testing Methods: Reconstructed Human Cornea-Like Epithelium Models (RhCE) Used in OECD Test Guideline No. 492

The eye irritation test (EIT) based on reconstructed human cornea-like epithelium models (RhCE) was first adopted as OECD test guideline no. 492 in 2015 (OECD 2019c). The RhCE tissue models are three-dimensional, non-keratinized tissue constructs composed of normal human-derived epidermal keratinocytes used to model the human corneal epithelium. RhCE have similar biochemical and physiological properties to human cornea epithelium.

After application of the test material to the surface of the RhCE, the induced cytotoxicity (= loss of viability, specifically of mitochondrial dehydrogenase activity) is measured by a colorimetric assay. Test substances that do not reduce viability below certain cutoffs are then identified as nonirritant to the eye according to the prediction model described in OECD test guideline no. 492. OECD test guideline no. 492 can be conducted with several commercially available tissues (with similar but distinct exposure protocols and prediction models for each of the different models and irritation and corrosion endpoints; OECD 2019c).

3.3.2 Testing Methods: Immortalized Corneal Epithelial Cells Used in OECD Test Guideline No. 494

In the Vitrigel-EIT, immortalized corneal epithelial cells are fabricated in a collagen Vitrigel membrane chamber. In this assay, the time-dependent change in transepithelial electrical resistance is used to monitor the disruption of the barrier function. The Vitrigel-EIT assay was adopted in 2019 as OECD test guideline no. 494 for the identification of ocular nonirritants and seriously eye-damaging substances (UN GHS category 1) (OECD 2019d).

3.3.3 Defined Approaches: Combination of Methods to Assess Eye Irritation and Serious Eye Damage

In 2010, the concept of top-down and bottom-up approaches was described for eye irritation (Scott et al. 2010) for the replacement of the in vivo eye irritation test (OECD test guideline no. 405, OECD 2017a). As with skin irritation and corrosion testing, the first test to be conducted is selected based on the expected ocular irritant potential (Fig. 3, OECD guidance document no. 263; OECD 2019e). Both human-derived eye irritation test methods presented above could be employed to identify ocular nonirritants, while at least one additional method is needed to identify UN GHS Cat 1 seriously eye-damaging substances. Meanwhile, the OECD has adopted several in vitro methods to identify seriously eye-damaging substances (UN GHS category 1): the bovine corneal opacity and permeability test using bovine corneas (OECD test guideline no. 437 (OECD 2017b)), the isolated chicken eye test using chicken eyes (OECD test guideline no. 438 (OECD 2018c)), the fluorescein leakage test method using Madin-Darby canine kidney cells (OECD test guideline no. 460 (OECD 2017c)), the short-term exposure test method using Statens Serum Institut rabbit cornea cells (OECD test guideline no. 491 (OECD 2018e)), and the Ocular Irritection test method using a complex macromolecular matrix (OECD test guideline no. 496 (OECD 2019k)). As none of these assays uses a human-derived model, they are not discussed further here. Two defined approaches for ocular toxicity, based on in vitro bottom-up approaches combined with physicochemical properties, were added to the OECD work plan in 2019.

Fig. 3 Combination of in vitro methods to assess eye irritation/serious eye damage. Depending on the expected effects (seriously eye damaging or nonirritating), the first test conducted is chosen. Cat, category; NI, nonirritant; id., identify. Please note that a default GHS Cat 2 identification by exclusion of “nonirritant” and exclusion of “serious eye damage” may not be accepted by all regulatory bodies and additional information such as in vivo data may be required (OECD 2019e)
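
The top-down and bottom-up logic, including the caveat about category 2 by exclusion, can be sketched as follows. This is an illustration under stated assumptions: the names `EyeAssessment` and `combine_eye_results` are hypothetical, and the order in which contradictory results are resolved is a choice made for this sketch rather than a rule from the guidance document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EyeAssessment:
    outcome: str
    # True when the outcome rests only on excluding the two other outcomes;
    # such a result may need additional information for some regulatory bodies.
    by_exclusion: bool = False


def combine_eye_results(
    serious_damage: Optional[bool],
    nonirritant: Optional[bool],
) -> EyeAssessment:
    """Combine a test identifying serious eye damage with a test identifying nonirritants.

    None means the respective test has not been conducted.
    """
    if serious_damage is True:
        return EyeAssessment("UN GHS category 1")
    if nonirritant is True:
        return EyeAssessment("nonirritant")
    if serious_damage is False and nonirritant is False:
        # Neither seriously eye damaging nor nonirritant: category 2 by default.
        return EyeAssessment("UN GHS category 2", by_exclusion=True)
    return EyeAssessment("undetermined")


# Bottom-up start: the nonirritant test is run first and comes back negative.
after_first_test = combine_eye_results(serious_damage=None, nonirritant=False)
```

Keeping `by_exclusion` as an explicit flag makes it easy to route such results to a reviewer instead of reporting them as a final classification.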

3.4 Skin Sensitization

The underlying mechanism of skin sensitization is quite well understood and has been broken down into an adverse outcome pathway (OECD 2012a, b). Three of the key events can be assessed experimentally using non-animal methods (OECD 2018a, b, 2019f). Chemical reactivity has been shown to be well associated with allergenic potency and has been described as the molecular initiating event in the adverse outcome pathway. As the second key event of the skin sensitization adverse outcome pathway, keratinocytes must be activated to induce essential (“danger”) signalling molecules. The third key event is the activation of skin dendritic cells, which, as antigen-presenting cells, must upregulate cell surface markers to interact with T cells.

3.4.1 Testing Methods: Synthetic Peptides Used in OECD Test Guideline No. 442C

In the direct peptide reactivity assay (DPRA), the reactivity of a test substance towards synthetic cysteine- and lysine-containing peptides is addressed. For this purpose, a single test substance concentration is incubated with synthetic peptides for ca. 24 h at ca. 25 °C, and the remaining non-depleted peptide concentrations are determined by high-performance liquid chromatography (HPLC) with gradient elution and UV detection at 220 nm.

The peptide depletion of samples incubated with the test substance is compared to the peptide depletion of the negative control samples and expressed as relative peptide depletion. The DPRA was first adopted as OECD test guideline no. 442C in 2015 (OECD 2019f).
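
The relative peptide depletion can be illustrated with a short calculation from HPLC peak areas. This is a minimal sketch under assumptions: the function names are hypothetical, depletion is expressed as the percentage of peptide signal lost relative to the mean of the control samples, and no prediction threshold is applied because the text does not state one.

```python
from statistics import mean
from typing import Sequence


def percent_depletion(peak_area_sample: float, mean_peak_area_control: float) -> float:
    """Percentage of peptide lost relative to the control samples."""
    if mean_peak_area_control <= 0:
        raise ValueError("control peak area must be positive")
    return (1.0 - peak_area_sample / mean_peak_area_control) * 100.0


def mean_percent_depletion(
    sample_areas: Sequence[float],
    control_areas: Sequence[float],
) -> float:
    """Mean depletion over replicate samples of one peptide."""
    control = mean(control_areas)
    return mean([percent_depletion(area, control) for area in sample_areas])


# Hypothetical peak areas (arbitrary units) for one peptide.
cysteine_depletion = mean_percent_depletion(
    sample_areas=[50.0, 50.0],
    control_areas=[100.0, 100.0],
)
```

A sample peak larger than the control gives a negative value here; how such values are handled is left to the applicable guideline and is not modelled in this sketch.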

The DPRA uses neither a human-derived cell or tissue model nor a biomacromolecule as its test system, but rather a synthetic heptapeptide. The assay does, however, complement human-derived models (described below, Sects. 3.4.2 and 3.4.3) in testing batteries to predict a skin sensitization potential in humans.

As the DPRA does not use a biological system, it is often termed an “in chemico” rather than an “in vitro” assay. Information on the reactivity of a test substance towards a peptide (as a proxy for skin proteins) can also be obtained by in silico methods. Several commercial and not-for-profit models have been evaluated (Teubner et al. 2013; Urbisch et al. 2016b; Fitzpatrick et al. 2018); they provided a lower overall predictivity, but specific models showed good concordance with experimental results within their applicability domain. So far, peptide reactivity has been used to predict a sensitization potential (presence or absence of hazard). Recently, the DPRA was extended to also predict potency classes (kDPRA). In the kDPRA, several test substance concentrations are assessed after several incubation times to determine reaction rate constants, which are then used to identify strong sensitizers (UN GHS category 1A) (Wareing et al. 2017).

3.4.2 Testing Methods: Human-Derived Keratinocytes Used in OECD Test Guideline No. 442D

As the second key event in the adverse outcome pathway for skin sensitization, keratinocyte activation can be assessed by the KeratinoSens and LuSens assays, which use genetically modified human keratinocyte cell lines. Both assays employ a luciferase reporter gene under the control of an antioxidant response element and hence monitor Nrf2 transcription factor activity. The endpoint measurement is the upregulation of luciferase activity after incubation with test substances. This upregulation is an indicator of the activation of the Keap1/Nrf2/ARE signalling pathway. The ARE-Nrf2 luciferase test methods were first adopted in 2015 as OECD test guideline no. 442D (OECD 2018a).

3.4.3 Testing Methods: Human-Derived Dendritic-Like Cells Used in OECD Test Guideline No. 442E

Dendritic cell activation, the third key event in the adverse outcome pathway for skin sensitization, is addressed by the test methods described in OECD test guideline no. 442E, first adopted in 2016 (OECD 2018b). The assays evaluate the potential to activate dendritic cells either by measuring changes in cell surface marker expression (human cell line activation test (h-CLAT) and U937 Cell Line Activation Test (U-SENS)) or by measuring the induction of the cytokine IL-8 in the interleukin-8 reporter gene assay (IL8LUC). The h-CLAT is performed using the human monocytic leukemia cell line THP-1 as a surrogate for dendritic cells. As readout, the change in the expression of the cell membrane markers CD54 and CD86 is determined by flow cytometry after test substance exposure. Similarly, in the U-SENS, the change in the expression of the cell membrane marker CD86 is determined by flow cytometry after test substance exposure of U937 cells. In the IL8LUC, which uses a THP-1-derived IL-8 reporter cell line, IL-8-dependent luciferase activity is determined after test substance exposure (OECD 2018b).

3.4.4 Defined Approaches: Combination of In Vitro Methods to Assess Skin Sensitization

Although non-animal methods addressing individual key events of the skin sensitization adverse outcome pathway are available as OECD-adopted test methods, none of them should be considered a stand-alone method to address the endpoint of skin sensitization; rather, the methods have to be combined in defined approaches. To conclude on the sensitizing potential of a test substance, the data from several methods are combined in a defined approach in which a fixed data interpretation procedure serves as the prediction model for the combination of results. Several defined approaches have been described (Table 2, OECD 2016a, b), and in the following, we briefly describe one of the less complex defined approaches for the identification of the skin sensitization hazard (Fig. 4, Bauch et al. 2012; Urbisch et al. 2015).

Table 2 Case studies of defined approaches described in OECD GD 256 (OECD 2016a, b)

Fig. 4 The 2 out of 3 defined approach (according to Urbisch et al. 2015). KE, key event in the adverse outcome pathway for skin sensitization; KE A, KE B, and KE C denote key events 1, 2, and 3 in an arbitrary sequence. Any two concordant results determine the overall prediction, with at least 2 out of the 3 KE addressed. KE1 is addressed by the DPRA (OECD test guideline no. 442C), KE2 by the LuSens/KeratinoSens™ assays (OECD test guideline no. 442D), and KE3 by the h-CLAT (OECD test guideline no. 442E)

In the 2 out of 3 approach (Case study 1 in Table 2), assays addressing three of the key events of the skin sensitization adverse outcome pathway are conducted, and two concordant results determine the overall hazard prediction (i.e., if a test substance is positive in any two of the three assays, it is predicted to be a sensitizer).
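
The data interpretation procedure of the 2 out of 3 approach is simple enough to sketch in a few lines. The function name `two_out_of_three` and the use of `None` for an assay that has not been run (or gave no usable result) are assumptions made for illustration.

```python
from typing import Optional, Sequence


def two_out_of_three(results: Sequence[Optional[bool]]) -> Optional[bool]:
    """Overall hazard prediction from three key-event assays.

    results: one entry per key event (e.g., DPRA, LuSens or KeratinoSens, h-CLAT);
             True = positive, False = negative, None = no result available.
    Returns True (sensitizer), False (non-sensitizer), or None if no two
    results are concordant yet.
    """
    if len(results) != 3:
        raise ValueError("exactly three key-event results are expected")
    positives = sum(1 for result in results if result is True)
    negatives = sum(1 for result in results if result is False)
    if positives >= 2:
        return True
    if negatives >= 2:
        return False
    return None


# Two concordant positives decide the prediction regardless of the third assay.
prediction = two_out_of_three([True, False, True])
```

Because any two concordant results are decisive, the third assay only needs to be run when the first two disagree, which the `None` return value makes explicit.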

This defined approach (like most defined approaches) combines the data of several test methods adopted by the OECD. Yet, the adoption of defined approaches into OECD test guidelines is still pending at the time of writing this chapter (January 2020), which prevents defined approaches from receiving the same regulatory recognition and mutual acceptance of data as the animal tests. Since the “2 out of 3” approach was first submitted for regulatory acceptance to the European Centre for the Validation of Alternative Methods in 2011, a lot of progress has been made. A project to draft a guideline was added to the OECD work plan in 2017, and a second draft guideline and supporting documents became available in September 2019 (OECD 2019g, h). The work undertaken to draft these documents included an extensive review of the human and mouse skin sensitization reference data (see also Reference Data and Validation Sets for a more general discussion of reference data) as well as a discussion of the applicability domains (extending the applicability domains of the individual assays).

3.5 Genotoxicity

A variety of in vitro genotoxicity and mutagenicity models is available and used within testing batteries (Kirkland et al. 2006; ICH 2011; SCCS 2014). Interestingly, none of the regulatorily accepted models is human-derived except for the so-called HuLy assay, which utilizes primary human lymphocytes (OECD test guideline no. 487, OECD 2016c). Recently, methods to detect genotoxic and mutagenic effects in human-derived reconstructed epidermal models were developed and validated in ring trials to detect genotoxicity (Reisinger et al. 2018) and chromosomal aberrations (Curren et al. 2006; Aardema et al. 2010). These methods await the finalization of their validation processes and their inclusion in OECD test guidelines before they can be used within in vitro genotoxicity and mutagenicity testing batteries; the in vitro genotoxicity test for dermal exposure using 3D models has been added to the OECD work plan.

3.6 Dermal Penetration and Absorption

Dermal penetration and absorption methods do not test for adverse effects on human skin but assess the penetration of dermally applied substances into and through the skin, by which they become systemically available in the human body. The OECD-adopted in vitro method (OECD test guideline no. 428; OECD 2004; Fabian et al. 2017) utilizes human skin preparations. Non-viable skin can also be used provided that the integrity of the skin can be demonstrated. Either epidermal membranes or split-thickness skin (typically 200–400 μm thick) prepared with a dermatome are acceptable. The principal diffusion barrier of the skin is the non-viable stratum corneum; active transport of chemicals through the skin has not been identified, and dermal metabolism (Bätz et al. 2013; Oesch et al. 2018) is not rate limiting in terms of actual absorbed dose (OECD 2004). Methods utilizing human reconstructed epidermal or full-thickness skin models have been developed and pre-validated (Schäfer-Korting et al. 2008; Ackermann et al. 2010) but are not yet regulatorily accepted.

4 Limitations

4.1 Technical Limitations

Specific test substances may not be applicable to certain test systems. Some examples of these technical limitations are listed below (using the example of the in vitro skin sensitization methods).

In the DPRA (OECD test guideline no. 442C), the depletion of the synthetic heptapeptide is quantified by its UV light absorption after HPLC elution. Test substances co-eluting at the same retention times as the model peptides may hamper the peptide quantification.

The cell-based assay methods (OECD test guidelines no. 442D and 442E) use luciferase-generated bioluminescence or fluorescence of fluorochrome-labelled antibodies as detection methods. Test substances quenching the fluorescence or otherwise interfering with the optical detection may hamper the quantification of the luciferase induction or the identification of labelled cells.

Dendritic cell activation (OECD test guideline no. 442E) is analyzed by flow cytometric determination of the cell surface marker expression. Insoluble particles and polymers may however limit the technical applicability by clogging the flow cytometer.

4.2 Predictive Limitations

4.2.1 Mechanistic Limitations

In the following, we present (and partially discuss) known mechanistic limitations of in vitro assays. This is not to be understood as a comprehensive list; rather, we would like to present a few examples based on the in vitro skin sensitization assays.

OECD Test Guideline No. 442C

The DPRA is based on the reactivity of a test substance with cysteine and lysine residues. Metals do not form covalent bonds with those two amino acid residues and hence are outside the applicability domain of the assay. Also, test substances reacting with amino acid residues other than cysteine and lysine will not be detected in the DPRA. It has been described that some test substances favor the dimerization or oxidation of the peptide (or non-covalent, specific binding), leading to an overestimation of the true peptide depletion (e.g., Roediger and Weninger 2011).

OECD Test Guideline No. 442D

The underlying mechanism for the antioxidant response element pathway addressed in both the KeratinoSens and LuSens assays is closely linked to cysteine reactivity. Therefore, test substances primarily reacting with other amino acid residues (such as acylating agents reacting with lysine) would be expected to be underpredicted in the KeratinoSens and LuSens assays.

4.2.1.1 Metabolic Capacity

The three in vitro skin sensitization tests described above do not contain any external source of metabolic capacity. Nevertheless, the test systems can detect most pre- and pro-haptens. In vitro investigations (Urbisch et al. 2016a; Patlewicz et al. 2016) using test substances requiring molecular transformation to attain a sensitizing potential have shown that pre-haptens can readily be detected in the DPRA, many of which involve autoxidation processes. Moreover, many pro-haptens are also activated by nonenzymatic oxidation (and therefore are pre- and pro-haptens). The cellular models h-CLAT and LuSens have been shown to detect pro-haptens more efficiently; respective enzyme activities were detected in the cell lines (Fabian et al. 2013; Oesch et al. 2018). Thus, it can be concluded that potentially relevant molecular transformations are generally sufficiently considered using the in vitro skin sensitization tests DPRA, LuSens or KeratinoSens, and h-CLAT.

4.2.1.2 Water Solubility and Lipophilicity

OECD test guideline no. 442E describes that the h-CLAT result may be underpredictive (false negative) for test substances with log Kow > 3.5. In this case a “negative” result is interpreted as “inconclusive,” whereas a “positive” result is accepted. Although solubility is expected to decrease with increasing log Kow, it should not be neglected that OECD test guideline no. 442E (OECD 2018b) and OECD test guideline no. 442D (OECD 2018a) allow the testing of homogeneous but incompletely dissolved test substances.
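
The interpretation rule for lipophilic substances can be written down as a small function. The function name `interpret_hclat` and the handling of an unknown log Kow (treated as if the restriction did not apply) are assumptions of this sketch; only the limit of 3.5 and the three outcomes are taken from the text above.

```python
from typing import Optional


def interpret_hclat(
    positive: bool,
    log_kow: Optional[float],
    log_kow_limit: float = 3.5,
) -> str:
    """Interpret an h-CLAT result in light of the test substance's log Kow.

    positive: raw assay outcome.
    log_kow: octanol-water partition coefficient (log10), or None if unknown.
    """
    if positive:
        # Positive results are accepted irrespective of lipophilicity.
        return "positive"
    if log_kow is not None and log_kow > log_kow_limit:
        # A negative result may be a false negative for lipophilic substances.
        return "inconclusive"
    return "negative"


# Hypothetical lipophilic substance with a negative raw result.
outcome = interpret_hclat(positive=False, log_kow=4.2)
```

In a defined approach, an "inconclusive" outcome would typically be handled like a missing result rather than like a negative one.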

4.2.2 Agrochemical Formulations in In Vitro Skin and Eye Irritation Tests

(Non-animal) tests are typically validated against well-characterized individual reference chemicals. Different chemistries, use classes, and special types of mixtures can, however, not always be (comprehensively) included in validation exercises. Therefore, important post-validation experience is gained during routine testing (frequently after test guideline adoption and regulatory acceptance). We present here two examples of test methods, both based on reconstructed human tissues.

Agrochemical formulations are a special type of mixture (which as such fall under the GHS mixture definition and could generally be considered within the applicability domain of the OECD test guidelines) containing a variety of ingredients that alter the properties of the active ingredient. These ingredients often also affect toxicity, so that rules of additivity do not simply apply. Based on a comparative dataset of 97 agrochemical formulations, Kolle et al. reported excellent sensitivity of the EpiOcular eye irritation test for predicting agrochemical formulations nonirritant to the eye (at the time of writing this chapter, January 2020, there is still no non-animal method to reliably predict seriously eye-damaging agrochemical formulations) (Kolle et al. 2015, 2017a). This might suggest that reconstructed human tissues also work well at the lower end of the irritation scale for skin. Unfortunately, this could not be confirmed: a comparative dataset of 65 agrochemical formulations showed that the in vitro skin irritation test was neither sufficiently sensitive nor specific (Kolle et al. 2017b). A proof of concept with five formulations assessed using a protocol modification of the SIT (a 15 min exposure followed by a 24 or 42 h post-exposure period instead of the 60 min exposure followed by a 42 h post-exposure period in the EpiDerm SIT according to OECD test guideline no. 439) was also not successful (unpublished data). Therefore, the in vivo assay (OECD test guideline no. 404) is unfortunately still needed to evaluate the skin irritation potential of agrochemical formulations.

At the time of writing of this chapter (January 2020), there is no regulatorily accepted method available to reliably predict the skin irritation potential of agrochemical formulations; the development and validation of such methods should be fostered.

4.3 Uncertainty

4.3.1 Reference Data and Validation Sets

When evaluating or implementing novel (non-animal) methods, it is of utmost importance to use substances with critically reviewed reference data. There have been reports of cases in which method implementation based on the so-called proficiency chemicals provided in OECD test guidelines was hampered (Kolle et al. 2019a, b). Another example is the evaluation of the defined approach for skin sensitization, which started with 128 substances for which human as well as local lymph node assay data was available. During the review of the guideline, the reference data was extensively reviewed, resulting in a reduced, curated dataset of 105 substances with local lymph node assay data and 76 substances with human reference data (OECD 2019h).

4.3.2 Borderline Range: Uncertainty Arising from Technical and Biological Variance

The borderline range depicts the variance of the individual test methods, including technical and biological variability (Leontaridou et al. 2017). It addresses the uncertainty of the three assays around their respective classification thresholds and represents a range just below and above the threshold in which obtaining a positive or a negative result is equally likely (Fig. 5).

Fig. 5 The borderline range of new and reference methods

The borderline range can be determined statistically (e.g., using pooled standard deviations) from historical intra-laboratory data (Leontaridou et al. 2017). It is especially useful for assays for which no individual statistical analysis is possible because of the low number of replicates per treatment (e.g., h-CLAT and DPRA). This evaluation supplements the evaluation given in the respective OECD test guidelines, and it also influences a method’s precision (Leontaridou et al. 2019) (Table 3).
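The pooled-standard-deviation approach mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of Leontaridou et al. (2017): the function names, the input layout (one list of replicate values per historical treatment group), and the choice of a symmetric range of `k` pooled standard deviations around the threshold are assumptions made for the example. Any data transformation applied in the original work is not reproduced here.

```python
import math


def pooled_sd(groups):
    """Pooled standard deviation over groups of replicate measurements.

    Each group is a list of replicate values for one treatment. Groups with
    fewer than two replicates carry no variance information and are skipped.
    """
    sum_of_squares = 0.0
    degrees_of_freedom = 0
    for values in groups:
        n = len(values)
        if n < 2:
            continue
        mean = sum(values) / n
        sum_of_squares += sum((v - mean) ** 2 for v in values)
        degrees_of_freedom += n - 1
    if degrees_of_freedom == 0:
        raise ValueError("need at least one group with two or more replicates")
    return math.sqrt(sum_of_squares / degrees_of_freedom)


def borderline_range(threshold, sd, k=1.0):
    """Symmetric range around a classification threshold.

    The width factor k is an assumption of this sketch, not a value taken
    from the chapter or from the OECD test guidelines.
    """
    return (threshold - k * sd, threshold + k * sd)
```

With historical replicate data collected in one laboratory, `pooled_sd` yields a single variability estimate that `borderline_range` then places around the assay's classification threshold.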

Table 3 Pooled standard deviations and borderline ranges of in vitro methods to predict skin sensitization in humans (Leontaridou et al. 2017)

Defining a borderline range allows a result to be reported as “ambiguous,” which underlines the fact that a result close to a classification cutoff is largely random.

Table 3 summarizes the borderline ranges for the in vitro skin sensitization test methods. Borderline ranges rather than discrete cutoff values should be used in prediction models (or data interpretation procedures, DIP); the possible outcomes of studies that dichotomize continuous data into classes are then in fact “positive,” “negative,” and “inconclusive.”
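A prediction model that uses a borderline range instead of a single cutoff can be sketched as follows. The function name, the `half_width` parameter, and the `positive_above` flag (for assays in which a readout above or below the threshold counts as positive) are assumptions of this sketch; the numbers in the usage lines are hypothetical and are not the values of Table 3.

```python
def classify(value, threshold, half_width, positive_above=True):
    """Three-way call for a continuous readout.

    Readouts inside [threshold - half_width, threshold + half_width] are
    reported as "inconclusive"; outside that range the usual positive or
    negative call applies.
    """
    lower = threshold - half_width
    upper = threshold + half_width
    if lower <= value <= upper:
        return "inconclusive"
    is_above = value > upper
    return "positive" if is_above == positive_above else "negative"


# Hypothetical readouts against a hypothetical threshold of 1.5 +/- 0.2
calls = [classify(v, threshold=1.5, half_width=0.2) for v in (1.0, 1.6, 2.0)]
```

Here `calls` holds one call per readout; only the middle value falls inside the borderline range and is therefore reported as inconclusive rather than forced into one of the two classes.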