Introduction

Genotoxicity is considered one of the most important toxicological endpoints, and testing for it is a prerequisite in the hazard assessment of every chemical entity, including industrial chemicals, food additives, pharmaceuticals, cosmetics, veterinary products, etc. (Corvi et al. 2013). Current regulatory strategies to investigate the potential genotoxicity of a compound comprise, in a first tier, a battery of in vitro tests, i.e., a bacterial mutagenicity test (also referred to as the Ames test) and an in vitro micronucleus test (MNvit) (COM 2011; Kirkland et al. 2011; Corvi et al. 2013; SCCS 2016). The battery approach covers the three important mechanisms of genotoxicity, namely, mutagenicity, clastogenicity, and aneugenicity, the latter two being structural and numerical changes at the chromosome level, respectively. In most cases, a positive result in one of the in vitro tests triggers in vivo follow-up to investigate the endpoint corresponding to the positive in vitro result (i.e., mutagenicity, clastogenicity, and/or aneugenicity) (Corvi et al. 2013). In the context of the European cosmetic regulation 1223/2009, however, in vivo testing is no longer allowed (EC 2009). This poses an enormous problem, as the in vitro battery is known to produce a high number of false positives, i.e., a positive result in the in vitro but not in the in vivo follow-up genotoxicity tests (Kirkland et al. 2005; Ates et al. 2014). Consequently, many potentially safe cosmetic compounds will be banned and chemicals used in other sectors will need to undergo in vivo testing needlessly.

To save time and resources and to avoid the use of animals, several strategies are being considered to optimize the genotoxicity hazard assessment process. For instance, several changes have been made to the protocols of the existing Organisation for Economic Co-operation and Development (OECD) validated in vitro genotoxicity tests to reduce the number of false positives. Furthermore, implementing computational (i.e., in silico) tools is also considered in a first step (Teixeira do Amaral et al. 2014; Ates et al. 2016). However, to date, most of these in silico tools are built on Ames test results, as such predicting the induction of gene mutations without providing information on possible clastogenic or aneugenic effects of the compounds (Bakhtyari et al. 2013; Teixeira do Amaral et al. 2014; Ates et al. 2016).

A promising approach involves the implementation of gene expression or transcriptome analysis into an integrated testing strategy, allowing a judgment based on mechanistic data that can be used in a weight-of-evidence strategy. Several research groups have suggested such so-called gene signatures to discriminate between genotoxic and non-genotoxic chemicals. These can be based on both in vivo and in vitro exposure of the animals/cell systems to the model compounds. For instance, Lee et al. (2013) could identify genotoxic hepatocarcinogens in vivo in rat with a list of 170 differentially expressed genes. Similarly, Suenaga et al. (2013) were able to discriminate genotoxic hepatocarcinogens from non-genotoxic hepatocarcinogens and non-hepatocarcinogens in rat liver based on 16 or 10 genes, depending on the compound exposure duration, 4 or 48 h, respectively, but their training set comprised only 2 genotoxic compounds. Also using 2 genotoxic compounds, Watanabe et al. (2009) identified 51 candidate genes from mouse liver to discriminate between genotoxic and non-genotoxic compounds. More recently, a similar scenario was used by Li et al. (2015), who developed a classifier based on 14 genotoxic and 14 non-genotoxic compounds using the human TK6 cell line. With a 65-gene list, they were able to reach an accuracy of 100% (based on 3 test compounds). Although promising, these approaches have a number of disadvantages. Some of the classifiers are built via in vivo experiments and thus involve animals. These scenarios are also based on a limited set of test compounds that fails to cover the different mechanisms of genotoxicity. As far as the in vitro classifiers are concerned, the cell types used are often not relevant for the human situation (Mathijs et al. 2010; Rieswijk et al. 2016) and/or not entirely metabolically competent, so that they require an additional external metabolic system (Boehme et al. 2011; Buick et al. 2015; Li et al. 2015), which would also imply a second test protocol. In addition, transcriptomics has not yet found its way into routine testing despite its recommendation by different expert groups (Corvi et al. 2013; Zeiger et al. 2015; SCCS 2016). The reluctance might be related to the fact that not every lab is equipped with a microarray/RNA sequencing platform and that data analysis and interpretation remain challenging.

Therefore, in this paper, a microarray-derived gene list based on the metabolically competent human HepaRG™ cell line has been translated, to the best of our knowledge for the first time, into a qPCR array suitable for routine genotoxicity testing. To this end, an existing classifier previously developed by Doktorova et al. (2013) was enriched with data from additional genotoxic and non-genotoxic compounds. This original classifier, intended to predict genotoxic carcinogenicity, was based on data from a limited number of genotoxic carcinogens with comparable modes of action, i.e., induction of bulky adducts and alkylation of DNA. It was later extended (Doktorova et al. 2014b), but was still intended to discriminate between genotoxic carcinogens, non-genotoxic carcinogens, and non-carcinogens. To specifically address genotoxicity, we excluded some of the carcinogenic gene expression data from the Doktorova et al. (2014b) results to avoid bias towards bulky adduct formation and DNA alkylation, and generated a more balanced and genotoxicity-specific classifier. For the generation of our new classifier, genotoxic reference compounds displaying various modes of action were selected, including a cross linker, a radical generator causing DNA strand breaks, a DNA repair activator, a tubulin polymerization inhibitor, and a base analogue. In a first phase, the classification capacity of the genotoxicity classifier was tested with a set of 4 genotoxic and 4 non-genotoxic compounds. In a second phase, the new classifier was used to develop a qPCR array. The classification capacity of the developed qPCR array was tested using the same 4 genotoxic and 4 non-genotoxic test compounds and an additional set of 4 test compounds: 2 with a clear genotoxic or non-genotoxic profile and 2 compounds with debatable genotoxicity outcomes.

As for every newly developed assay, more compounds need to be tested to gain more confidence. This study, however, describes the development of a novel assay and a proof-of-concept of how this assay might perform. The authors do not claim validation, as this is beyond the scope of the manuscript. The general goal is to offer a possible solution to the debate, which has existed for years in the field of genotoxicity, on how to overcome false positive results. This could be done by introducing mechanistic qPCR data in a weight-of-evidence approach. As such, we hope to spark more interest among genetic toxicologists in applying this technology or considering a similar approach, which might eventually lead to increased confidence in, and practical application of, gene expression profiling in genotoxicity as more compounds get tested.

Materials and methods

Selection of reference and test compounds

Reference and test compounds were selected based on their genotoxic or non-genotoxic profile as found in multiple peer-reviewed reference works or expert opinions, such as the publicly available opinions of the Scientific Committee on Consumer Safety (SCCS). The SCCS is charged with the safety assessment of certain cosmetic ingredients, namely those intended for the Annexes of the European Cosmetic Regulation 1223/2009. For these cosmetic ingredients (e.g., colorants, preservatives, and UV filters), some concern exists for human health. The SCCS’ published opinions contain valuable toxicological information and are accessible via: https://ec.europa.eu/health/scientific_committees/consumer_safety/opinions_en.

To be designated a genotoxin, the compound must have been proven positive in at least one test of the classical in vitro genotoxicity test battery [e.g., Ames test, mammalian gene mutation test, in vitro chromosome aberration test (CAvit), and in vitro micronucleus test] and in at least 1 in vivo genotoxicity test (e.g., mammalian erythrocyte micronucleus test and mammalian bone marrow chromosome aberration test). To test whether the HepaRG cell line can detect genotoxic metabolites, we added several pro-genotoxins, i.e., compounds that require metabolization to exert their genotoxic properties.

For the non-genotoxic group, the applicability domain of the compounds was broadened to represent a larger chemical space. A non-genotoxin is defined as a compound that tests negative in the classical in vitro genotoxicity test battery. In addition, compounds with a false positive in vitro profile were added to the classifier, as advised by Kirkland et al. (2016). Finally, to validate the performance and classification capacity of the qPCR array, 5 in vivo genotoxins and 5 in vivo non-genotoxins were selected. To this end, the same selection criteria as for the classifier were applied. In addition, 2 compounds for which only equivocal genotoxicity data are available so far were subjected to the qPCR array. As such, test compounds can be subdivided into three groups:

“Clear” in vivo genotoxins: compounds with indisputable evidence regarding their in vivo genotoxic profile in mammals: chloramphenicol (CHF), 2,4-diaminotoluene (DAT), ethyl methanesulphonate (EMS), 1-ethyl-1-nitrosourea (ENU), and etoposide (ETO).

“Clear” in vivo non-genotoxins: compounds with a clear in vivo negative outcome in mammals: anthranilic acid (ANT), basic orange (BOR), climbazole (CLI), 4-chlororesorcinol (CLR), and melatonin (MELA).

“Doubtful” compounds with debatable genotoxicity results: p-chloroaniline (pCA) and m-aminophenol (MAP). Kirkland et al. (2016) grouped pCA as genotoxic. An assessment by the World Health Organization (WHO) concluded that, even though pCA is possibly genotoxic, results are sometimes conflicting; no conclusion on the in vivo genotoxicity of pCA was drawn, in spite of a positive in vivo micronucleus (MNviv) test result. By choosing pCA as a test compound, mechanism-based predictions can support the positivity or negativity of the compound. MAP shows negative results in the MNviv, but is positive in the Ames test, the CAvit, and the MNvit (Boehncke et al. 2003; SCCS 2006). In addition, this compound is an isomer of a direct metabolite of pCA, namely p-aminophenol (Boehncke et al. 2003).

Tables 1 and 2 provide an overview of the reference compounds that were used to generate the original classifier of Doktorova et al. (2013), the reference compounds added to generate the new, enriched classifier, and the test compounds used to test the new classifier (test compounds phase 1) and the qPCR array (test compounds phases 1 and 2). Pro-genotoxins are further specified in Table 1.

Table 1 List of tested in vivo genotoxic compounds
Table 2 List of tested in vivo non-genotoxic compounds

Cell exposure and cytotoxicity assessment

Cell exposure and cytotoxicity assessment were conducted as previously described (Doktorova et al. 2013, 2014a, 2014b). Briefly, cryopreserved differentiated HepaRG™ cells were purchased from Biopredic International and cultivated according to the manufacturer’s protocol. To determine the low cytotoxic concentration IC10 (the concentration reducing cell viability by 10%), the cells were exposed to the selected compounds for 72 h after 7 days of cultivation, with repeated exposure every 24 h as described (Doktorova et al. 2013), and the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) test was carried out (n = 3). The dose-response curves were evaluated and the respective IC10 values calculated using Masterplex® software. IC10 concentrations are given in Tables 1 and 2.

Transcriptomics sample preparation

To generate the gene expression signatures, the HepaRG™ cells were exposed to the selected compounds for a period of 72 h, with repeated exposure every 24 h, at IC10 concentrations, as described (Doktorova et al. 2013). Samples for RNA isolation were prepared by removing the cell culture medium and collecting the cells in lysis RLT buffer supplemented with β-mercaptoethanol (QIAshredder Kit; Qiagen; Product number: 79654). Total RNA extraction (RNeasy Mini Kit; Qiagen; Product number: 74106), including a DNase digestion step, was done according to the manufacturer’s instructions. Quality control [Agilent 2100 Bioanalyzer (RNA integrity number (RIN) > 7)] and microarray chip hybridization using Affymetrix U133 Plus 2.0 GeneChip were performed according to the standard procedures. All microarray chips were further subjected to quality control including hybridization and overall signal quality assessment, presence of artifacts and DNA degradation. The quality of the microarray raw data was also checked using Expression Console (Affymetrix®). Data were analyzed further only when they passed quality control and no visual artifacts or signs of DNA degradation were detected. All experiments were performed in triplicate.

Microarray data analysis and gene list generation

The raw microarray data were re-annotated via Ensembl ID, followed by normalization using the Robust Multiarray Average method. This resulted in 20,111 probe sets for further analysis. The probe sets were subjected to 3 different statistical approaches to obtain reliable gene ranking lists. The following statistical approaches were applied: (1) limma analysis; (2) leave-one-out Welch t test; and (3) fold change and false discovery rate (FDR) pre-selection. The analyses were performed using R, version 3.1.2.

  1. As an input for the limma analysis, the treatment versus control log2 ratios per compound were used. The compounds were separated into two groups, genotoxic and non-genotoxic, and the genes were ranked in order of evidence for differential expression. The limma analysis was performed using the available library in R, version 3.1.2 (Phipson et al. 2016). The FDR, to correct for multiple comparisons, was set at 0.001.

  2. The leave-one-out Welch t test, which aimed to eliminate compound-specific effects, was performed between the genotoxic and non-genotoxic group by repeatedly leaving out one compound. The p values were adjusted for multiple testing and the FDR was set at 0.05. The overlap of the resulting 24 gene lists was taken for the final gene selection.

  3. An additional pre-selection of genes was done based on fold changes of treatment versus control deregulation in combination with FDR (i.e., twofold up- or down-regulation in comparison to the respective control and FDR < 0.01). Compounds belonging to the same toxic class (either genotoxic or non-genotoxic) were compared within the group, considering the direction of deregulation (up- or down-regulation). Only genes present in the gene lists of at least 3 compounds within the same toxic class were kept.
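The fold-change pre-selection (approach 3) and the overlap of several gene lists can be illustrated with a minimal Python sketch. The function names `preselect` and `consensus`, the input layout (per compound, a mapping from gene to a log2 fold change and an FDR value), and the toy gene names are assumptions made for illustration only; the original analyses were run in R, and this sketch is not the authors' code.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical layout: gene -> (log2 fold change vs. control, FDR)
GeneStats = Dict[str, Tuple[float, float]]


def preselect(compounds: Dict[str, GeneStats],
              min_abs_log2fc: float = 1.0,   # twofold change = |log2 FC| >= 1
              max_fdr: float = 0.01,
              min_compounds: int = 3) -> Set[str]:
    """Keep genes deregulated in the same direction in >= min_compounds
    compounds of one toxic class."""
    up: Counter = Counter()
    down: Counter = Counter()
    for stats in compounds.values():
        for gene, (log2fc, fdr) in stats.items():
            if fdr >= max_fdr:
                continue  # not significant for this compound
            if log2fc >= min_abs_log2fc:
                up[gene] += 1
            elif log2fc <= -min_abs_log2fc:
                down[gene] += 1
    return {gene
            for counts in (up, down)
            for gene, n in counts.items()
            if n >= min_compounds}


def consensus(*gene_lists: Iterable[str]) -> Set[str]:
    """Genes present in every supplied gene list."""
    sets = [set(genes) for genes in gene_lists]
    return set.intersection(*sets) if sets else set()


# Toy data for one toxic class (values are invented for illustration).
toy_class = {
    "cmpd1": {"GENE_A": (1.5, 0.001), "GENE_B": (1.2, 0.002), "GENE_C": (-1.4, 0.001)},
    "cmpd2": {"GENE_A": (2.0, 0.005), "GENE_B": (1.1, 0.004), "GENE_C": (-1.1, 0.05)},
    "cmpd3": {"GENE_A": (1.1, 0.009), "GENE_B": (-1.3, 0.001), "GENE_C": (-2.0, 0.002)},
}
selected = preselect(toy_class)
print(selected)  # {'GENE_A'}
```

In this toy example, GENE_B is dropped because its direction of deregulation is not consistent across three compounds, and GENE_C is dropped because only two compounds pass the FDR cut-off. The same `consensus` helper could be applied to the gene lists produced by the three statistical approaches.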

Finally, the results of the 3 methods were compared, and only genes selected by each of the 3 statistical approaches were considered reliable and used in the next analysis. To identify the top genotoxin-specific genes that can discriminate between the genotoxic and non-genotoxic compounds, the overlapping gene list was subjected to “prediction analysis for microarrays” (PAM). This tool uses the nearest shrunken centroid method, which takes the average expression of each gene in each class divided by the within-class standard deviation for that gene. The main advantage is a reduction of the “noise” genes (Tibshirani et al. 2002). The ultimate goal was to develop a singleplex qPCR array in a 96-well plate design. To this end, the 84 genes that resulted in the best compound prediction were selected for further analyses.
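For readers unfamiliar with the nearest shrunken centroid method, its core step can be written out as follows. The notation follows the general description by Tibshirani et al. (2002) and is not taken from this study; in particular, the shrinkage threshold Δ used here is not reported above, and m_k is described only generically as a class-size-dependent scaling factor.

```latex
% Standardized difference between the class-k centroid and the overall
% centroid for gene i (s_i: pooled within-class standard deviation,
% s_0: small positive constant, m_k: class-size-dependent scaling factor)
d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_{i}}{m_k \,(s_i + s_0)}

% Soft thresholding by an amount \Delta
d'_{ik} = \operatorname{sign}(d_{ik}) \,\bigl(|d_{ik}| - \Delta\bigr)_{+}

% Shrunken class centroid
\bar{x}'_{ik} = \bar{x}_{i} + m_k \,(s_i + s_0)\, d'_{ik}
```

A gene for which d'_{ik} is zero in every class no longer contributes to the classification, which is how the method removes “noise” genes.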

Classification analysis and pathway analysis

To check the classification accuracy of the generated list of 84 genes, the support vector machine (SVM) algorithm was applied. The selected test compounds can be found in Tables 1 and 2 (test compounds phase 1). The classifiers were trained with the leave-one-out method. For further classification of the test compounds, the tuned model (e.g., kernel = “linear”, gamma = 0.00001, cost = 10) was used. Furthermore, Pearson/Ward hierarchical cluster analysis (HCA) and principal component analysis (PCA) were performed based on the log2 fold changes. For the pathway analysis, the Panther classification system was applied (Mi et al. 2017).

qPCR array development: primer design and optimization

In the first step of the qPCR array development, hydrolysis (TaqMan) primers, with 6-carboxyfluorescein (FAM) as fluorophore and tetramethylrhodamine (TAMRA) as quencher, were designed and optimized. The design was carried out using the PrimerQuest® tool (Integrated DNA Technologies). Each qPCR array consists of 5 housekeeping genes (HKG), a no template control (NTC) with H2O as input sample, a no amplification control (NAC) with RNA as input sample, a positive control, a negative control, and the 84 test genes. Initial focus was the selection of appropriate HKG. Ten well-known and described HKG for the HepaRG cell line (Ceelen et al. 2011) were selected and tested. Of these the top 5 were further used for the qPCR array. The selection was done using the GeNorm software (Vandesompele et al. 2002). All primers were checked for secondary formations (e.g., hairpins) and similarity of the sequences to the gene of interest. Furthermore, the efficiency of each primer pair was tested by individual qPCRs. Only primer pairs resulting in efficiencies between 90 and 110% were selected for further analysis. In addition, primer pairs resulting in Cq values below 28 and R2 ≥ 0.98 were preferred. For both the HKG and the genes of interest, cDNA samples of HepaRG cells exposed to compounds that were also used to build the classifier (data not shown) have been used (n = 3).
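The acceptance criteria for primer pairs can be made concrete with a small Python sketch. It assumes that efficiency is derived from a standard curve (Cq plotted against log10 of the template amount) using the common relation E = 10^(−1/slope) − 1; the text above does not state how efficiency was determined, so this relation, the function names, and the dilution series are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def standard_curve(log10_input: Sequence[float],
                   cq: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of Cq against log10(template amount).

    Returns (slope, efficiency in %, R^2).
    """
    n = len(cq)
    mean_x = sum(log10_input) / n
    mean_y = sum(cq) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_input)
    syy = sum((y - mean_y) ** 2 for y in cq)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_input, cq))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    # Assumed relation: a slope of about -3.32 corresponds to 100% efficiency.
    efficiency_pct = (10 ** (-1.0 / slope) - 1.0) * 100.0
    return slope, efficiency_pct, r_squared


def is_accepted(efficiency_pct: float) -> bool:
    """Efficiency window stated in the text: 90 to 110%."""
    return 90.0 <= efficiency_pct <= 110.0


def is_preferred(efficiency_pct: float, r_squared: float, cq_value: float) -> bool:
    """Accepted pairs with Cq below 28 and R^2 >= 0.98 were preferred."""
    return is_accepted(efficiency_pct) and cq_value < 28.0 and r_squared >= 0.98


# Hypothetical tenfold dilution series with perfect doubling per cycle.
dilutions = [0.0, 1.0, 2.0, 3.0, 4.0]
cq_values = [30.0 - math.log2(10) * x for x in dilutions]
slope, efficiency, r2 = standard_curve(dilutions, cq_values)
print(round(slope, 2), round(efficiency, 1), round(r2, 3))
```

Which Cq value is compared with the threshold of 28 (for example, that of the undiluted sample) is not specified above, so `cq_value` is left as a caller-supplied argument.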

qPCR

RNA concentrations and quality of the RNA samples were determined using Nanodrop 2000C (Thermo Scientific). Per sample, 10 µg (total volume of 200 µl) cDNA was synthesized using the iScript cDNA Synthesis Kit (BioRad). On the qPCR plate, 2 µl (0.05 µg/µl) purified cDNA (GenElute PCR Clean-Up Kit, Sigma) was used in a total reaction mix of 20 µl per well (master mix: TaqMan® Gene Expression Master Mix, Applied Biosystems). The qPCR plates were run according to the following protocol: 0.20 min at 95 °C; 0.01 min at 95 °C; 0.20 min at 60 °C (40 cycles). SVM classification analysis was performed as described above on the log2 fold changes, calculated using the 2^(−ΔΔCq) method. Normalization of the mRNA expression was done against the geometric mean of the mRNA expression levels of the 5 reference genes.
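As an illustration of the fold-change calculation, the following Python sketch computes a log2 fold change with the ΔΔCq approach. It assumes amplification efficiencies close to 100%, in which case normalizing against the geometric mean of the reference gene expression levels is equivalent to subtracting the arithmetic mean of the reference Cq values. Function names and Cq values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_cq(cq_target: float, cq_references: Sequence[float]) -> float:
    """Cq of the target gene minus the mean Cq of the reference genes.

    The arithmetic mean of Cq values corresponds to the geometric mean
    of expression levels (assuming ~100% amplification efficiency).
    """
    return cq_target - mean(cq_references)


def log2_fold_change(cq_target_treated: float,
                     cq_refs_treated: Sequence[float],
                     cq_target_control: float,
                     cq_refs_control: Sequence[float]) -> float:
    """log2 of 2^(-ddCq), i.e. simply -ddCq."""
    ddcq = (delta_cq(cq_target_treated, cq_refs_treated)
            - delta_cq(cq_target_control, cq_refs_control))
    return -ddcq


# Hypothetical Cq values for one target gene and 5 reference genes.
refs = [20.0, 21.0, 19.0, 20.0, 20.0]
lfc = log2_fold_change(24.0, refs, 26.0, refs)
print(lfc, 2 ** lfc)  # 2.0 4.0 (fourfold up-regulation)
```

Working directly with −ΔΔCq avoids exponentiating and then taking the logarithm again when log2 fold changes are the input for the classifier.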

Testing of the qPCR array

The optimized hydrolysis primers were spotted in the wells of a 96-well qPCR plate (done by Integrated DNA Technologies). The performance of the resulting qPCR array was tested as described above with a total of 12 test compounds (test compounds of phases 1 and 2, Tables 1 and 2). All experiments were performed in triplicate. Materials and methods for cell exposure, cytotoxicity assessment, and RNA extraction are identical to those in the above-mentioned sections.

Results

Generating a gene classifier to discriminate between genotoxins and non-genotoxins

To generate a gene classifier that is able to discriminate between genotoxic and non-genotoxic compounds, metabolically competent human HepaRG cells were exposed to sub-cytotoxic concentrations (IC10, as measured by the MTT test) of 12 genotoxic and 12 non-genotoxic reference compounds. Microarrays were performed (n = 3 per reference compound) and the data were then subjected to three different statistical methods (i.e., limma analysis, leave-one-out Welch t test, and fold change and FDR pre-selection). The combination of these three strategies for gene pre-selection resulted in a first gene list of 322 overlapping genes that are differentially expressed between cells exposed to genotoxic and non-genotoxic compounds, as identified by all 3 statistical methods. To evaluate the overall performance of this set of genes in distinguishing between genotoxic and non-genotoxic compounds, HCA and PCA were performed. Both approaches showed that most compounds were grouped in the correct class (Fig. 1a, c). However, as the final goal of the project is to create a singleplex qPCR array (i.e., 96-well format), the 322-gene list was subjected to PAM gene ranking to reduce the total number of genes. This analysis resulted in the selection of the top 84 genes that show the most robust rates of correct classification of the reference compounds into the genotoxic or non-genotoxic group. The HCA and PCA of the reduced gene list can be found in Fig. 1b, d. The grouping of the genotoxic and non-genotoxic compounds with the 84 genes showed patterns similar to those observed with the larger set of 322 genes, which implies that the precision of correct identification of genotoxic compounds is maintained in the minimized set of 84 genes.

Fig. 1

PCA and HCA of the 322 (a, c) and the 84 (b, d) gene lists, respectively. In the PCA plots, dots represent genotoxic (GTX) compounds and triangles represent non-genotoxic (NGTX) compounds. AFB aflatoxin B1, AMP ampicillin trihydrate, BAP benzo[a]pyrene, BLE bleomycin, CAD cadmium chloride, CAP caprolactam, CIS cisplatin, CND clonidine, CPM cyclophosphamide, DMN dimethylnitrosamine, DMO d-mannitol, HBM hydroxybenzomorpholine, MMS methyl methanesulphonate, NAP 1-naphthol, NCL sodium chloride, NF 2-nitrofluorene, NFE nifedipine, NNK 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone, SDF sodium diclofenac, SOR sorbitol, TBT tolbutamide, TRI triclosan, VIN vinblastine sulfate, ZID zidovudine

To identify the biological relevance of these 84 genes for chemical-induced genotoxicity, a pathway analysis was performed using the Panther classification system. The results showed that the “genotoxic fingerprint” consisting of 84 genes covers a broad range of pathways, molecular functions, biological processes, and protein classes (Figs. 2, 3, 4, 5). In brief, the p53 pathway is the most represented one together with the EGF signaling pathway (Fig. 2). At a molecular function level, the Panther classification system recognizes 63 genes out of the 84-gene set, which are structured in 6 different classes with “binding” and “catalytic activity” being the most expressed ones (Fig. 3). From the explored biological processes, the biggest impact is on “response to stimuli” and “cellular processes” (Fig. 4). Further analysis into the “cellular processes” group reveals that cell communication, cell proliferation, and cell cycle are the most represented groups (Fig. 5).

Fig. 2

Molecular pathways associated with the 84 genes of the genotoxicity fingerprint. Pathway analysis with the Panther classification system reveals that the genotoxin-specific fingerprint, consisting of 84 genes, represents at least 24 biological pathways. Together with its feedback loops, the p53 pathway is the most represented pathway

Fig. 3

Pie chart depicting the main molecular functions of the genes in the qPCR array genotoxicity fingerprint. Analysis of the 84 genes using the Panther classification system to connect molecular functions to genes identifies 63 genes in 6 different classes. The second pie chart shows the sub-classes of the “binding” group. Gene ontology (GO) annotations are mentioned between brackets and the number of genes within a certain group are shown

Fig. 4

Biological processes in which the 84 genes of genotoxicity fingerprint are involved. The two smaller pie charts show the subdivisions of the “response to stimulus” and “cellular process” groups. Gene ontology (GO) annotations are mentioned between brackets and the number of genes within a certain group is shown

Fig. 5

Panther classification analysis reveals 18 different protein classes for the 84 genes. Panther protein class (PC) annotations are mentioned between brackets

Performance of the gene classifier in discriminating between genotoxins and non-genotoxins

To classify a compound into a certain group, the SVM algorithm was used. This algorithm provides a probability between 0 and 1 for a compound to be genotoxic. A probability < 0.45 is considered negative and > 0.55 is considered positive. Probabilities between 0.45 and 0.55 are marked as equivocal.

To test the strength of the classification with the selected 84 genes, 8 test compounds, namely 4 in vivo positive genotoxins (ETO, EMS, CHF, DAT) and 4 in vivo non-genotoxins (CLR, BOR, MAP, CLI), were selected and subjected to microarray analysis (Tables 1 and 2). The results of the SVM classification analysis on the microarray data, using the 84-gene list, showed that 3 of the in vivo non-genotoxins were predicted as clearly negative and 3 of the in vivo genotoxins were classified as clearly positive. In each group, one compound (CLI and CHF, respectively) gave an equivocal result and was, therefore, not assigned to either group (Table 3). The classification results of the 8 compounds imply a sensitivity (correctly predicted positives) of 100%, a specificity (correctly predicted negatives) of 75%, and, thus, an accuracy (overall correctly predicted) of 87.5%, when counting the equivocal results as positive.
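The decision thresholds and the reported performance figures can be reproduced with a short Python sketch. The function names are hypothetical; the outcome list encodes only the counts given in the text (3 positive and 1 equivocal call among the 4 genotoxins, 3 negative and 1 equivocal call among the 4 non-genotoxins), not the individual probabilities of Table 3.

```python
from typing import Iterable, Tuple


def classify(probability: float, lower: float = 0.45, upper: float = 0.55) -> str:
    """Map an SVM probability of being genotoxic to a call."""
    if probability < lower:
        return "negative"
    if probability > upper:
        return "positive"
    return "equivocal"


def performance(outcomes: Iterable[Tuple[bool, str]],
                equivocal_as: str = "positive") -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy).

    Each outcome is (is_in_vivo_genotoxin, call); equivocal calls are
    counted as `equivocal_as`. Both classes must be present.
    """
    tp = fn = tn = fp = 0
    for is_genotoxic, call in outcomes:
        if call == "equivocal":
            call = equivocal_as
        predicted_positive = call == "positive"
        if is_genotoxic and predicted_positive:
            tp += 1
        elif is_genotoxic:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts from the microarray-based classification described above.
microarray_outcomes = (
    [(True, "positive")] * 3 + [(True, "equivocal")]
    + [(False, "negative")] * 3 + [(False, "equivocal")]
)
print(performance(microarray_outcomes))  # (1.0, 0.75, 0.875)
```

Passing `equivocal_as="negative"` shows how strongly the figures for such a small test set depend on the treatment of equivocal calls.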

Table 3 Probabilistic classification of non-genotoxic and genotoxic test compounds, based on the 84-gene list after microarray experiments

Development of the qPCR array and classification results

The 84 genes identified in the microarray experiments were translated into a qPCR format. Hydrolysis probes were designed with FAM and TAMRA as fluorophore and quencher, respectively. The optimized primers, including those for 5 HKGs, were spotted in a 96-well plate. The layout of the developed qPCR plate is depicted in Fig. 6. As qPCR positive control and negative control, the highly deregulated gene (as seen in our microarray analyses) that encodes for carboxyl ester lipase (Cel) is used as marker. cDNA of cells exposed to BLE (positive, genotoxic control) and MAN (negative, non-genotoxic control) was added to the control wells. It should be noted that cDNA from any genotoxic and non-genotoxic reference compound, used to build the classifier, can serve as positive control and negative control, respectively.

Fig. 6

qPCR plate design. (HKG, housekeeping genes; NTC, no template control; NAC, No Amplification control; PPC qPCR Positive control; PNC, qPCR Negative control)

To test the performance of the qPCR array, the same experimental conditions were maintained as for the microarray experiments. Besides the 8 test compounds that were used to test the classification abilities of the 84-gene list after the microarray experiments, an additional set of 4 test compounds (1 genotoxic, 1 “doubtful” genotoxic, 1 non-genotoxic, 1 “doubtful” non-genotoxic) was included. In total, 12 test compounds were included to test the performance of the qPCR array (5 “clear” genotoxic, 1 “doubtful” genotoxic, 5 “clear” non-genotoxic, 1 “doubtful” non-genotoxic).

The results of the SVM classification are presented in Table 4: 4 (EMS, ETO, DAT, CHF) out of the 5 in vivo genotoxins tested positive, whereas ENU came out equivocal. This implies a sensitivity of 100% when considering the equivocal result as positive. All 5 non-genotoxic compounds were negative in the qPCR array, which implies a specificity of 100%. Our newly developed assay identified the controversial compound pCA as negative, whereas MAP showed equivocal results (Table 4).

Table 4 Probabilistic classification of known non-genotoxic (in vivo result negative) and genotoxic (in vivo result positive) test compounds, using SVM classification analysis

Discussion

Considerable effort has been made by the scientific community to improve in vitro genotoxicity hazard assessment. A promising approach involves the implementation of gene expression or transcriptome analysis into an integrated testing strategy, allowing a judgment based on mechanistic information. Several research groups have suggested gene signatures to discriminate between genotoxic and non-genotoxic chemicals. These were based on both in vivo and in vitro exposure of the animals/cell systems to the model compounds (Watanabe et al. 2009; Mathijs et al. 2010; Boehme et al. 2011; Suenaga et al. 2013; Lee et al. 2013; Li et al. 2015; Williams et al. 2015; Rieswijk et al. 2016). Whereas the in vivo-based classifiers are built with a limited set of compounds and still require animal experiments, most of the in vitro classifiers were built with cell types not relevant to humans (Mathijs et al. 2010; Rieswijk et al. 2016) or with metabolically incompetent (human) cells (Boehme et al. 2011; Buick et al. 2015; Li et al. 2015). In addition, the use of microarray data has not found its way into common safety assessment strategies. This may be attributed to the equipment needed, on the one hand, and the complicated data analysis and interpretation, on the other.

We claim to overcome the majority of the disadvantages that accompany microarray experiments for the purpose of safety assessment. To achieve this, we have translated a microarray-derived gene list, consisting of 84 genes, into a qPCR array. Not only have we generated an easy-to-perform assay that can be run in every molecular lab with basic qPCR equipment, but the design also allows simultaneous analysis of 84 genes covering various pathways and biological processes in a single run. Indeed, these 84 genes were selected using three different statistical tools to assure optimal genotoxicity predictions. The choice of reducing our initial classifier from 322 genes to 84 genes was to avoid overfitting the model to the set of reference compounds, as this can lead to reduced accuracy in future predictions (Tinker et al. 2006).

The predictions are made by exposing cells to sub-cytotoxic concentrations (IC10) of the test compound, assuring genotoxic rather than cytotoxic responses. This is another important improvement, as many of the existing in vitro tools require high cytotoxicity levels, above 50% (Kirkland 2011; Ates et al. 2014). Biologically, the genes represent different pathways involved in the DNA damage response, cell communication, several metabolic processes, apoptosis, and cell death in general. The fact that all these different pathways are represented and that the genes encode a wide variety of proteins such as transcription factors, structural proteins, chaperones, transporters, etc., supports the notion that it is necessary to test more than one or a few genes when assessing the possible genotoxicity of compounds. Furthermore, our assay is based on the metabolically competent human-derived HepaRG cells, allowing human-relevant genotoxicity assessment of the parent compound and its metabolites with a single protocol. Indeed, all pro-genotoxins that were included as reference or test compound (Table 1) were correctly classified as being genotoxic. This will save time and resources and requires smaller amounts of the test compound. The latter is a relevant advantage, as in the early stage of compound development the amount of compound available may be a limiting factor. In addition, it has been shown that the use of rat liver extract (S9), usually required to test for possible (geno)toxic metabolites, can produce false positive results (Kirkland et al. 2007). It should, however, be noted that some compounds can exert genotoxic effects after intestinal metabolization, which is not covered in our test system (or in most of the other existing in vitro genotoxicity tests). We would, therefore, like to stress that we do not propose this assay as a standalone test, but as an important part of an integrated testing strategy. Case-by-case reasoning and decision-making are needed, and extensive compound profiling should always be performed when (geno)toxicity is being assessed.

By opting for hydrolysis (TaqMan) probes, a more specific detection of the target genes is assured in comparison to SYBR green-based detection; the latter could again lead to false positive results. Hydrolysis probes also have a better sensitivity (a low number of copies can be detected) than their SYBR green counterparts, and they are known for their better reproducibility (Zhou et al. 2017). These are all important features that should allow detection of minor changes at the gene level. They also create an easy opportunity for upscaling the qPCR array to 384-well systems, where 4 runs (4 × 96 wells) can be combined on 1 plate and even less compound will be needed. This was an additional consideration when reducing our gene classifier from 322 to 84 genes.
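As an illustration of the upscaling mentioned above, the following minimal Python sketch shows one way in which four 96-well runs could be combined on a single 384-well plate. The interleaved (quadrant) layout used here is an assumption for illustration only and is not a layout described in this study; the function name `to_384` and the run numbering are likewise hypothetical.

```python
from string import ascii_uppercase

ROWS_96, COLS_96 = 8, 12      # standard 96-well plate: rows A-H, columns 1-12
ROWS_384, COLS_384 = 16, 24   # standard 384-well plate: rows A-P, columns 1-24


def to_384(run: int, row: int, col: int) -> str:
    """Map a 0-based (row, col) position of 96-well run 0..3 to a 384-well name.

    Assumed interleaved layout: each 96-well position expands into a 2 x 2
    block on the 384-well plate, and the run index selects the well in that block.
    """
    if not (0 <= run < 4 and 0 <= row < ROWS_96 and 0 <= col < COLS_96):
        raise ValueError("run, row or col out of range")
    row_384 = 2 * row + run // 2   # runs 2 and 3 occupy the odd rows
    col_384 = 2 * col + run % 2    # runs 1 and 3 occupy the odd columns
    return f"{ascii_uppercase[row_384]}{col_384 + 1}"


# Full layout: every (run, row, col) of the four 96-well runs gets one 384-well position.
layout = {
    (run, row, col): to_384(run, row, col)
    for run in range(4)
    for row in range(ROWS_96)
    for col in range(COLS_96)
}
```

Under this assumed layout, the four runs fill the 384-well plate exactly, with no well shared between runs.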

Our genotoxicity-specific qPCR array showed 100% specificity and sensitivity on a modest set of 10 test compounds (5 in vivo genotoxins and 5 in vivo non-genotoxins). Additionally, we were able to provide some mechanistic information on compounds with unresolved or questionable genotoxicity profiles, such as pCA and MAP. pCA gave a clear negative result in our test, although it was classified as an in vivo genotoxin by Kirkland et al. (2016) and, therefore, recommended as a model test compound for the development and validation of in vitro genotoxicity tests. It should be noted, however, that the genotoxicity of this compound is under debate (Khoury et al. 2013). In fact, a WHO report summarizing the toxicity of pCA concludes that the available data do not allow conclusions to be drawn with respect to the genotoxicity of pCA. The positive results for this compound were only found at highly cytotoxic concentrations (Boehncke et al. 2003). However, these high levels might have been necessary to assure exposure of the bone marrow in the MNviv, and consequently, it cannot be ruled out that pCA is, indeed, an in vivo genotoxin. The conflicting results are also reflected in the fact that several well-performing in vitro genotoxicity screening tools could not detect pCA as genotoxic (Cahill et al. 2004; Mizota et al. 2011; Westerink et al. 2011; Hughes et al. 2012; Garcia-Canton et al. 2013; Hendriks et al. 2016). Thus, the question remains whether this compound is suitable as a test compound for the validation of new in vitro genotoxicity assays. MAP [positive in the Ames test, MNvit, and in vitro chromosome aberration test (CAvit)], on the other hand, was identified as negative by the microarray data, but as equivocal in the qPCR array. Even though microarray and qPCR are complementary methods, qPCR is often the method of choice to confirm or refute microarray data (Provenzano and Mocellin 2007). Therefore, the results of this study would support the evidence that pCA is non-genotoxic in vitro at sub-cytotoxic concentrations. MAP, on the other hand, would indeed seem to be an in vitro genotoxic compound even at sub-cytotoxic concentrations.
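For clarity, the sensitivity and specificity figures reported above for the 10 test compounds can be reproduced with the short Python sketch below. It assumes only what is stated in the text, namely that all 5 in vivo genotoxins and all 5 in vivo non-genotoxins were classified correctly; the function name and the boolean encoding of the classifications are illustrative and are not part of the original analysis.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for paired boolean classifications.

    truth[i] is True if compound i is an in vivo genotoxin;
    predicted[i] is True if the assay classified compound i as genotoxic.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # genotoxins called genotoxic
    fn = sum(1 for t, p in pairs if t and not p)      # genotoxins missed
    tn = sum(1 for t, p in pairs if not t and not p)  # non-genotoxins called negative
    fp = sum(1 for t, p in pairs if not t and p)      # non-genotoxins called genotoxic
    return tp / (tp + fn), tn / (tn + fp)


# Test set as described in the text: 5 in vivo genotoxins, 5 in vivo non-genotoxins,
# all assumed to be classified correctly by the qPCR array.
truth = [True] * 5 + [False] * 5
predicted = list(truth)

sensitivity, specificity = sensitivity_specificity(truth, predicted)
print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}")
```

With only 5 compounds per class, a single misclassification would lower the corresponding figure from 100% to 80%, which is why validation with an extended compound list is needed.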

Even though the developed qPCR array needs further validation with an extended list of compounds, the promising results indicate that it can become a valuable addition to in vitro genotoxicity testing strategies. In addition, it should be noted that in the present study, no trace compounds, impurities, or other complex chemicals such as food contact materials or nanoparticles have yet been tested. Using a qPCR array in these particular fields might open new perspectives, as several of the existing, validated in vitro genotoxicity assays fail for these specific types of chemicals or do not allow high-throughput testing. Additional investigations and, more importantly, practical applications are needed to further define the applicability domain of the qPCR array. In this study, we have shown that it is possible to incorporate gene expression profiling into more routinely applied qPCR testing. With the animal testing ban in the European cosmetic legislation, new doors might open for compounds that show positive results in the regulatory in vitro battery, which is known to suffer from a high number of false positive results. In addition, the qPCR array can be equally valuable in other sectors where compound development is hampered by excessive, expensive, and time-consuming in vivo testing, or where development of promising compounds is needlessly stopped due to false positive genotoxicity test results.