Introduction

Modern drug discovery is a multidisciplinary enterprise consisting of disease-based target identification and validation, and high-throughput screening of chemical and natural product libraries (Kennedy et al. 2008). This is followed by the careful optimization of selected lead compounds, in vitro and in vivo pharmacokinetics, toxicology and bioavailability testing, and finally, preclinical and clinical studies (Lee and Dordick 2006; Hughes et al. 2011). These new drug candidates can be eliminated at different stages of drug development for many reasons, resulting in only 1 of 5000 lead candidates that pass the discovery process ever reaching the market. The total capitalized development cost per drug now approaches $2.6 billion, of which a large portion is attributable to drug candidate failures (DiMasi and Grabowski 2012).

A wide range of emerging in vitro technologies have begun to impact the assessment of chemical and drug candidate toxicity in high-throughput with an aim to reduce the need for animal models (Shukla et al. 2010). While many of these technologies are still nascent, a roadmap toward ultimate validation and industry adoption is becoming clearer. One area where such a roadmap is critically needed is in the confluence of chemical toxicity and human metabolism. Recent advances in genomics and proteomics coupled with sequencing of the human genome have dramatically increased the number of drug targets and their lead compounds (Schadt et al. 2009). Combinatorial and diversity-oriented synthesis programs along with increased access to natural products and their structural scaffolds have provided vast numbers of compounds to screen for identifying lead candidates. Conventional models to elucidate drug toxicity and human metabolism in vitro include isolated liver slices (Westra et al. 2016), primary hepatocytes (Hewitt et al. 2007), transformed hepatoma cell lines (Watanabe et al. 2003), immortalized liver cells expressing P450s (Gustafsson et al. 2014), as well as human liver microsomes (HLMs) and isolated recombinant cytochrome P450 (CYP450) isoforms (Brandon et al. 2003; Hariparsad et al. 2006). Hepatocytes and liver slices most closely resemble the in vivo system, and cryopreserved primary hepatocytes have been extensively used for in vitro drug testing, and are considered as the gold standard for drug screening (Soldatow et al. 2013). Primary hepatocytes provide a complete set of drug-metabolizing enzymes (DMEs) and pathways, and therefore, offer an appropriate system to test for toxicity, metabolite production, and drug stability and partitioning (Bale et al. 2016). Nevertheless, primary hepatocytes are expensive and difficult to obtain in large quantity for high-throughput toxicity screening (Soldatow et al. 2013). 
More problematically, these cells rapidly lose liver-specific functions when maintained under standard in vitro cell culture conditions and often variably express CYP450s and other metabolizing enzymes over time (Gómez-Lechón et al. 2004). In addition, primary hepatocytes are not easy to pool because of varying expression levels of DMEs, which often results in significant experimental variability. Many modifications to conventional culture methods have been developed to foster retention of hepatocyte function. However, the biotransformation functions of the liver remain difficult to mimic at in vivo levels (Sivaraman et al. 2005; Hewitt et al. 2007; Huch et al. 2015).

These problems represent a major gap in the development of high-throughput in vitro techniques that achieve concordance between in vitro assays and in vivo responses, and consequently, there is a significant opportunity for new technologies to fill this gap. Compounding the difficulty of mimicking human metabolism and toxicity in vitro, animal tests often predict human responses poorly in drug testing and disease research. For example, animal models failed to reproduce the human liver toxicity of troglitazone (Rezulin™), which was withdrawn from the market because of CYP450-mediated hepatotoxicity in humans (Reddy et al. 2005; Masubuchi et al. 2006). Mibefradil was also withdrawn due to hepatotoxicity, cardiovascular toxicity, and drug interactions via CYP450 isoforms (Bui et al. 2008). Moreover, drug candidates that fail in clinical trials due to toxicity concerns, by definition, were not flagged by animal models as being potentially toxic. Thus, there remains a gap in our ability to identify toxic drug candidates before clinical testing.

To address these needs, we have developed the Data Analysis Toxicology Assay Chip (DataChip) and the Metabolizing Enzyme Toxicology Assay Chip (MetaChip) technologies that link metabolism and cell-based screening (Lee et al. 2005, 2008). However, the earlier version of the DataChip/MetaChip (i.e., the DataChip 1.0 and MetaChip 1.0) used chemically functionalized microscope glass slides, which required direct contact between cell spots and metabolizing enzyme spots through a liquid layer on sandwiched glass slides to transfer compounds and their metabolites to target cells. It had several technical limitations such as the difficulty of accurately aligning cell/enzyme spots on the glass slides and limited incubation times (typically 6 h). To overcome these technical issues, a plastic micropillar/microwell chip platform (Lee et al. 2013, 2014a, b) was developed and used in the DataChip/MetaChip platform. By inserting a micropillar chip (DataChip) into a microwell chip (MetaChip), the new DataChip/MetaChip platform 2.0 eliminates the spot alignment issue and provides sufficient growth media with compounds in the microwell for cell culture (typically 1–3 days). In addition, the composition of DME solutions on the MetaChip was changed from individual DMEs to mixtures of DMEs to better mimic DME conditions in the liver. In this study, we report in vitro toxicity data obtained for a set of 22 compounds and their in situ-generated metabolites using the high-throughput DataChip/MetaChip platform 2.0 and correlate the in vitro IC50 values from the chip with rat LD50 values as well as human C max values to better predict drug-induced liver injury (DILI) in humans.

Materials and methods

Preparation of the micropillar and microwell chip

A micropillar chip made of poly (styrene-co-maleic anhydride) (PS-MA) contains 532 micropillars (0.75 mm pillar diameter and 1.5 mm pillar-to-pillar distance). In addition, a microwell chip made of co-polymer of polystyrene and polybutadiene has a complementary set of 532 microwells (1.2 mm microwell diameter and 1.5 mm well-to-well distance). PS-MA provides a reactive functionality to covalently attach poly-l-lysine (PLL) and, ultimately, alginate spots by ionic interactions. Plastic molding was performed by the SODIC PLUSTECH injection molder in Samsung Electro-Mechanics Company (SEMCO, Suwon, South Korea).

Human liver cell culture and preparation of cell suspension for spotting

Hep3B human hepatoma cell line [American Type Culture Collection (ATCC), Manassas, VA, USA] at passage numbers between 15 and 23 was grown in RPMI 1640 (Mediatech, Manassas, VA, USA) supplemented with 10% fetal bovine serum (FBS, Sigma-Aldrich, St. Louis, MO, USA) and 1% Penicillin–Streptomycin (P/S, ThermoFisher Scientific, Waltham, MA, USA) in T-75 cell culture flasks in a humidified 5% CO2 incubator (ThermoFisher Scientific) at 37 °C. Suspensions of Hep3B cells were prepared by trypsinizing a confluent layer of the cells from the culture flask with 0.6 mL of 0.05% trypsin-0.53 mM EDTA (ThermoFisher Scientific), and re-suspending the cells in 7 mL of 10% FBS-supplemented RPMI. After centrifugation at 300×g for 4 min, the supernatant was removed and the cell pellets were re-suspended with 10% FBS-supplemented RPMI to a final concentration of 6 × 10^6 cells/mL.

2D cell viability assessment

For toxicity testing on Hep3B monolayers in 96-well plates (i.e., 2D culture), 5.0 × 10^3 cells were seeded with 200 µL RPMI media in each well and incubated in the CO2 incubator. Following overnight pre-incubation, the cells were treated with test compounds at varying concentrations for 72 h. After incubation, the cells were incubated with 50 µL of 2.5 mg/mL MTT solution in PBS for 3 h at 37 °C. Purple-colored MTT-formazan crystals generated in metabolically active cells were dissolved by completely removing the MTT solution and adding 150 µL of DMSO. After shaking for 30 min at 150 rpm, absorbance was measured at 590 nm using a microtiter plate reader (Synergy H1, BioTek instruments, VT, USA).

Preparation of a miniaturized 3D cell culture array (DataChip) on a micropillar chip

To attach cell spots on the micropillar chip, a mixture of poly-l-lysine (PLL, Sigma-Aldrich) and BaCl2 (Sigma-Aldrich) was prepared by mixing equal volumes of 0.01% (w/v) PLL and 100 mM BaCl2. The DataChip was prepared by spotting 60 nL/micropillar of the PLL/BaCl2 mixture onto each of the 532 micropillars using a microarray spotter (S + MicroArrayer, Samsung ElectroMechanics, Co. (SEMCO)) and allowing the spots to dry for 24 h. This was followed by printing 60 nL/micropillar of Hep3B cells suspended in 0.75% (w/v) alginate on top of the dried PLL/BaCl2 spots. While printing Hep3B cells, the micropillar chip was placed on a chilling deck at 4 °C to retard evaporation of water in the spots. The suspension of Hep3B cells in low-viscosity alginate (Sigma-Aldrich) was prepared by mixing 500 µL of the Hep3B cell suspension in 10% FBS-supplemented RPMI (6 × 10^6 cells/mL), 250 µL of 3% alginate in distilled water, and 250 µL of RPMI so that the final concentrations of the cells and alginate were 3 × 10^6 cells/mL and 0.75%, respectively. After nearly instantaneous gelation, each Hep3B cell spot was immersed in 800 nL of RPMI growth medium in the complementary microwell by sandwiching the micropillar chip with the cells (DataChip) and the microwell chip containing growth media together (“stamping”). The stamped chips were placed in a gas-permeable incubation chamber for 30 min to remove excess BaCl2 and then separated, and the DataChip was re-stamped onto a microwell chip containing fresh growth media. Finally, the stamped chips were incubated in the CO2 incubator at 37 °C for 18 h prior to toxicity assessment.
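The mixing arithmetic in this protocol can be checked with a short calculation. The Python sketch below is an illustration only: the function and variable names are hypothetical, and it assumes ideal mixing with additive volumes.

```python
def mix_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after diluting a stock into a total volume (C1*V1 = C2*V2)."""
    return stock_conc * stock_volume_ul / total_volume_ul

# Volumes from the protocol (µL): cell suspension, 3% alginate, RPMI
cell_ul, alginate_ul, rpmi_ul = 500.0, 250.0, 250.0
total_ul = cell_ul + alginate_ul + rpmi_ul

final_cells_per_ml = mix_concentration(6e6, cell_ul, total_ul)      # cells/mL
final_alginate_pct = mix_concentration(3.0, alginate_ul, total_ul)  # % (w/v)

# Expected number of cells in one printed spot (60 nL = 60e-6 mL)
spot_volume_ml = 60e-6
cells_per_spot = final_cells_per_ml * spot_volume_ml

print(final_cells_per_ml, final_alginate_pct, cells_per_spot)
```

Under these assumptions the calculation reproduces the stated 3 × 10^6 cells/mL and 0.75% alginate, and gives about 180 cells per 60 nL spot.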

Preparation of a miniaturized enzyme array (MetaChip) on a microwell chip

The MetaChip, a complementary array of encapsulated metabolizing enzymes designed to emulate the metabolic reactions in the human liver, was prepared on a microwell chip made of a co-polymer of polystyrene and polybutadiene. Fresh metabolizing enzyme solutions in Matrigel were prepared in a 96-well plate on ice (Table 1), and 120 nL of metabolizing enzyme mixtures in Matrigel were printed on the microwell chip laid on a chilling deck at 4 °C. The MetaChip was transversely divided into four regions (I–IV in Fig. 1c). Specifically, regions I–IV contained no enzyme as a test-compound-only control, a mixture of human CYP450 isoforms (P450 Mix), a mixture of P450 Mix and human Phase II metabolizing enzymes (All Mix), and human liver microsomes (HLM), respectively. Immediately after enzyme printing, the MetaChip was placed in a Petri dish (4 MetaChips per 150 mm-diameter Petri dish) and stored in a −80 °C freezer until use.

Table 1 Composition of enzyme mixtures for preparing the MetaChip
Fig. 1

Schematics and photographs of the micropillar/microwell chip with cells and enzymes printed. a The structure of the micropillar and microwell chip. The inset shows the scheme of each chip, including cells and drug-metabolizing enzymes (DMEs) printed. b Experimental procedures to prepare the DataChip/MetaChip for metabolism-induced toxicity assays. After cell printing on the micropillar chip, the DataChip was sandwiched with the microwell chip containing growth media for 3D cell culture. For the MetaChip, DME mixtures and compounds were printed into the microwell chip sequentially. This was followed by the DataChip sandwiched with the MetaChip and incubated for cytotoxicity assays. c The layout of DMEs and compounds printed in the microwell chip to prepare the MetaChip and test metabolism-induced toxicity. Regions I–IV contained no enzyme as a test compound only control, a mixture of human cytochrome P450 isoforms (P450 Mix), a mixture of P450 Mix and human Phase II metabolic enzymes (All Mix), and human liver microsomes (HLM), respectively. Regions 1–6 contained six different compounds in triplicate microwells. From left to right, the concentration of each compound was increased (6 concentrations per compound in each region). d Microscopic picture of Hep3B cell growth on the micropillar chip after 3 days. e The growth of Hep3B cells encapsulated in alginate spots on the DataChip over time

Stamping the DataChip onto the MetaChip with test compounds

Metabolism-induced toxicity assays were performed by printing compounds into the MetaChip and then stamping the DataChip onto the MetaChip. The compounds selected were acetaminophen (as a positive control), benzbromarone, fenoterol, flutamide, diclofenac, labetalol, imipramine, phentolamine, risperidone, oxybendazole, sulindac, propranolol, promazine, trazodone, buspirone, carbidopa, bosentan, chlorpropamide, phenazopyridine, estradiol, mefenamic acid, and fluoxetine, all of which were water soluble at the highest dosages to avoid issues with precipitation over time and adsorption on the chip surfaces. Briefly, compound stock solutions were prepared by dissolving compounds in DMSO. Typically, stock concentrations higher than 100 mM were required to maintain final DMSO content less than 0.5%. Approximately 40 µL of test compound solutions were prepared at 200-fold higher concentrations than the desired final concentrations (5 dosages plus 1 control) by serially diluting compound stock solutions in DMSO in a 384-well plate. As a control, 100% DMSO without compound was used. After that, 300 µL of diluted test compound solutions were prepared in a round-bottom 96-well plate by mixing 1.5 µL of the diluted compounds in DMSO with 298.5 µL of RPMI (typically 0–1000 µM final concentrations). Frozen MetaChips were removed from the freezer and immediately placed on the cold slide deck at 4 °C, and then 720 nL of the test compound solutions in RPMI were printed into each well of the MetaChip using the microarray spotter. Six different compounds were printed in regions 1–6 of the MetaChip, each region containing a 12 × 6 mini-array. The stamped chips were placed, with the DataChip on top, in the gas-permeable chamber with 20 mL of sterile distilled water to prevent water evaporation during incubation, and then incubated for 24 h in the CO2 incubator at 37 °C for cytotoxicity assays.
After 24 h incubation with compounds, the MetaChip was discarded, the DataChip was stamped onto the pre-warmed microwell chip with 800 nL/well of fresh RPMI, and then the stamped chips were incubated in the gas-permeable chamber in the CO2 incubator at 37 °C for an additional 48 h.
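The relationship between the 200-fold dilution, the DMSO content, and the required stock concentration can be worked through numerically. The following Python sketch uses hypothetical helper names and covers only the 96-well plate dilution step; it does not account for any further dilution once the solution is printed into wells that already contain enzyme spots.

```python
def dmso_fraction(dmso_volume_ul, medium_volume_ul):
    """Volume fraction of DMSO after mixing a DMSO solution into medium."""
    return dmso_volume_ul / (dmso_volume_ul + medium_volume_ul)

def required_stock_mM(final_uM, fold_dilution):
    """DMSO stock concentration (mM) needed to reach final_uM after dilution."""
    return final_uM * fold_dilution / 1000.0

# Volumes from the protocol: 1.5 µL compound in DMSO + 298.5 µL RPMI
dmso_ul, rpmi_ul = 1.5, 298.5

fold = (dmso_ul + rpmi_ul) / dmso_ul                  # dilution factor into RPMI
dmso_pct = 100.0 * dmso_fraction(dmso_ul, rpmi_ul)    # % (v/v) DMSO in the plate
stock_for_top_dose = required_stock_mM(1000.0, fold)  # stock for a 1000 µM top dose

print(fold, dmso_pct, stock_for_top_dose)
```

With these volumes the dilution factor is 200 and the DMSO content in the plate is about 0.5% (v/v); a 1000 µM top dose then implies a 200 mM stock, in line with the statement that stocks above 100 mM are typically needed.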

Cell staining, chip scanning, and data analysis

At the end of the 48 h culture period post-MetaChip stamping, the DataChip was washed twice by immersing the micropillars with cell spots in a deep-well staining plate containing 5 mL of 140 mM NaCl with 20 mM CaCl2 for 5 min each. CaCl2 was supplemented to prevent degradation of alginate spots by excess phosphate. A staining dye solution was prepared by adding 1.0 µL of calcein AM (4 mM stock from ThermoFisher Scientific) and 4.0 µL of ethidium homodimer-1 (2 mM stock from ThermoFisher Scientific) to 8 mL of 140 mM NaCl supplemented with 20 mM CaCl2. To stain the cell spots, 2 mL of the dye solution was dispensed on a shallow-well staining plate, and the DataChip was then placed on top of the shallow-well staining plate and incubated in the dark for 45 min at room temperature. The DataChip was then washed twice by immersing the micropillars with cell spots in the deep-well staining plate containing 5 mL of 140 mM NaCl with 20 mM CaCl2 for 15 min each to remove excess dye in the alginate spots. After drying the DataChip in the dark for at least 2 h, each cell spot exposed to compounds was located by imaging the entire DataChip using a blue laser (488 nm) and a standard blue filter for green dye (PMT gain: 180 and power: 10) and a blue laser and a 645AF75/594 filter for red dye (PMT gain: 200 and power: 10) in a GenePix® Professional 4200A scanner (MDS Analytical Technologies). Due to the scanning height difference of the micropillar chip from standard glass slides, the DataChip was scanned at focus position 120. Data files were saved as single images for analysis. The green fluorescence intensity was quantified from the scanned images using the S + Chip Analysis (SEMCO) program by extracting fluorescence intensity from each cell spot, and the percentage of live cells was plotted against the concentration of the compound tested.
We used a background subtraction of dead cells (cells immersed in 70% methanol for 1 h), which was negligible compared to the total fluorescence. The percentage of live cells was calculated using the following equation:

$${\text{\% Live cells}}=\frac{{{F_{{\text{Reaction}}}}}}{{{F_{{\text{Max}}}}}} \times 100$$

where F_Reaction is the green fluorescence intensity of the reaction spot and F_Max is the green fluorescence intensity of untreated viable cells. To produce a conventional sigmoidal dose–response curve, with response values normalized to span the range from 0 to 100% plotted against the logarithm of test concentration, the green fluorescence intensities of all cell spots were normalized to the fluorescence intensity of the 100% live cell spots (i.e., cell spots contacted with no compound) and the test compound concentrations were converted to their respective logarithms. The sigmoidal dose–response curves and IC50 values (the concentration of the compound at which 50% of cell growth is inhibited) were obtained using the following equation:

$$Y={\text{Bottom}}+\left( {\frac{{{\text{Top}} - {\text{Bottom}}}}{{1+{{10}^{(\log {\text{I}}{{\text{C}}_{50}} - X) \times H}}}}} \right)$$

where IC50 is the midpoint of the curve, H is the Hill slope, X is the logarithm of test concentration, and Y is the response (% live cells), starting at Bottom and going to Top with a sigmoid shape.
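The two equations above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis: the source does not say which software performed the curve fitting, so the coarse grid search is a stand-in, Bottom, Top, and H are held fixed, a negative Hill slope is assumed for an inhibition curve, and the concentrations are hypothetical.

```python
import math

def percent_live(f_reaction, f_max):
    """% live cells = F_Reaction / F_Max x 100."""
    return f_reaction / f_max * 100.0

def four_param_logistic(x, bottom, top, log_ic50, hill):
    """Y = Bottom + (Top - Bottom) / (1 + 10^((logIC50 - X) * H))."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_ic50 - x) * hill))

def fit_log_ic50(log_concs, responses, bottom=0.0, top=100.0, hill=-1.0, step=0.01):
    """Grid search for the log IC50 that minimizes the squared error.

    Stand-in for a real nonlinear fit; Bottom, Top, and H are fixed here.
    """
    lo, hi = min(log_concs), max(log_concs)
    n_steps = int(round((hi - lo) / step))
    best_log_ic50, best_sse = lo, float("inf")
    for i in range(n_steps + 1):
        candidate = lo + i * step
        sse = sum(
            (four_param_logistic(x, bottom, top, candidate, hill) - y) ** 2
            for x, y in zip(log_concs, responses)
        )
        if sse < best_sse:
            best_log_ic50, best_sse = candidate, sse
    return best_log_ic50

# Hypothetical six-point dose series (µM) and noiseless responses
# generated from the model with IC50 = 100 µM (log IC50 = 2).
concs_uM = [3.0, 10.0, 30.0, 100.0, 300.0, 1000.0]
log_concs = [math.log10(c) for c in concs_uM]
responses = [four_param_logistic(x, 0.0, 100.0, 2.0, -1.0) for x in log_concs]

fitted_log_ic50 = fit_log_ic50(log_concs, responses)
print(10.0 ** fitted_log_ic50)  # estimated IC50 in µM, close to 100
```

At X = log IC50 the model returns the midpoint between Bottom and Top, which is why IC50 is described as the midpoint of the curve. In practice a nonlinear least-squares routine that also fits Bottom, Top, and H would replace the grid search.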

Toxicity prediction with sensitivity and specificity analysis

To assess the predictivity of metabolism-induced compound toxicity, sensitivity and specificity were calculated using IC50 values from the DataChip/MetaChip platform, human C max values, and rat LD50 values determined by oral administration. Briefly, test compounds that exhibited an IC50 value less than or equal to an arbitrary IC50 cutoff at a given cell/enzyme condition were categorized as toxic. Similarly, test compounds that exhibited a human C max value or a rat LD50 value less than or equal to an arbitrary C max or LD50 cutoff were categorized as toxic. Based on the results of IC50, C max, and LD50 evaluation, the test compounds were classified into four categories: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). For example, when arbitrary cutoffs of LD50 of 300 mg/kg and IC50 of 250 µM are used, TP, FP, TN, and FN are determined as follows:

  • True positive (TP): LD50 ≤ 300 mg/kg (toxic) and IC50 ≤ 250 µM (toxic)

  • False positive (FP): LD50 > 300 mg/kg (nontoxic) and IC50 ≤ 250 µM (toxic)

  • True negative (TN): LD50 > 300 mg/kg (nontoxic) and IC50 > 250 µM (nontoxic)

  • False negative (FN): LD50 ≤ 300 mg/kg (toxic) and IC50 > 250 µM (nontoxic)

The predictive performance of the DataChip/MetaChip technology for the test compounds was assessed by calculating sensitivity and specificity as follows:

  • Sensitivity (%) = [Number of in vitro toxic test compounds (TP)]/[Number of in vivo toxic test compounds (TP + FN)] × 100

  • Specificity (%) = [Number of in vitro nontoxic test compounds (TN)]/[Number of in vivo nontoxic test compounds (TN + FP)] × 100

  • Overall predictivity (%) = [sensitivity + specificity]/2
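The classification rules and the three metrics above can be expressed compactly in code. The Python sketch below is a minimal illustration: the function names are hypothetical, and the (LD50, IC50) pairs are made-up values chosen only to exercise each category, not data from this study.

```python
def classify(in_vivo_value, in_vitro_ic50, in_vivo_cutoff, ic50_cutoff):
    """Assign TP, FP, TN, or FN; a value at or below its cutoff counts as toxic."""
    vivo_toxic = in_vivo_value <= in_vivo_cutoff
    vitro_toxic = in_vitro_ic50 <= ic50_cutoff
    if vivo_toxic and vitro_toxic:
        return "TP"
    if not vivo_toxic and vitro_toxic:
        return "FP"
    if not vivo_toxic and not vitro_toxic:
        return "TN"
    return "FN"

def predictivity(pairs, in_vivo_cutoff, ic50_cutoff):
    """Return category counts plus sensitivity, specificity, and overall predictivity (%)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for in_vivo_value, ic50 in pairs:
        counts[classify(in_vivo_value, ic50, in_vivo_cutoff, ic50_cutoff)] += 1
    toxic = counts["TP"] + counts["FN"]
    nontoxic = counts["TN"] + counts["FP"]
    sensitivity = 100.0 * counts["TP"] / toxic if toxic else float("nan")
    specificity = 100.0 * counts["TN"] / nontoxic if nontoxic else float("nan")
    overall = (sensitivity + specificity) / 2.0
    return counts, sensitivity, specificity, overall

# Hypothetical (rat LD50 in mg/kg, IC50 in µM) pairs, for illustration only
example_pairs = [(100, 50), (250, 400), (200, 100), (500, 120), (2000, 800), (1500, 600)]

counts, sensitivity, specificity, overall = predictivity(example_pairs, 300, 250)
print(counts, sensitivity, specificity, overall)
```

The same functions apply when the in vivo value is a human C max rather than an LD50, since both are compared against their cutoff in the same direction.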

The arbitrary LD50 cutoffs were determined based on OECD categories for testing in vivo compound toxicity (OECD 2002). Since identifying optimum cutoffs for obtaining high predictivity is of paramount importance, we either varied both in vivo and in vitro cutoffs simultaneously (variable cutoffs) or varied in vitro cutoffs at a fixed in vivo cutoff (fixed cutoffs). For both variable and fixed cutoffs, the acceptance limit for both sensitivity and specificity was set at greater than 50%.
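A cutoff search of this kind can be sketched as a sweep over candidate cutoff pairs. The Python example below is an assumption-laden illustration: the function names, candidate cutoffs, and data are hypothetical, and how the in vivo and in vitro cutoffs were paired in the variable-cutoff case is not specified in the text, so explicit pairs are used here.

```python
def sens_spec(data, vivo_cutoff, vitro_cutoff):
    """Sensitivity and specificity (%) for (in vivo value, IC50) pairs."""
    tp = fp = tn = fn = 0
    for vivo, vitro in data:
        vivo_toxic = vivo <= vivo_cutoff
        vitro_toxic = vitro <= vitro_cutoff
        if vivo_toxic and vitro_toxic:
            tp += 1
        elif vitro_toxic:
            fp += 1
        elif vivo_toxic:
            fn += 1
        else:
            tn += 1
    sens = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    spec = 100.0 * tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def best_cutoffs(data, cutoff_pairs, acceptance=50.0):
    """Best (overall, vivo_cutoff, vitro_cutoff, sens, spec) passing the acceptance limit."""
    best = None
    for vivo_cutoff, vitro_cutoff in cutoff_pairs:
        sens, spec = sens_spec(data, vivo_cutoff, vitro_cutoff)
        if sens > acceptance and spec > acceptance:
            overall = (sens + spec) / 2.0
            if best is None or overall > best[0]:
                best = (overall, vivo_cutoff, vitro_cutoff, sens, spec)
    return best

# Hypothetical (rat LD50 in mg/kg, IC50 in µM) pairs, for illustration only
data = [(100, 50), (250, 400), (200, 100), (500, 120), (2000, 800), (1500, 600)]

# Fixed cutoffs: in vivo cutoff held at 300 mg/kg, in vitro cutoff varied
fixed_pairs = [(300, c) for c in (50, 100, 250, 450, 1000)]
best_fixed = best_cutoffs(data, fixed_pairs)

# Variable cutoffs: both cutoffs change together (pairing is an assumption)
variable_pairs = [(200, 100), (300, 250), (500, 450)]
best_variable = best_cutoffs(data, variable_pairs)

print(best_fixed, best_variable)
```

Cutoff pairs for which either sensitivity or specificity is 50% or lower are rejected, mirroring the acceptance limit; ties in overall predictivity are resolved in favor of the first pair tried.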

Results

Miniaturized 3D cell culture on the DataChip

The highly versatile DataChip/MetaChip platform is based on micropillar/microwell structures made by plastic injection molding, which is a robust and flexible system for mammalian cell culture, enzymatic reactions, and compound screening (Fig. 1a, b). For 3D cell culture, small Hep3B cell spots were printed on the micropillar chip and strongly attached through robust surface chemistry, as in our previous studies (Lee et al. 2008). The maleic anhydride group in PS-MA was used to covalently attach PLL through its amine groups, which allowed the negatively charged alginate to attach to the positively charged PLL by ionic interactions. BaCl2 was used for the gelation of the alginate matrix on the micropillar chip. After the DataChip was stamped onto the microwell chip with RPMI and incubated for 3 days, a unique 3D morphology of Hep3B cells in 60 nL alginate spots was observed on the micropillar chip. The Hep3B cells in the spots were stained with calcein AM and ethidium homodimer to assess live and dead cells and determine cell viability. Based on the green fluorescence intensity from stained cell spots on the DataChip, the population of Hep3B cells on each micropillar was uniform, with a 14% coefficient of variation. To determine 3D cell growth quantitatively on the chip, changes in the green fluorescence of Hep3B cells at 3 million cells/mL seeding density (i.e., 180 Hep3B cells per 60 nL spot) were monitored over time. As evidenced by an increase in green fluorescence over time, Hep3B cells in alginate spots grew linearly, forming unique 3D spheroids (Fig. 1d, e). The doubling time of Hep3B cells on the chip calculated from the measured green fluorescence intensities was approximately 60 h. 3D-cultured Hep3B cells on the chip at the high seeding density grew approximately half as fast as their 2D counterparts (32 h doubling time), presumably due to the nature of 3D cell culture and the limited space available for cell growth within small alginate spots.
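The comparison of doubling times can be illustrated with a short calculation. The text does not state how the doubling time was derived from fluorescence, so the Python sketch below assumes simple exponential growth between two time points; the function names and fluorescence values are hypothetical.

```python
import math

def doubling_time_h(f_start, f_end, elapsed_h):
    """Doubling time from two fluorescence readings, assuming exponential growth."""
    return elapsed_h * math.log(2.0) / math.log(f_end / f_start)

def relative_growth_rate(doubling_time_a_h, doubling_time_b_h):
    """Growth rate of culture A relative to B (rate is inversely proportional to doubling time)."""
    return doubling_time_b_h / doubling_time_a_h

# Hypothetical readings: signal at 0 h and at 72 h for a culture doubling every 60 h
f0 = 1000.0
f72 = f0 * 2.0 ** (72.0 / 60.0)
td_3d = doubling_time_h(f0, f72, 72.0)

# 3D (about 60 h) relative to 2D (32 h), using the doubling times reported in the text
ratio = relative_growth_rate(60.0, 32.0)
print(td_3d, ratio)
```

A ratio of roughly 0.53 is consistent with the statement that cells on the chip grew approximately half as fast as in 2D culture.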

Metabolism-induced toxicity assessment in combination with the DataChip and the MetaChip

To mimic human metabolism in high-throughput screening, the DataChip containing 3D-cultured hepatic cells was coupled with the MetaChip containing DMEs and model compounds to generate their metabolites in situ on the chip and assess metabolism-induced toxicity of the compounds in Hep3B cell spheroids. The Hep3B cells within 12 × 6 mini-arrays were exposed to six different dosages of a compound and four different DME conditions, including no DME control, P450 Mix, All Mix, and HLM (Table 1). Thus, a single DataChip combined with a single MetaChip had the capability to generate 24 dose–response curves for 6 compounds and their metabolites from DMEs (Fig. 1c). To study metabolism-induced toxicity with model compounds, IC50 values were determined for a parent test compound and its enzyme-generated metabolites against Hep3B cells by staining the DataChip with a Live/Dead® cell viability kit. As demonstrated in Fig. 2a, cell death occurred when Hep3B cells were exposed to high concentrations of acetaminophen in the presence of P450 Mix. The dotted circles represent the boundary of the micropillars onto which Hep3B cell spots were printed. Green dots indicate live cells, whereas dark-red dots represent dead cells. The activity of metabolizing enzymes on the frozen MetaChip was stable for at least 6 months.
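The counts in this layout follow from enumerating the experimental conditions. The Python sketch below is bookkeeping only: the compound labels are placeholders, and it says nothing about the physical position of each well on the chip.

```python
from itertools import product

compounds = [f"compound_{i}" for i in range(1, 7)]         # placeholder labels
dme_conditions = ["No enzyme", "P450 Mix", "All Mix", "HLM"]
dose_levels = range(6)                                      # 5 dosages plus 1 control
replicates = range(3)                                       # triplicate wells

# One entry per microwell used in the assay
wells = list(product(compounds, dme_conditions, dose_levels, replicates))

# One dose-response curve per (compound, DME condition) combination
curves = {(compound, dme) for compound, dme, _, _ in wells}

wells_per_compound = len(wells) // len(compounds)
print(len(wells), len(curves), wells_per_compound)
```

The enumeration gives 72 wells per compound, matching a 12 × 6 mini-array, 24 dose–response curves per chip pair, and 432 wells in total, which fits within the 532 wells available on one chip.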

Fig. 2

The scanned images of Hep3B cells stained after compound treatment. a Images of Hep3B cells on the DataChip after exposure to the MetaChip containing P450 Mix and different concentrations of troglitazone (3–250 µM). Live cells are stained in green and dead cells are stained in red. b The scanned image of the DataChip with Hep3B cells after exposure to the MetaChip containing metabolic variance (no enzyme, P450 Mix, All Mix, and HLM) and compound variance (6 compounds at 6 different concentrations per compound). Representative dose response curves shown were obtained from carbidopa (Compound 5) and acetaminophen (Compound 6). Hep3B cell spots in the 24 distinct regions (each region containing a 3 × 6 mini-array, triplicates with 6 varied concentrations) were exposed to various combinations of compounds and DMEs. (Color figure online)

The fundamental question we wanted to address in this study was whether or not we could predict in vivo adverse drug responses on the chip platform. Acetaminophen (an analgesic and antipyretic drug), which is known to be hepatotoxic through CYP450 catalysis and is a major cause of liver failure, was selected as a key model compound and included on each chip to monitor chip-to-chip and day-to-day variability (Supplementary Table 1). As expected, acetaminophen demonstrated metabolism-induced toxicity on the chip, as evidenced by the toxic response of Hep3B cells when they were exposed to P450 Mix and All Mix. This result indicates that the DataChip/MetaChip platform could predict hepatotoxicity caused by active metabolites of acetaminophen (most likely N-acetyl-p-benzoquinone imine) (Andersson et al. 2011) produced by human liver CYP450 isoforms. The degree of cytotoxicity in the All Mix was lower than that in the P450 Mix, presumably due to the Phase II DMEs included in All Mix, which could reduce the toxicity of metabolites generated in situ on the chip through conjugation reactions. Similar results were obtained from Hep3B cells exposed to acetaminophen and HLM (Supplementary Table 1). The degree of cytotoxicity in the HLM was lower than that in All Mix, presumably due to the larger amounts of Phase II DMEs included in HLM. HLM purchased from BD Biosciences contained approximately 5–10 times more UDP-glucuronosyltransferase (UGT) isoforms than All Mix, but did not contain sulfotransferase (SULT), glutathione S-transferase (GST), and N-acetyltransferase (NAT) isoforms. Thus, All Mix is a better mimic of human liver.

To further validate the concept and calculate predictivity of in vivo hepatotoxicity, the 22 compounds were printed on the MetaChip in triplicate and tested under the four DME conditions. As a result, several compounds showed augmented toxicity or were detoxified by the DMEs. For example, carbidopa, which is used to manage the symptoms of Parkinson’s disease, was activated in P450 Mix and All Mix on the chip, presumably due to formation of toxic metabolites (Fig. 2b). Eighteen compounds were found to be toxic against Hep3B cells on the chip, of which nine showed statistically significant, augmented toxicity in the P450 Mix, indicating that toxic metabolites could be generated on the chip by CYP450 isoforms. Two compounds (carbidopa and oxybendazole) showed augmented toxicity in All Mix compared to their parent compounds. In addition, in situ, on-chip metabolism of flutamide, sulindac, and mefenamic acid led to statistically significantly lower toxicity in the All Mix vs. the respective parent compounds (Table 2). The cytotoxicity profiles under varying DME conditions on the chip were well correlated with representative toxic metabolites of the compounds generated.

Table 2 Summary of IC50 values obtained from 3D Hep3B cells on the DataChip/MetaChip and 2D Hep3B cell monolayers in 96-well plates

Prediction of hepatotoxicity in vivo by comparing rat LD50 and IC50 values from the chip

A common way to predict in vivo toxicity using in vitro data is to compare LD50 values with IC50 values at arbitrary cutoffs and determine the number of compounds that can be classified into TP, FP, TN, and FN. Thus, we calculated sensitivity and specificity to assess in vivo metabolism-induced hepatotoxicity by comparing rat oral LD50 values with IC50 values from the DataChip/MetaChip. We initially tested a range of cutoffs to identify an optimum cutoff that provides high sensitivity and specificity from the chip (Supplementary Fig. 1), and thus high predictivity of in vivo hepatotoxicity from the IC50 values. The acceptance level of sensitivity and specificity was set at greater than 50%.

As a result, we were able to obtain a good in vivo LD50-in vitro IC50 correlation at cutoffs of LD50 of 300 mg/kg and IC50 of 250–450 µM depending on the DME conditions tested. Among all DME conditions tested, including no enzyme control in 2D- and 3D-cultured Hep3B cells, P450 Mix in 3D, All Mix in 3D, and HLM in 3D (Fig. 3), All Mix in 3D appeared to be the best predictor of rat in vivo acute toxicity, with 60% sensitivity and 60% specificity (60% overall predictivity). This outcome is likely due to the presence of both Phase I and II DMEs in the All Mix, making it more representative of the in vivo situation. Thus, our approach could be applied to predict human acute toxic potential of drug candidates in the liver. Interestingly, the standard in vitro toxicity assessment in 2D without DMEs gave relatively poor predictivity. In addition, the no enzyme control in 3D-cultured Hep3B cells on the chip at cutoffs of LD50 of 300 mg/kg and IC50 of 450 µM produced 60% sensitivity and 53% specificity, which indicates that 3D cell culture is superior to 2D cell culture in terms of predicting in vivo hepatotoxicity (Fig. 3a, b). Overall, comparing in vivo LD50 values with in vitro IC50 values from the chip led to as high as 60% predictivity, which suggests that our DataChip/MetaChip platform could predict in vivo rat hepatotoxicity. Of course, these were mixed-species correlations involving human DMEs and a transformed cell line in comparison with rat LD50 values from the literature. To achieve greater human predictivity, human in vivo information is needed.

Fig. 3

Calculation of sensitivity and specificity using LD50 values at 300 mg/kg cutoff and IC50 values at variable cutoffs: LD50 compared with a IC50 from 2D-cultured Hep3B cells in the 96-well plate, b IC50 from 3D-cultured Hep3B cells without enzymes on the chip, c IC50 from 3D Hep3B cells with P450 Mix, d IC50 from 3D Hep3B cells with All Mix, and e IC50 from 3D Hep3B cells with HLM. To calculate sensitivity, specificity, and overall predictivity, LD50 and IC50 values were compared at different cutoffs, and then TP, FP, TN, and FN were determined. The red dashed line represents the 50% acceptance limit for sensitivity, specificity, and overall predictivity, all of which must be above the line to be acceptable. (Color figure online)

Prediction of hepatotoxicity in vivo by comparing human C max and IC50 values from the chip

We proceeded to evaluate a human pharmacokinetic endpoint to address the discrepancy between animals and humans in terms of toxicity evaluation. To this end, we used in vivo human C max values, which represent the maximum allowable concentration of a drug in serum. C max values can be considered an indirect indicator of drug toxicity, as the concentration above C max could cause harmful side effects in the body (Jang et al. 2001). Since C max values are much lower than IC50 values, in general, we compared 10-, 100-, and 1000-fold human C max values to IC50 values determined using the chip platform to calculate specificity and sensitivity. Overall, use of the 1000-fold human C max values resulted in higher predictivity compared to use of 10- and 100-fold counterparts (Supplementary Fig. 2). Interestingly, the 1000-fold human C max and LD50 comparison generated only 50% sensitivity and 65% specificity at 150 variable cutoffs (Fig. 4a). Not surprisingly, this result indicates that there is a poor correlation between rat in vivo data and human in vivo data. To better understand this outcome and identify optimum cutoffs for in vivo animal and human correlations, we calculated sensitivity and specificity in detail at fixed cutoffs. The highest predictivity (67% sensitivity and 68% specificity) was obtained at 150 µM cutoff for 1000-fold human C max and 200 mg/kg cutoff for rat LD50, which is still lower than the predictivity obtained from the chip platform (Supplementary Fig. 3). As opposed to relatively poor animal predictivity, and the poor correlation of in vitro 2D results (Fig. 4a, b), our chip data outperformed in terms of toxicity prediction under control, P450 Mix, All Mix, and HLM conditions (Fig. 4c–f). In particular, the 1000-fold human C max and All Mix IC50 comparison generated remarkable 100% sensitivity and 86% specificity at 50 variable cutoffs (93% overall predictivity). 
These results indicate that combining 3D hepatic cell culture with drug metabolism on the chip platform could provide better predictivity of hepatotoxicity in vivo than animal and in vitro 2D counterparts. The maximum predictivity achieved at optimum cutoffs by comparing LD50 and IC50 values with 10-, 100-, and 1000-fold human C max values is summarized in Fig. 5. As indicated in Fig. 5, the highest sensitivity and specificity were obtained from All Mix compared with 1000-fold human C max values. This outcome again suggests that All Mix, which contains both Phase I and II DMEs, could be a better indicator for predicting hepatotoxicity in vivo. The All Mix performed better than the P450 Mix, showing the importance of a full complement of DMEs in predicting hepatotoxicity in vivo. Indeed, the P450 Mix performed worse than the no enzyme control.
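The cutoff-based comparison described above can be illustrated with a minimal sketch. It assumes, since the text does not spell out the rule, that a compound counts as toxic by the reference when 1000-fold C max falls below its cutoff and as toxic by the assay when IC50 falls below its cutoff. It also assumes that the best cutoff pair is the one with the highest mean of sensitivity and specificity. The names `Compound`, `confusion_counts`, `sensitivity_specificity`, and `best_cutoffs` are hypothetical, and the demo values are invented for illustration, not taken from the study.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Compound:
    name: str
    cmax_uM: float  # human C max, in µM
    ic50_uM: float  # in vitro IC50, in µM


def confusion_counts(compounds, cmax_cutoff_uM, ic50_cutoff_uM, fold=1000):
    """Count TP, FP, TN, FN for one pair of cutoffs.

    Assumption: 'toxic' by the reference means fold * C max < its cutoff;
    'toxic' by the assay means IC50 < its cutoff.
    """
    tp = fp = tn = fn = 0
    for c in compounds:
        reference_toxic = fold * c.cmax_uM < cmax_cutoff_uM
        predicted_toxic = c.ic50_uM < ic50_cutoff_uM
        if predicted_toxic and reference_toxic:
            tp += 1
        elif predicted_toxic:
            fp += 1
        elif reference_toxic:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(tp, fp, tn, fn):
    """Return (TP/(TP+FN), TN/(TN+FP)); None where a denominator is zero."""
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


def best_cutoffs(compounds, cmax_cutoffs, ic50_cutoffs, fold=1000):
    """Scan all cutoff pairs; keep the one with the highest mean of sens and spec."""
    best = None
    for c_cut, i_cut in product(cmax_cutoffs, ic50_cutoffs):
        counts = confusion_counts(compounds, c_cut, i_cut, fold)
        sens, spec = sensitivity_specificity(*counts)
        if sens is None or spec is None:
            continue  # undefined at this cutoff pair
        score = (sens + spec) / 2
        if best is None or score > best[0]:
            best = (score, c_cut, i_cut, sens, spec)
    return best


# Hypothetical values, for illustration only (not data from the study).
demo = [
    Compound("A", 0.05, 20.0),
    Compound("B", 0.10, 400.0),
    Compound("C", 1.00, 900.0),
    Compound("D", 2.00, 10.0),
]
```

Calling `best_cutoffs(demo, [150], [50, 500])` scans two cutoff pairs and returns the pair with the higher score together with its sensitivity and specificity. A real analysis would use the measured C max, LD50, and IC50 values and a finer grid of cutoffs.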

Fig. 4

Calculation of sensitivity and specificity using 1000-fold human C max values and LD50 and IC50 values at variable cutoffs: 1000-fold C max compared with a rat LD50, b IC50 from 2D-cultured Hep3B cells, c IC50 from 3D-cultured Hep3B cells without enzymes, d IC50 from 3D Hep3B cells with P450 Mix, e IC50 from 3D Hep3B cells with All Mix, and f IC50 from 3D Hep3B cells with HLM. To calculate sensitivity, specificity, and overall predictivity, 1000-fold human C max values were compared with LD50 and IC50 values at different cutoffs, and then TP, FP, TN, and FN were determined

Fig. 5

Maximum predictivity achieved using 10-, 100-, and 1000-fold human C max with LD50 and IC50 values: C max compared with a rat LD50, b IC50 from 2D-cultured Hep3B cells (2D IC50 - Control), c IC50 from 3D-cultured Hep3B cells without enzymes on the chip (3D IC50 - Control), d IC50 from 3D Hep3B cells with P450 Mix (3D IC50 - P450 Mix), e IC50 from 3D Hep3B cells with All Mix (3D IC50 - All Mix), and f IC50 from 3D Hep3B cells with HLM (3D IC50 - HLM). Bars are color coded as follows: white, sensitivity; light gray, specificity; black, overall predictivity

Prediction of hepatotoxicity in vivo by comparing drug-induced liver injury (DILI) index and IC50 values from the chip

The test compounds have been previously categorized according to their ability to cause drug-induced liver injury (DILI) in humans (Xu et al. 2008). Each test compound fell into one of seven DILI categories: (a) P1 if it is associated with DILI in either animals or humans in a dose-dependent manner, (b) P2 if it is associated with idiosyncratic DILI, (c) O1 if it is hepatotoxic in animals but untested in humans, (d) O2 if it causes elevated liver enzymes in humans but is generally safe, (e) N3 if it causes sporadic cases of DILI but is generally safe, (f) N2 if it is not known to cause DILI but is known to cause other organ injury, and (g) N1 if it is not known to cause DILI (Table 3). In general, compounds in the P1, P2, and O1 categories are considered hepatotoxic, and compounds in the O2, N1, N2, and N3 categories are considered minimally hepatotoxic or non-hepatotoxic.

Table 3 Drug-induced liver injury (DILI) categories sorted by hepatotoxicity levels

To predict DILI from the in vitro chip data, 1000-fold C max values were used as a threshold to differentiate compounds causing DILI from compounds not causing DILI (non-DILI). We hypothesized that DILI would arise from the toxicity of the parent compounds or their metabolites, and that the highest predictivity would therefore be obtained from All Mix. Thus, we classified a compound as DILI-causing if its IC50 value from All Mix was < 1000-fold C max. Conversely, a compound with an IC50 value from All Mix ≥ 1000-fold C max was considered not causative of DILI. Finally, TP, FP, TN, and FN were determined by comparing DILI/non-DILI outcomes from the chip with a compound’s DILI category. For example, if a compound predicted by the chip to cause DILI is in one of the P1, P2, and O1 categories, then it is a TP compound. Similarly, if a compound predicted by the chip not to cause DILI is in one of the P1, P2, and O1 categories, then it is an FN compound. The sensitivity of the DataChip/MetaChip platform was defined as the ability of the chip platform to predict the P1, P2, and O1 compounds as hepatotoxic [i.e., TP/(TP + FN)]. The specificity of the chip platform was defined as the ability to predict O2, N1, N2, and N3 compounds as nontoxic for DILI [i.e., TN/(TN + FP)].
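The decision rule and scoring described above can be written as a short sketch. It assumes that IC50 and C max are expressed in the same units (µM here). The function names `predicts_dili` and `score_against_dili` are hypothetical, and the demo records are invented for illustration, not taken from the study.

```python
HEPATOTOXIC = {"P1", "P2", "O1"}
NON_HEPATOTOXIC = {"O2", "N1", "N2", "N3"}


def predicts_dili(ic50_all_mix_uM, cmax_uM, fold=1000):
    """Chip call: DILI-causing when All Mix IC50 < fold * C max."""
    return ic50_all_mix_uM < fold * cmax_uM


def score_against_dili(records, fold=1000):
    """Compare chip calls with DILI categories.

    records: iterable of (ic50_all_mix_uM, cmax_uM, dili_category).
    Returns TP/FP/TN/FN counts plus sensitivity and specificity
    (None where the denominator is zero).
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for ic50, cmax, category in records:
        if category not in HEPATOTOXIC | NON_HEPATOTOXIC:
            raise ValueError(f"unknown DILI category: {category!r}")
        predicted = predicts_dili(ic50, cmax, fold)
        actual = category in HEPATOTOXIC
        key = ("T" if predicted == actual else "F") + ("P" if predicted else "N")
        counts[key] += 1
    pos = counts["TP"] + counts["FN"]
    neg = counts["TN"] + counts["FP"]
    result = dict(counts)
    result["sensitivity"] = counts["TP"] / pos if pos else None
    result["specificity"] = counts["TN"] / neg if neg else None
    return result


# Hypothetical records, for illustration only (not data from the study).
demo_records = [
    (50.0, 0.1, "P1"),    # 50 < 100    -> predicted DILI, hepatotoxic     -> TP
    (500.0, 0.1, "P2"),   # 500 >= 100  -> predicted non-DILI, hepatotoxic -> FN
    (2000.0, 1.0, "N1"),  # 2000 >= 1000 -> predicted non-DILI, not toxic  -> TN
    (10.0, 0.5, "N3"),    # 10 < 500    -> predicted DILI, not toxic       -> FP
    (300.0, 1.0, "O1"),   # 300 < 1000  -> predicted DILI, hepatotoxic     -> TP
]
summary = score_against_dili(demo_records)
```

Changing `fold` to 10 or 100 reproduces the same rule with the other C max multiples compared in the text.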

Of the 16 P1, P2, and O1 compounds tested, the DataChip/MetaChip platform correctly predicted the DILI potential of ten compounds in the All Mix system. The sensitivity of the chip platform under the All Mix system was therefore 63%. Of the six N1 and N3 compounds tested, the DataChip/MetaChip correctly predicted five as non-hepatotoxic in All Mix. Thus, the specificity of the chip platform under the All Mix system was 83% (Table 2). Our overall predictivity from the DILI index was 73%. This outcome was further compared with the overall predictivity from 10-fold and 100-fold C max values compared with All Mix IC50 values, as well as from 1000-fold C max values compared with 2D counterpart IC50 values (Fig. 6). Interestingly, predictivity calculated with DILI categories and 1000-fold human C max values in All Mix was much higher than that of the other counterparts, including 10-fold and 100-fold human C max in All Mix, as well as 1000-fold human C max with 2D Hep3B cells.
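As a check on the arithmetic, the reported percentages can be reproduced from the counts above. This section does not restate the formula for overall predictivity. The reported 73% matches the mean of sensitivity and specificity rather than the pooled fraction of correct calls, but this is an inference from the numbers, not a definition taken from the text.

```latex
\begin{align*}
\text{sensitivity} &= \frac{TP}{TP + FN} = \frac{10}{16} = 62.5\% \approx 63\% \\
\text{specificity} &= \frac{TN}{TN + FP} = \frac{5}{6} \approx 83.3\% \approx 83\% \\
\tfrac{1}{2}\,(\text{sensitivity} + \text{specificity}) &\approx \tfrac{1}{2}\,(62.5\% + 83.3\%) \approx 72.9\% \approx 73\% \\
\frac{TP + TN}{TP + FP + TN + FN} &= \frac{10 + 5}{22} \approx 68.2\%
\end{align*}
```

The same reading fits the C max comparison reported earlier, where 100% sensitivity and 86% specificity correspond to 93% overall predictivity.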

Fig. 6

Predictivity calculated by comparing IC50 values from 3D Hep3B cells-All Mix with human C max values and then determining TP, FP, TN, and FN by further comparing with DILI categories: a 10-fold C max, b 100-fold C max, and c 1000-fold C max compared with IC50 from 3D Hep3B cells-All Mix. d 1000-fold C max compared with IC50 from 2D Hep3B cells. To calculate sensitivity, specificity, and overall predictivity, human C max values were first compared with IC50 values from 3D Hep3B cells-All Mix to obtain a provisional toxicity prediction for each compound. This toxicity prediction was then compared with DILI categories to determine TP, FP, TN, and FN. For example, when the prediction from human C max and IC50 is toxic and the compound is classified as hepatotoxic by DILI categories, the prediction for the compound is a true positive (TP). After determining TP, TN, FP, and FN for all compounds, sensitivity, specificity, and overall predictivity were calculated accordingly

Discussion

Existing in vitro screening technologies for assessing drug metabolism and toxicology lack the ability to provide highly predictive information on metabolism-induced drug toxicity, as well as the throughput needed for early-stage go/no-go decisions on lead compounds, and therefore do not address a critical bottleneck in the drug development process. The goal of this study was to understand whether the new DataChip/MetaChip platform could be used to screen metabolism-induced compound toxicity by correlating our in vitro chip data with known toxicity profiles of test compounds.

There may be several reasons for the poor predictivity of adverse drug reactions (ADRs) in humans using current in vitro assays as well as in vivo animal models. First, current in vitro cell models, including 2D hepatoma cell monolayers and sandwiched hepatocytes, may not adequately represent human liver tissue and may therefore fail to reproduce the biochemical and cellular responses seen in vivo. Several hepatic cell models lack key hepatic properties, including metabolism competence, drug transporters, and cell–cell interactions between hepatocytes and immune cells. Second, current in vitro assays may not provide the biological context necessary to predict toxicological reactions. For example, defensive pathways such as nuclear factor erythroid 2-related factor 2 (Nrf-2) and nuclear factor-kappa B (NF-kB) can affect the toxic response depending on their levels of activation (Osburn and Kensler 2008; Tak and Firestein 2001). Third, current in vitro assays may not give sufficient information on surrounding cell types, proteins affected by metabolism, and toxicological pathways. Fourth, the number of potentially hepatotoxic compounds identified from individual assays may not be sufficient to capture the broad array of mechanisms leading to in vivo toxicity. In the case of animal models, there are significant cross-species differences between animals and humans (Shanks et al. 2009). Thus, predicting human toxicity from study outcomes in rats, mice, and rabbits is generally challenging.

To address these issues, several in silico approaches have been developed in recent years to predict human toxicity directly from in vitro toxicity data. One good example is the in vitro–in vivo correlation (IVIVC) model, which is provided by the U.S. Food and Drug Administration (FDA) with diverse formulations and guidelines for predicting in vitro and in vivo pharmaceutical correlations. It has been used as a surrogate to reduce bioavailability studies of new drugs (Emami 2006; Sakore and Chakraborty 2011). Several research groups have reported IVIVC results using a correlation between bioavailability variables and in vitro data (Emara et al. 2000; Mahayni et al. 2000; Balan et al. 2001). Another pioneering approach is the in vitro–in vivo extrapolation (IVIVE) model, which refers to computational simulation to predict in vivo pharmacokinetics (PK) data such as C max values from in vitro experimental data such as IC50 values (Yoon et al. 2015; Yoon and Clewell 2016). For example, Johnson et al. used IC50 values from 11 drugs to calculate the predictivity of in vivo clearance in neonates, infants, and children (Johnson et al. 2006). In addition, the US Environmental Protection Agency (EPA) applied the IVIVE model to assess the potential human toxicity of environmental toxicants, for example in the risk assessment of ToxCast chemicals in young children (Wetmore et al. 2014). Since both IVIVC and IVIVE models require more predictive in vitro data to better predict in vivo outcomes, there have been several attempts to incorporate metabolism competence into in vitro assays and to consider the metabolic stability and metabolism-induced toxicity of drugs (Yoon et al. 2015; Yoon and Clewell 2016). The biotransformation of drugs can produce metabolites that have different toxicity profiles from their parent compounds (Combes et al. 2002). 
In particular, CYP450 reactions can generate metabolites that are more reactive and can induce toxicity through a variety of mechanisms (e.g., covalent binding to macromolecules or contributing to oxidative damage) (Costas 2008). Differences in individual responses to compounds are common among the human population and can be attributed to genetic variations that limit the expression or activity of certain DMEs (Astrid et al. 2007). Thus, determining which enzymes activate or deactivate a compound is essential to understand population variances in drug and drug candidate toxicity.

In recognition of these issues, we developed the new DataChip/MetaChip platform to incorporate 3D hepatic cell culture as well as metabolism competence with an array of DMEs, which in turn can decipher metabolism-induced compound toxicity. In our results, the DataChip/MetaChip identified 18 compounds whose reactions with Phase I and II DMEs resulted in IC50 values significantly lower than those of the parent compounds, indicating that these compounds were directly metabolized and activated by the Phase I and/or II DMEs on the chip. In addition, 11 compounds were detoxified by Phase II DMEs. For example, metabolism of sulindac resulted in an increased IC50 in the All Mix relative to that with the No Enzyme control and P450 Mix, suggesting that this compound was transformed by Phase II DMEs. Similar results were obtained from compounds exposed to HLMs. By simply comparing human C max values and All Mix IC50 values from the chip at different cutoffs, we achieved 100% sensitivity, 86% specificity, and 93% overall predictivity. This outcome implies that our DataChip/MetaChip platform could provide higher predictivity of human hepatotoxicity than animal and in vitro 2D counterparts. In addition, our in vitro chip data might be used in IVIVC and IVIVE models to provide more predictive information on metabolism competence. In summary, the chip technology could be used at early stages of drug development, not to predict the extent and nature of all possible in vivo toxic effects, but rather to estimate the risk of failure if a new lead compound is transformed into metabolites that can be toxic to cells and potentially to humans.
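The reasoning about activation and detoxification above can be sketched as a comparison of IC50 values across enzyme conditions. This is only an illustration: the statistical criterion behind "significantly lower" is not given in this section, so a fold-change threshold (`fold=2.0`) stands in for it as an assumption. The function name `classify_metabolic_effect` and the example values are hypothetical.

```python
def classify_metabolic_effect(ic50_control, ic50_p450, ic50_all_mix, fold=2.0):
    """Label a compound from its IC50 under three conditions.

    Assumption: a fold-change threshold replaces the (unspecified)
    significance test. A lower IC50 means higher toxicity.
    """
    labels = set()
    # Activation: toxicity rises (IC50 falls) when enzymes are present.
    if min(ic50_p450, ic50_all_mix) * fold <= ic50_control:
        labels.add("activated")
    # Phase II detoxification: IC50 in All Mix exceeds both the
    # no-enzyme control and the P450 Mix.
    if ic50_all_mix >= fold * max(ic50_control, ic50_p450):
        labels.add("detoxified by Phase II")
    return labels or {"no clear change"}


# Hypothetical IC50 values in µM (not data from the study).
sulindac_like = classify_metabolic_effect(100.0, 110.0, 400.0)
activated_like = classify_metabolic_effect(200.0, 40.0, 50.0)
unchanged_like = classify_metabolic_effect(100.0, 90.0, 120.0)
```

The first example follows the pattern described for sulindac, where the IC50 in All Mix is higher than with the No Enzyme control and the P450 Mix.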

Conclusions

Drugs react and form metabolites in the body via various metabolic pathways. Metabolites formed from Phase I and II DME reactions can cause ADRs, which may not be detected easily in animal models due to differences in genetic makeup between animals and humans. The predictivity of in vivo hepatotoxicity for 22 test compounds, obtained from the DataChip/MetaChip containing 3D-cultured Hep3B cells and Phase I and II DMEs, demonstrated that the chip platform could provide a better correlation with in vivo human C max values than in vivo rat LD50 data. This result is presumably due to the in situ generation of compound metabolites on the MetaChip, which allows more accurate assessment of metabolism-induced toxicity, and to the 3D culture environment on the DataChip, which may better mimic the tissue architecture and enhance the functionality of the hepatic cells. The DataChip/MetaChip platform could ultimately be tailored to accommodate an individual’s DME inventory and be used in conjunction with various high-content imaging assays to provide specific mechanisms of metabolic toxicity profiles in different populations of individuals as a component of broader precision medicine. With more in vitro IC50 data from the chip platform for further validation, the DataChip/MetaChip platform could represent a promising, high-throughput microscale alternative to conventional in vitro multi-well plate platforms and may create new opportunities for rapid and inexpensive assessment of human toxicology at very early phases of drug development.