Introduction

One of the key objectives of Phase I clinical trials is to assess the safety of new chemical entities in humans and, in particular, to document the absence of hepatotoxicity. Subjects in Phase I trials are typically healthy volunteers selected to be free of liver disease. Transaminase, bilirubin and alkaline phosphatase plasma levels are routinely used to detect signals of drug-induced liver injury. In the analysis of such data at least two explanations of observed changes need to be considered: physiologically based regulation of enzyme and transporter levels and true injury to liver cells. Standardization of conditions in Phase I trials with respect to lifestyle and meals (e.g. no alcohol) is also expected to reduce the incidence of changes in transaminase levels. Despite this standardization, elevation of transaminase levels above the upper limit of normal (ULN) does occur in placebo-treated healthy volunteers in Phase I trials [1–4].

Although the cause of transaminase elevation in placebo-treated healthy volunteers during Phase I trials remains unclear, several hypotheses have been generated, including an imbalance of energy (increase of food intake, lack of physical activities) [3–5], obesity [6, 7] and a high sucrose intake leading to readily available fructose, all of which may upregulate alanine transaminase (ALT) [8, 9]. However, since lifestyle and food intake vary considerably among individuals, the standardization of these factors in Phase I trials may paradoxically cause transaminase elevation. In fact, hospitalization has been found to be associated with transaminase elevation [1, 4]. In addition, it is not clear whether measuring transaminase levels once during screening actually excludes all subjects with chronic, stable liver disease from typical Phase I clinical trials; the ALT levels of such subjects can fluctuate and contribute to the incidence of transaminase elevation during placebo treatment.

Transaminase elevation in the placebo-treated group, especially when ALT values greater than 2× ULN or 3× ULN are present [2], would make it difficult to interpret transaminase elevation in subjects treated with the investigational drug in the same trial. Thus, minimizing transaminase elevation in the placebo arm would help differentiate drug-related from non-drug-related liver signals. To achieve this, we set out to investigate the expected/normal variation in transaminase in healthy volunteers at baseline and on placebo, and which factors may cause transaminase elevation on placebo. Using data from placebo arms of 20 historical Phase I trials, we found that the baseline ALT level and the change from baseline are important factors that need to be examined when interpreting data on transaminase elevation.

Materials and methods

We collected data on 481 individual placebo-treated healthy volunteers from 20 historical AstraZeneca Phase I trials across multiple therapeutic areas (with the exception of oncology). Of the 20 trials, 12 were conducted at one of the four AstraZeneca Clinical Pharmacology Units (CPUs), and the remaining eight were conducted by nine different Contract Research Organizations (CROs). For each of the 20 historical Phase I trials, informed consent in writing was obtained from each patient, and the study protocol conformed to the ethical guidelines of the 1975 Declaration of Helsinki and local laws, as reflected in a priori approval by the ethics committee.

Data were standardized and integrated in an Oracle database based on the Study Data Tabulation Model (SDTM) from the Clinical Data Interchange Standards Consortium (CDISC) [10]. Data types collected from these studies included all ALT measurement values, patient demographics and study design information. The 20 trials that were included in our analyses had different study designs, ranging from single dose to multiple dose studies, and from parallel group to crossover design. All of the 20 trials were healthy volunteer studies and, therefore, excluded subjects with any clinically significant illness, including a suspected/manifested infection (e.g. human immunodeficiency virus and hepatitis). Subjects in our Phase I trials were usually required to reside at study sites for a number of dosing periods separated by days of non-residency, with the length of residency of each dosing period ranging from hours to 1–2 days. The cumulative length of residency ranged from hours to 16 days, with ten trials requiring total stays shorter than 1 week, and the other ten trials requiring stays longer than 1 week. Ethnic origins of study subjects included African, Asian, Caucasian and other (i.e. an origin other than the three specified). Although there were a couple of trials conducted in elderly or Asian populations, most study populations were predominantly young Caucasian males (Table 1). Only the data from placebo arms in these 20 studies were used in our analyses. The placebo used in each trial contained ingredients that matched the vehicle of the investigational drug solution or tablet and was also identical in appearance to the investigational drug.

Table 1 Demographic information of study population

The initial analysis used mainly data visualization and exploration through Spotfire DecisionSite [11] to identify: (1) the expected/normal variation in ALT when healthy volunteers come into a Phase I trial, and (2) any patterns in variations of ALT values across countries, study sites and laboratories or over time. Two analysis approaches were applied to identify patterns in ALT elevations: (1) examination of ALT elevation rates using the ULN as the cutoff; (2) examination of changes from baseline in actual ALT values.

Based on the patterns identified, certain hypotheses were generated, and further analyses were carried out to test these hypotheses using traditional statistical methods [e.g. t test, analysis of variance (ANOVA), linear regression] and Random Forests. Random Forests [12] is a multivariate data mining method derived from decision tree analysis that involves growing many trees (forests) and letting the forest choose the classification having the most votes (over all of the trees in the forest). Because the Phase I study populations comprised predominantly young Caucasian males (Table 1), precautions were taken in interpreting our results on the effects of age, race and gender on ALT variations.
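The voting step described above can be illustrated with a minimal sketch. It covers only the final majority vote, not the growing of the trees, and the function name and the five example votes are hypothetical, not taken from the study.

```python
from collections import Counter


def forest_vote(tree_predictions):
    """Return the class with the most votes and its share of all votes."""
    counts = Counter(tree_predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(tree_predictions)


# Hypothetical predictions from a five-tree forest for one subject.
example_votes = ["elevated", "normal", "normal", "elevated", "normal"]
label, share = forest_vote(example_votes)
print(label, share)
```

The share of votes returned alongside the label is the quantity that can later be thresholded at different cutoffs to trade sensitivity against specificity.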

Results

We used data from placebo arms in 20 historical Phase I trials to determine ALT value distribution and ALT elevation rates, examine baseline ALT levels as well as changes from baseline and identify the patterns in ALT variation across different studies, study sites and demographic groups and over time.

Variations in ALT baseline values

We pooled ALT baseline values, which were measured before the placebo treatment was initiated, in 481 subjects from 20 historical Phase I trials. The distribution of baseline ALT values did not follow a normal bell-shaped distribution. Instead, baseline ALT values had a skewed distribution characterized by a long “tail” at the high end of the scale, with a mean value of 22.2 IU/L, a standard deviation of 10.84 IU/L, and a skewness of 1.78. This skewed distribution is consistent with what has been reported previously for typical populations [13, 14] and blood donor populations [15]. As the distribution of ALT was skewed and log10 (ALT) had a normal distribution (skewness = 0.08), we used the log10 (ALT) for all subsequent analyses.
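As a rough illustration of why the log transform was used, the sketch below computes the sample skewness of a small set of right-skewed values before and after taking log10. The ALT values are made up for illustration and are not study data; the skewness formula (third central moment divided by the 1.5 power of the second) is one common definition and is an assumption about how the reported skewness was calculated.

```python
import math


def skewness(values):
    """Sample skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Illustrative right-skewed ALT values in IU/L (not study data).
alt = [12, 14, 15, 17, 18, 20, 22, 25, 31, 40, 62]
log_alt = [math.log10(v) for v in alt]

print("skewness of ALT:", round(skewness(alt), 2))
print("skewness of log10(ALT):", round(skewness(log_alt), 2))
```

The log transform pulls in the long upper tail, so the skewness of the transformed values is smaller than that of the raw values.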

Baseline ALT values varied with gender and race (p < 0.05, two-way ANOVA). Specifically, ALT levels in females were lower than those in males, and lower in Africans and Asians than in Caucasians and other races. The data also indicated that the ALT level increased with higher body mass index (BMI; p < 0.05), but decreased with age (p > 0.05). The variations in ALT values with gender, BMI and age are consistent with results reported in the literature [16–20], but the association between race and ALT level is not consistent with published results [16–19] and should not be taken as conclusive, considering the unbalanced race composition (322 Caucasians, 51 Africans, 31 Asians and 77 others) in this study compared to populations used in epidemiological studies.

Variations in ALT normal reference ranges and elevation rates

The ULN of ALT varied considerably across different studies and study sites, and sometimes even within the same study and study site (Fig. 1). ULN values ranged from 32 to 72 IU/L, with an average value of 45.5 IU/L and a standard deviation (SD) of 9.1 IU/L.

Fig. 1

Variation in the upper limits of normal (ULN) values of alanine transaminase (ALT) across different studies and study sites. The different shapes on the graph represent different study sites. Studies are not presented in any particular order on the X-axis, and study identifiers are only partially shown

The ALT elevation rates in the placebo group (per study) varied over time (p < 0.05, by linear regression), with higher rates occurring in studies conducted after 2001 (Fig. 2a). This increase cannot be attributed to differences in elevation rates at baseline, since there was no increase over time in baseline elevation rates in the same trials (data not shown). One possible explanation for the observed increased ALT elevation rates over time may be increased variation in the ULN observed over time (Fig. 2b). Using pooled data from the 20 studies, we found that the overall ALT elevation rates were 3.1% at baseline and 4.4% on placebo and that these two rates were not significantly different. There were only two cases (0.4%) of elevations above 2× ULN, and none were above 3× ULN. As shown in Table 2, the overall ALT elevation rate on placebo in our pooled data from 20 historical trials was 4.4%, which is lower than those previously reported for hospitalization trials [1, 2] but higher than those reported for ambulatory trials [1].

Fig. 2

The variations over time in ALT elevation rate. a Variations over time in ALT elevation rate on placebo across different trials, in which each data point stands for one trial. Elevation rate is the proportion of subjects on placebo who manifested an elevation in ALT level during placebo treatment; it is shown for each trial. b Variations over time in ULN of ALT, in which each data point represents a trial or a subset of a trial where a distinct ULN of ALT was used. The date of the first subject enrolled was used to represent the time when a trial was conducted in both a and b

Table 2 Overall alanine transaminase elevation rate in the present and previous publications [1, 2]

Changes in ALT level from baseline

Regardless of differences in study design and study duration, ALT elevations were found well after 1 week into the study (Table 3), with the majority of ALT elevations actually appearing after 2 weeks on placebo treatment. This is consistent with results from previous retrospective analyses [2] of 13 Phase I trials in which subjects with baseline ALT abnormalities contributed about 25% of the cases of subsequent ALT elevation during placebo treatment. Similarly, in the current analysis, of the 21 subjects with elevated ALT levels (i.e. >1× ULN) during placebo treatment, six (29%) already had elevated ALT levels at the baseline visit. As shown in Fig. 3a, for five subjects with baseline ALT abnormality, repeated ALT measurements were performed to “correct” ALT to normal levels at the start of the placebo treatment (day 1). However, ALT levels rose above the ULN (>1× ULN) again during placebo treatment in two of the four “corrected” subjects.

Table 3 ALT elevation rates during different time periods of the trials
Fig. 3

Individual ALT changes from baseline. a ALT time courses for five subjects who had abnormal ALT values before or at randomization (study day 0). Triangles ALT values >ULN, solid circles normal values. There are three different levels of ULN (40, 48, and 66 IU/L, respectively, as determined during the first visit) because the patients come from three different trials. b A box plot for comparison between baseline values and maximal values on placebo for all subjects. The box shows the range from the lower to the upper quartile with an arrowhead indicating the mean and a gray area for the 95% confidence interval. The data points show the outliers, while whiskers show smallest and largest non-outliers

The primary cause of baseline elevations was the enrollment of subjects with abnormal baseline ALT: only two of the 20 studies had clear liver function test exclusion criteria and excluded subjects with abnormal baseline ALT levels using the ULN as the cutoff. In fact, 14 of the 20 trials studied here enrolled subjects with abnormal baseline ALT, although subjects with a known history of liver disease were usually excluded.

Although the overall ALT elevation rates at baseline and during placebo treatment are not significantly different, the maximal ALT values during placebo treatment were significantly higher than the baseline levels (p < 0.05 by paired t test, Fig. 3b). We therefore defined ALT change from baseline as the difference between maximal ALT and baseline levels in order to carry out a comparison across different demographic groups or studies. We found that ALT change from baseline varied with race and country (both p < 0.05, by ANOVA test; Fig. 4).

Fig. 4

Variations in ALT change from baseline, elevation rate, and ULN across genders (a–c), races (d–f) and countries (g–i). In the box plots, the box shows the range from the lower quartile to the upper quartile with an arrowhead indicating the mean and a gray area for the 95% confidence interval. The whiskers show smallest and largest non-outlier observations, and the data points show the outliers. ALT changes from baseline are represented by the logarithm of change from baseline

Variations in ALT change from baseline appear to be different from variations in elevation rates across demographic groups. Across gender, females had a higher magnitude of ALT change (Fig. 4a), but a lower ALT elevation rate (Fig. 4b). Across ethnic groups, Caucasians had the lowest magnitude of ALT change (Fig. 4d) but the highest ALT elevation rate (Fig. 4e). Across different countries, the magnitude of ALT change was highest in USA (Fig. 4g), while this country had the lowest elevation rate (Fig. 4h). All of these discrepancies can be partially explained by the differences in ULN, across gender (Fig. 4c), races (Fig. 4f) and countries (Fig. 4i), respectively.

Multivariate analysis of all factors

The data exploration and statistical analysis performed in this study demonstrate that several factors influence ALT elevation rates and changes from baseline in placebo-treated healthy volunteers, including baseline ALT levels, a number of demographic factors and some study design factors. To be able to evaluate if there is any combinatorial effect from these factors, we used the Random Forest method to examine the overall ability of these factors to predict ALT elevations during placebo treatment and to determine how important each of these factors is for the prediction.

Using baseline ALT level in combination with all demographic and study factors, Random Forests classification (Fig. 5a, b) predicted whether individual subjects would develop ALT elevations (>1× ULN) on placebo, with an accuracy, sensitivity and specificity of 80% each. Using the same variables, Random Forests regression (Fig. 5c) explained 51% of the variation in the maximal ALT value of subjects on placebo, with a root mean squared residual of 0.14 log (IU/L). In both the classification and regression, the most important variable for predictions is the baseline ALT level, followed by study site and a number of demographic factors (Fig. 5b, c). This further suggests that excluding subjects with liver test abnormalities at baseline, in addition to excluding those with liver disease, would further reduce non-drug-related liver signals in Phase I trials.
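To make the relationship between the voting cutoff and the reported sensitivity and specificity concrete, the following sketch computes both measures at several cutoffs on the fraction of trees voting for an elevation. The vote fractions and outcomes are hypothetical and the function name is ours; this is not the study's model or data.

```python
def sens_spec(vote_fractions, truths, cutoff):
    """Sensitivity and specificity when a subject is called 'elevated'
    if the fraction of trees voting 'elevated' is at least `cutoff`."""
    tp = fn = tn = fp = 0
    for fraction, truly_elevated in zip(vote_fractions, truths):
        predicted = fraction >= cutoff
        if truly_elevated and predicted:
            tp += 1
        elif truly_elevated:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical per-subject vote fractions and observed outcomes.
fractions = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.5]
truths = [True, True, True, False, False, False, False, False]

# Each cutoff yields one point of the ROC curve.
for cutoff in (0.25, 0.5, 0.75):
    sensitivity, specificity = sens_spec(fractions, truths, cutoff)
    print(cutoff, round(sensitivity, 2), round(specificity, 2))
```

Lowering the cutoff raises sensitivity at the expense of specificity; sweeping the cutoff traces out an ROC curve of the kind shown in Fig. 5a.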

Fig. 5

Variable importance generated from Random Forests (RF) models. a Receiver operating characteristic (ROC) curve of the RF classification model that predicted ALT elevation >ULN with an 80% accuracy, generated by setting different voting percentage cutoffs for choosing the classification over all of the decision trees in the forest. The area under the curve (AUC) is 0.80, and the best cutoff gave both a sensitivity and specificity of 80%. b Variable importance plot generated from the RF classification in a. c Variable importance plot of the RF regression model that predicted maximal ALT value with 51% of its variation explained. d Variable importance plot of the RF regression model that predicted ALT change from baseline with only 10% of its variation explained. Variable importance was calculated by either a mean decrease in the Gini impurity score (b) or an increase in node purity (c, d) [12]. In all variable importance plots, Study site denotes the individual study sites, Study site type is the type of study site (either internal Clinical Pharmacology Units or Contract Research Organizations), Days of residency refers to the cumulative length of residency at the study site in number of days and Stay length is the categorized length of residency using 1 week as the cutoff (i.e. <1 week, 1 week or >1 week)

We also tried to predict ALT change from baseline (i.e. log ratio of maximal ALT and baseline ALT levels) using baseline ALT level in combination with all demographic and study design factors. However, the Random Forest regression only explained 10% of the variation in change from baseline of subjects on placebo, with root mean of squared residuals of 0.14 log (IU/L). Still, the most important variable for predictions is the baseline ALT level, followed by study site and a number of demographic factors (Fig. 5d).
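The change-from-baseline measure defined above (log ratio of maximal on-placebo ALT to baseline ALT) can be written out as a short sketch. The function name and example values are hypothetical, and base 10 is assumed because log10 (ALT) was used elsewhere in the analysis.

```python
import math


def log_change_from_baseline(baseline_alt, on_placebo_alts):
    """log10 of (maximal on-placebo ALT / baseline ALT); inputs in IU/L."""
    if baseline_alt <= 0 or not on_placebo_alts:
        raise ValueError("need a positive baseline and at least one on-placebo value")
    return math.log10(max(on_placebo_alts) / baseline_alt)


# Hypothetical subject: baseline 20 IU/L, on-placebo values peaking at 40 IU/L.
change = log_change_from_baseline(20.0, [18.0, 25.0, 40.0])
print(round(change, 3))
```

A doubling from baseline corresponds to a value of about 0.30 on this scale regardless of which laboratory's ULN applies, which is what makes the measure comparable across sites.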

Discussion

The level of alanine transaminase is a standard laboratory parameter for the evaluation and follow-up of liver diseases and hepatocellular damage, but the ULN of ALT has been shown to vary across different laboratories according to the commercial kit used and the reference population chosen by each manufacturer to establish the normal range [16, 21]. Another contributing factor to this variation may be the different approaches used to calculate ULN. For example, both a central 95% confidence interval (2.5–97.5%) [22, 23] and a one-sided 95% confidence interval (0–95%) [20, 21] have been used to define ULN. Moreover, even if ULN is defined using the same statistical measure across laboratories, the ULN may not be up-to-date to reflect the normal range in the healthy local population. The currently accepted range of normal values for serum ALT levels has been recently challenged by research groups [16, 20, 21] who claimed that current standards for “normal” ALT level were defined by using populations that included persons with subclinical liver disease and that the ULN for ALT in the healthy population is significantly lower than that currently listed by the manufacturer of the biochemical test for ALT.
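The effect of the two ULN definitions mentioned above can be sketched on a synthetic reference population. The log-normal parameters, sample size and function names below are assumptions chosen for illustration only; they are not estimates from any laboratory or from this study.

```python
import math
import random
import statistics

# Synthetic reference population (assumed log-normal; not real laboratory data).
rng = random.Random(0)
reference_alt = [rng.lognormvariate(math.log(20), 0.4) for _ in range(1000)]


def uln_central_95(values):
    """Upper bound of the central 95% interval: the 97.5th percentile."""
    return statistics.quantiles(values, n=40)[38]


def uln_one_sided_95(values):
    """Upper bound of the one-sided 95% interval: the 95th percentile."""
    return statistics.quantiles(values, n=20)[18]


print("ULN, central 95%:", round(uln_central_95(reference_alt), 1))
print("ULN, one-sided 95%:", round(uln_one_sided_95(reference_alt), 1))
```

On the same reference population the central-interval definition always gives the higher ULN, so two laboratories with identical kits and populations could still report different limits.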

Consistent with these reports, we found a high variability in the ULN from our historical Phase I trials, which were conducted in different countries and used different laboratories for the ALT measurements. This high variability may originate from: (1) the different ULNs defined by different commercial kits, (2) the different ULNs derived from different local populations and (3) the different ways used to define ULN, as described above. We also observed that variability in ULN increased over time (Fig. 2b), which may reflect changes in ALT levels in some populations due to diet and lifestyle changes [8, 9, 24, 25] and, possibly, variation across laboratories in their frequency of updating ULN. Our findings, together with what has been reported in the literature on the high variability of ULN, suggest that normalizing transaminase levels to ULN (e.g. using multiples of ULN) may not be a reliable approach for comparing results across different laboratories.
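A small worked example shows why multiples of ULN are hard to compare across laboratories. It uses the lowest and highest ULN values observed in our trials (32 and 72 IU/L); the measured ALT value of 50 IU/L and the function name are hypothetical.

```python
def uln_multiple(alt, uln):
    """Express an ALT value (IU/L) as a multiple of a laboratory's ULN."""
    return alt / uln


alt_value = 50.0                        # hypothetical measurement in IU/L
lowest_uln, highest_uln = 32.0, 72.0    # ULN range reported in the Results

for uln in (lowest_uln, highest_uln):
    multiple = uln_multiple(alt_value, uln)
    status = "elevated" if multiple > 1 else "normal"
    print(f"ULN {uln:.0f} IU/L: {multiple:.2f}x ULN ({status})")
```

The same measurement is about 1.56× ULN at one laboratory and about 0.69× ULN at the other, so it would count as an elevation in one trial and as normal in another.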

In the absence of a reliable and consistent ULN for the healthy population, subjects with subclinical liver disease could be enrolled in typical Phase I clinical trials. For example, non-alcoholic fatty liver disease (NAFLD) is the most common cause of ALT elevation in otherwise healthy subjects [26, 27], but 79% of subjects with hepatic steatosis were found to have normal ALT levels based on a one-time measurement [28]. Patients with NAFLD in whom ALT levels are normal (according to the current ULN) at the time of screening could be enrolled in our Phase I trials as “healthy” subjects. Such “healthy” subjects could then have higher than normal ALT levels on placebo and may have contributed to the ALT elevations observed in our analysis. A one-time ALT measurement is not sufficient to establish a baseline for stringent screening purposes and could lead to the inclusion of subjects with subclinical liver disease. In addition, in most asymptomatic individuals with high transaminase levels, the elevation is intermittent, and normal levels can be found when testing is repeated within months [29, 30]. This is consistent with our finding that second measurements during screening yielded normal ALT values in some subjects whose initial ALT level was higher than normal. However, half of these “corrected” subjects later developed higher than normal ALT levels on placebo. All of these findings suggest that multiple measurements better define the individual’s baseline level of ALT [22], and we believe that subjects with one measurement above ULN during screening should be excluded from Phase I trials.

We believe that a high baseline ALT level is a risk factor for ALT elevation on placebo, based on the following findings: (1) ALT elevation rates on placebo were correlated with baseline ALT elevation rates, (2) among placebo-treated subjects, 3.1% had abnormal ALT values at baseline, and these accounted for 29% of all subjects who had ALT elevation on placebo and (3) in multivariate analysis, baseline ALT level was found to be the most important factor in determining ALT elevations on placebo. These findings suggest that ensuring study protocols have clear inclusion and exclusion criteria using baseline ALT levels (e.g. excluding subjects with liver test abnormalities at baseline in addition to excluding those with diagnosed liver disease) could be an easy and effective approach to lowering the incidence of ALT elevations in Phase I trials and avoiding any confusion between non-drug-related and drug-related liver signals.

The finding that baseline ALT was not a strong predictor of ALT change from baseline (Fig. 5d) suggests that a change from baseline primarily depends on environmental factors rather than predisposition, as measured by ALT levels. It also, indirectly, suggests that the predictive strength of baseline ALT with respect to an increase above the ULN reflects the general observation that the closer a series of randomly fluctuating values is to a given fixed limit, the larger the probability is that one value will be found on the opposite side of the limit.
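The general observation about fluctuating values and a fixed limit can be sketched under a deliberately simple model. We assume, purely for illustration, that each on-placebo log10 (ALT) measurement is an independent normal draw centered on the subject's baseline with a fixed within-subject SD; the SD of 0.1, the five measurements and the function names are hypothetical, while 45.5 IU/L is the average ULN reported in the Results.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_any_above_uln(baseline_alt, uln, sd_log10, n_measurements):
    """Probability that at least one of n independent measurements exceeds
    the ULN, assuming log10(ALT) ~ Normal(log10(baseline), sd_log10)."""
    z = (math.log10(uln) - math.log10(baseline_alt)) / sd_log10
    return 1.0 - normal_cdf(z) ** n_measurements


uln = 45.5  # average ULN in IU/L from the Results
for baseline in (20.0, 30.0, 40.0):
    p = prob_any_above_uln(baseline, uln, sd_log10=0.1, n_measurements=5)
    print(f"baseline {baseline:.0f} IU/L: P(any value > ULN) = {p:.3f}")
```

Under this model the probability of crossing the ULN rises steeply as the baseline approaches the limit, even though the size of the fluctuations is identical for every subject.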

It has been proposed that changes from baseline in actual ALT levels better reflect changes in liver pathology and physiology than elevation rates above ULN. Our finding that ALT maximal levels during placebo treatment were significantly higher than ALT baseline levels, while elevation rates at baseline and during placebo treatment did not differ significantly, suggests that measuring change from baseline may be a more sensitive way to detect liver signals than the use of elevation rates. Based on this finding and our observation that across demographic groups, changes from baseline had almost opposite patterns to ALT elevation rates, plus the high variability in ULN and the importance of baseline level for predicting later elevation, we advocate looking at baseline and change from baseline in addition to ULN-based change (i.e. multiples of ULN) when examining transaminase elevation. We believe that an approach that examines baseline and change from baseline would provide a more quantitative and individualized measure of ALT elevation and that such an approach could be important in terms of interpreting liver signals and making decisions on individual subjects in clinical trials, including liver test results from drug-treated arms in Phase I trials and, possibly, trials in later phases of clinical drug development. One remaining problem is that change from baseline has not been used often in the past, and it is therefore not easy to provide stopping rules based on it.

Looking at baseline and change from baseline is also important when interpreting liver function tests in routine clinical practice. In the absence of previous test results and the opportunity to rule out other factors (e.g. diet, exercise, alcohol, drug), it could be premature to diagnose any abnormality in liver function based on only a one-time transaminase elevation above the normal range. On the other hand, an annual normal transaminase result could easily miss some abnormal liver conditions, such as NAFLD. Indeed, most cases of transaminase elevations in seemingly healthy subjects have generally been attributed to NAFLD [25, 31], and liver biopsy studies of patients referred for transaminase elevations have demonstrated liver steatosis as the most common histological finding [27, 32, 33].

In addition to baseline levels and existing liver conditions, hospitalization may have some effects on ALT values [1–4], which could contribute to ALT elevations during placebo treatment in subjects who had normal baseline ALT values. Consistent with this hypothesis, the overall ALT elevation rate on placebo in our pooled data was lower than those previously reported for hospitalization trials but higher than those reported for ambulatory trials. Subjects in our Phase I trials were usually required to reside at study sites for a number of periods separated by days of non-residency, with each period of residency ranging from hours to 1–2 days. Therefore, the suspected hospitalization effect associated with ALT elevations was intermediate, and elevation rates were also intermediate in our Phase I trials, compared to hospitalization trials and ambulatory trials. The hospitalization effect could well be explained by excess caloric intake in some subjects as an effect of standardized meals and reduced physical activity. This observation suggests that researchers should aim to maintain each subject's normal level of physical activity and food intake to the extent compatible with the Phase I trial.

Based on the findings of our analysis, we conclude the following: (1) ULN is not a very reliable means to define normal and abnormal transaminase levels, (2) the exclusion of subjects with baseline liver test abnormalities could further decrease transaminase elevation rates on placebo in Phase I trials, (3) baseline level and changes from baseline in transaminase are important variables to examine and should be used in addition to elevation above ULN for more reliably interpreting liver signals in Phase I clinical trials and (4) further studies are necessary to define what is a true baseline and what is a clinically significant change from baseline.