Key Points

Drug- or herb-induced liver injury (DILI/HILI) causality assessment should use a validated, structured, and quantitative method

Roussel Uclaf Causality Assessment Method (RUCAM) was the first liver-specific scoring system made up of defined key elements and validated with cases including positive rechallenge

Since 1993, RUCAM has been widely used to identify hepatotoxins in case series and to determine DILI/HILI incidence in pharmacoepidemiological studies

RUCAM provides transparent data, and its updated version helps reduce inter-observer variability, supported by the use of working instructions

1 Introduction

Drug-induced liver injury (DILI) and herb-induced liver injury (HILI) have received much attention [1,2,3,4,5,6,7,8] for the identification of risk factors such as drug lipophilicity [9, 10], high daily dosage [9], high hepatic metabolism [10], and HLA alleles [11, 12]. Although promising, these risk factors remain uncertain because of the questionable data quality of the cases and of the method used for assessing causality. For instance, the choice between the global introspection method, also known as expert judgment or expert opinion [8], and a validated causality assessment method (CAM) such as RUCAM (Roussel Uclaf Causality Assessment Method) [6, 13,14,15] may have a significant impact on the conclusions of studies. RUCAM was the first CAM specifically designed for liver injury [13], structured with defined key elements and scores attributed to each item, providing a final score for causality grading [6, 13]. To circumvent arbitrary adjudication, definitions and scores of RUCAM items have been derived from analysis of DILI cases with positive rechallenge, recognized as the best diagnostic test [16]. Indeed, only cases with a probable or highly probable causal relationship based on RUCAM can provide a reliable description of the main features. However, outside RUCAM, the details of assessment are usually not presented in DILI/HILI case reports, which makes re-assessment by peers difficult and could explain discrepant results.

Achieving the correct etiological diagnosis of a liver injury has a long history, with problems not confined to DILI [8, 13,14,15, 17, 18] but recently extended to HILI and dietary supplements [19,20,21]. The latter would account for 12–20% of acute liver injuries due to xenobiotics, not only in China [22] but also in the US [23]. Indeed, the diagnosis of DILI and HILI was blurred by poor data quality [17,18,19,20,21], confounders such as alternative causes [5, 8, 11, 12, 19,20,21] that were not sought, and unverified diagnoses [8, 18, 21]. In the absence of specific biomarkers, and since DILI or HILI can mimic any liver disease, the diagnosis can err without the support of a structured causality assessment method.

In this article, we discuss the strengths and challenges of RUCAM and why it is still widely used 25 years after its launch.

2 Why RUCAM?

Causality assessment of DILI can be based on two different approaches: the global introspection method or a validated, structured, and quantitative method. In the global introspection, the assessor builds up an opinion based on his/her personal experience and general items but without standardized definitions and scores of the key elements to take into consideration. This method results in global and subjective conclusions difficult to share with other assessors, not only because the nature of the items is unclear but also because the weight of these items in the final opinion is unknown or arbitrary, fluctuating from one case to another. Conversely, with a validated, structured, and quantitative method, the assessor follows defined items and assigns scores to reach a final causality grading, enabling comparison of the results with those of other assessors. In the late 1980s, hepatologists from the US and Europe involved in DILI were convened to define various aspects of DILI in practice and establish qualitative criteria to assess causality [24, 25]. After significant changes including the addition of items, assignment of scores to each item, and the validation of the method, RUCAM was published in 1993 [13, 14]. RUCAM criteria were developed from a series of DILI cases, with positive rechallenge recognized as gold standard [16] to confirm the diagnosis. Prerequisite criteria were defined: (i) a liver injury, (ii) the liver injury pattern, and (iii) key elements [25]. The items were individually scored and included in RUCAM [6, 13]. Finally, RUCAM was validated first by using the cases with positive rechallenge to determine the performance indicators (sensitivity, specificity, positive and negative predictive values) and second by external independent assessors using consecutive DILI cases to determine the reproducibility of the method [13].

RUCAM gradually became a cornerstone of DILI and HILI case evaluation and has received worldwide appreciation for over 25 years [1,2,3,4, 21, 22] that has not been shared by any other CAMs subsequently published, as previously described [6]. In addition to these encouraging aspects, the main characteristics and lessons learned from RUCAM use merit presentation.

3 RUCAM Characteristics and Lessons Learned from Its Use

3.1 Standardization

RUCAM is the first standardized CAM specific to liver injury for assessing causality from onset to the end of the course of DILI and HILI case evaluation, characterized by seven well defined and scored key elements, the sum of which provides a final score with causality grading (See Electronic Supplementary Material, Tables 1 and 2). As a tool, RUCAM also provides working instructions to users (See Electronic Supplementary Material 1) to ensure a transparent scoring system. Although inter-observer variability has been reported with RUCAM, as it has with expert judgment [26], a large number of studies continue to use RUCAM to classify cases, describe clinical features, and calculate incidences of DILI and HILI [6, 15]. The working instructions reduce ambiguities and the risk of inter-rater variability, in contrast to the method based on rounds of expert judgment, for which no set of working instructions exists [8].

3.2 Pre-Requisite Criteria

3.2.1 Liver Injury

RUCAM was the first CAM that required criteria for a liver injury based on liver test (LT) thresholds [13, 14]. Current definitions include serum activity of alanine aminotransferase (ALT) of at least five times the upper limit of normal (ULN) and/or hepatic alkaline phosphatase (ALP) of at least 2 × ULN [6]. Below these thresholds, the cases are not clinically relevant for causality assessment and might reflect unspecific background noise, liver diseases like nonalcoholic fatty liver disease, or merely liver adaptation to metabolism of synthetic or plant chemicals.
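The threshold rule above can be sketched as a small check. This is a minimal illustration, assuming that each laboratory value is supplied together with its own upper limit of normal; the function and parameter names are hypothetical and not part of RUCAM.

```python
def meets_liver_injury_threshold(alt, alt_uln, alp, alp_uln):
    """Return True when ALT >= 5 x ULN and/or ALP >= 2 x ULN.

    alt, alp: measured serum activities (same unit as their ULN).
    alt_uln, alp_uln: laboratory-specific upper limits of normal.
    Names are illustrative assumptions, not RUCAM terminology.
    """
    if alt_uln <= 0 or alp_uln <= 0:
        raise ValueError("upper limits of normal must be positive")
    alt_multiple = alt / alt_uln
    alp_multiple = alp / alp_uln
    return alt_multiple >= 5 or alp_multiple >= 2


# ALT at 5 x ULN qualifies even if ALP is normal.
print(meets_liver_injury_threshold(alt=250, alt_uln=50, alp=100, alp_uln=120))
# ALT at 2.4 x ULN with normal ALP stays below the thresholds.
print(meets_liver_injury_threshold(alt=120, alt_uln=50, alp=100, alp_uln=120))
```

A case that fails this check would not proceed to causality assessment under the definitions given above.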

3.2.2 Liver Injury Pattern

RUCAM was also the first CAM that recognized the importance of the three types of liver injury defined by the consensus meeting [25]: hepatocellular, cholestatic, and mixed liver injury according to the ratio R, universally recognized as a discriminant tool [6, 22, 23, 27]. The ratio R should be calculated at the beginning of the liver injury as the hepatocellular type could evolve over time towards a cholestatic/mixed type that would change the criteria for causality assessment [6]. In practice, two types of liver injury are considered for evaluation: hepatocellular injury and cholestatic/mixed liver injury [6, 13], as they have different risk factors and time courses of ALT and ALP (See Electronic Supplementary Material, Tables 1 and 2).
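As an illustration of how the ratio R can be applied, the sketch below classifies the injury pattern from the first available ALT and ALP values. The definition R = (ALT/ULN)/(ALP/ULN) and the cut-offs used here (R ≥ 5 hepatocellular, R ≤ 2 cholestatic, otherwise mixed) are not stated in this section; they follow the commonly cited consensus definition and should be checked against [6, 25]. Function names are hypothetical.

```python
def ratio_r(alt, alt_uln, alp, alp_uln):
    """R = (ALT / ULN of ALT) / (ALP / ULN of ALP), from values at injury onset."""
    if min(alt_uln, alp_uln, alp) <= 0:
        raise ValueError("ALP and both upper limits of normal must be positive")
    return (alt / alt_uln) / (alp / alp_uln)


def injury_pattern(r):
    """Classify the pattern; cut-offs are the assumed consensus values."""
    if r >= 5:
        return "hepatocellular"
    if r <= 2:
        return "cholestatic"
    return "mixed"


def rucam_injury_group(pattern):
    """Map the three patterns onto the two groups evaluated in practice."""
    return "hepatocellular" if pattern == "hepatocellular" else "cholestatic/mixed"


r = ratio_r(alt=300, alt_uln=50, alp=240, alp_uln=120)  # (6.0) / (2.0) = 3.0
pattern = injury_pattern(r)
print(r, pattern, rucam_injury_group(pattern))
```

Computing R only from the earliest values reflects the point made above that a hepatocellular injury may drift towards a cholestatic/mixed profile over time.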

3.3 Key Elements

Individual RUCAM key elements had been derived from a series of DILI cases with positive rechallenge [13, 14].

3.3.1 Timing of Events

Chronological criteria were defined with a time frame between the beginning or the discontinuation of the drug/herb use and the onset of increased relevant liver tests (ALT for hepatocellular injury or ALP for the other types) or symptoms related to the liver injury. Chemicals with prolonged half-lives are also taken into consideration in this item.

3.3.2 Dechallenge

Dechallenge criteria reflect the course of ALT or ALP after cessation of the suspect drug/herb and are cornerstones of RUCAM. Treatment during the dechallenge phase with drugs such as steroids or ursodesoxycholic acid may mask the natural course and allows only a score of 0. Serial ALT testing on days 8 and 30 or ALP testing on day 180 after cessation of the suspect drug/herb ensures data completeness in hepatocellular injury or cholestatic/mixed liver injury, respectively. Other variations of the relevant enzyme are considered and scored [6].

3.3.3 Risk Factors

In the consensus meeting, alcohol use was considered as a risk factor [25]. In the 2016 update, thresholds for current alcohol use were specified for women (two drinks/day) and men (three drinks/day) in RUCAM [6]. Experts also considered pregnancy as a risk factor [25], but only for cholestatic/mixed liver injury [6, 13] due to the powerful cholestatic effect of estrogens. Despite the controversial results of studies, age was considered as a possible risk factor by the experts and included with a threshold of ≥ 55 years in RUCAM [6, 13].

3.3.4 Co-Medication(s)

Concomitant use of drugs, herbs or dietary supplements is a crucial item that is best detected at first presentation. In the consensus meeting, the experts agreed to include co-medications in the causality assessment process [25]. In RUCAM, this item was singled out and scored according to the timing of administration and the known hepatotoxicity of the co-medication [6, 13]. Each co-medication requires a separate analysis with RUCAM [32]. In case of multiple drugs or herbs, the causality should be attributed primarily to the drug or herb with the highest final score [33, 34]. Finally, drug and herb interactions or combination products can also be identified with RUCAM by assessing the suspected pair of drugs/herbs as a single product.
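The attribution rule for multiple products can be sketched as follows. This is only an illustration: the product names and scores are invented, the final scores are assumed to come from a separate RUCAM assessment of each product, and returning every product tied for the highest score is an assumption, since the handling of ties is not described here.

```python
def most_likely_products(final_scores):
    """Return the product(s) with the highest final RUCAM score.

    final_scores: mapping of product name to its separately assessed score.
    Ties are all returned (an assumption, not a RUCAM rule).
    """
    if not final_scores:
        raise ValueError("at least one assessed product is required")
    top = max(final_scores.values())
    return sorted(name for name, score in final_scores.items() if score == top)


# Hypothetical case with three concomitant products.
scores = {"drug A": 7, "herb B": 4, "supplement C": 2}
print(most_likely_products(scores))
```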

3.3.5 Search for Alternative Causes

RUCAM requires a search for the most relevant and frequent alternative causes (Group I), and less frequent causes and complications of underlying disease(s) (Group II). Viral markers and autoantibodies are so important that the list of biomarkers was completed in the RUCAM update with hepatitis E virus (HEV)-specific markers in Group I [6]. In practice, a list of differential diagnoses is proposed and these need to be considered on a case-by-case basis depending on the clinical context, the benefit for the patient and financial resources (See Electronic Supplementary Material Table 3). RUCAM may facilitate distinction between DILI and flares of pre-existing liver diseases [35, 36]. For viral infections, titer changes of specific antibodies have to be evaluated in the clinical course to confirm or exclude an ongoing viral infection [6, 35]. Drug-induced autoimmune hepatitis is another challenge, but in contrast to a first episode or a flare-up of autoimmune hepatitis there is no history of liver disease, no or mild hypergammaglobulinemia, no or very mild fibrosis, an immediate and effective response to corticosteroids, and no relapse after stopping the corrective treatment. However, in rare cases, complications of underlying liver diseases cannot be identified with certainty, such as alcoholic liver disease, for which specific laboratory tests are not available but where the combination of signs, symptoms and biochemical tests helps to confirm the possible causes. In addition, due to the high prevalence of overweight and obesity in the general population, an increase in liver enzymes, usually ALT below 5 × ULN, could be wrongly ascribed to a drug while nonalcoholic fatty liver disease (NAFLD) and one of its most common complications, nonalcoholic steatohepatitis (NASH), would be the main causes of liver test abnormalities, as suggested by the results of hepatic ultrasonography. As mentioned in the definitions, ALT of at least 5 × ULN defines the liver injury to be considered for causality assessment.

3.3.6 Known Drug/Herb Hepatotoxicity

Hepatotoxicity of the suspected drug/herb listed in the product information sheet (e.g. summary of product characteristics in the EU or product information in the US) must be checked. If it is not mentioned, a literature search in PubMed is recommended to determine whether the product has already been involved in DILI or HILI and ideally, with RUCAM-based high causality degrees. However, not all published DILI or HILI cases are in fact causally related due to missing data [21, 36, 37, 39], ignored alternative causes or underlying diseases confounding the diagnosis [17,18,19,20,21, 39].

3.3.7 Response to Unintentional Rechallenge

RUCAM was the first CAM with defined criteria for positive and negative drug/herb rechallenge tests specifically for DILI [14]. Positive rechallenge is viewed as a hallmark and gold standard in causality assessment in general [16, 28, 29] and particularly in DILI case evaluation [30]. Conversely, a negative rechallenge does not mean that the drug did not play a role in the DILI case. Indeed, on rechallenge, the dose readministered, the treatment duration, the co-medications or even the liver adaptation to drug toxicity could influence the response and therefore cause a different reaction to drug re-exposure [40]. This is the reason why the score of a negative rechallenge (− 2) is not symmetrical to that of a positive rechallenge (+ 3). The criteria, strictly defined and based on the DILI cohort that served to validate RUCAM, were included in the key elements of RUCAM [13], in line with the conclusions of Consensus Meetings [24, 25], as previously reviewed [13, 14] and recently highlighted [19, 21, 31].

Unintentional rechallenge should meet strict conditions to be interpretable [6]. To facilitate the handling of this item, the table of conditions and interpretation of the responses to rechallenge was updated in 2016 (See Electronic Supplementary Material, Table 4). Rechallenge rarely provides a positive response, owing to poor data quality, retrospective analysis of the cases [19, 22, 38] and ethical concerns due to a high risk of severe hepatic reaction [13, 40]. Consequently, a rechallenge score only exceptionally contributes to the final score and is not an obligatory element of RUCAM.

3.4 Scoring System

With a total of − 9 to + 14 theoretical points, the final score indicates causality degrees: ≤ 0 excludes causality; 1–2, unlikely; 3–5, possible; 6–8, probable; ≥ 9, highly probable. Missing data is a known problem with DILI cases [8, 27, 39], appropriately addressed in RUCAM [6, 13] by a null score for the concerned item.
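The grading bands and the handling of missing data can be sketched as below. The cut-offs and the − 9 to + 14 range are those given above; the item names and the example item scores are purely illustrative assumptions and do not reproduce the RUCAM form.

```python
def final_score(item_scores):
    """Sum the item scores; a missing item (None) contributes a null score of 0."""
    return sum(0 if score is None else score for score in item_scores.values())


def causality_grade(score):
    """Translate a final score into the causality degree described in the text."""
    if not -9 <= score <= 14:
        raise ValueError("final score outside the theoretical range of -9 to +14")
    if score <= 0:
        return "excluded"
    if score <= 2:
        return "unlikely"
    if score <= 5:
        return "possible"
    if score <= 8:
        return "probable"
    return "highly probable"


# Hypothetical case: item names and values are invented for illustration.
case = {
    "timing": 2,
    "dechallenge": 3,
    "risk_factors": 1,
    "co_medication": 0,
    "alternative_causes": 2,
    "known_hepatotoxicity": 2,
    "rechallenge": None,  # no rechallenge data, scored as 0
}
total = final_score(case)
print(total, causality_grade(total))  # prints: 10 highly probable
```

Listing each item with its score in this way also serves the transparency requirement, since a reader can recompute the total.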

3.5 Validation and Reproducibility

RUCAM has been tested for accuracy, reproducibility and inter-observer variability and has performed well, showing high sensitivity (86%) and specificity (89%), with high positive (93%) and negative (78%) predictive values, based on 77 case reports with positive rechallenge (49 cases and 28 noncases) [14]. Reproducibility results were good: four external assessors independently evaluated 50 DILI cases (average 2 products/case). Very low inter-observer variability was found, with no disagreement in 84% of cases when they were assessed in five degrees of causality [13]. Good results were found by another team [41], but high variability was shown not only with RUCAM but also with the global introspection method [26], raising the question as to how items were handled. RUCAM working instructions are available to reduce variability (See Electronic Supplementary Material 1).
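For readers less familiar with these indicators, the sketch below shows how they are computed from a two-by-two table. The individual counts are not reported in this text: the values used (42 true positives, 7 false negatives, 25 true negatives, 3 false positives) are back-calculated here as integer counts consistent with 49 cases, 28 noncases, and the four published percentages, and should be read as an assumption rather than as the original data.

```python
def performance_indicators(tp, fn, tn, fp):
    """Return sensitivity, specificity, PPV and NPV as fractions."""
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly identified
        "specificity": tn / (tn + fp),  # noncases correctly identified
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Back-calculated counts (assumption): 42 + 7 = 49 cases, 25 + 3 = 28 noncases.
indicators = performance_indicators(tp=42, fn=7, tn=25, fp=3)
rounded = {name: round(100 * value) for name, value in indicators.items()}
print(rounded)
# prints {'sensitivity': 86, 'specificity': 89, 'ppv': 93, 'npv': 78}
```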

3.6 Real-Time Assessment

One of the RUCAM strengths is that the cases can be assessed prospectively, as soon as a DILI or HILI is suspected, to collect the relevant data in a timely manner [6, 15, 32]. As an example, a recent DILI series from India [4] was highly appreciated as a report of excellence [42]. Unfortunately, RUCAM is often used late after the onset of the liver injury, reducing the chance to detect new hepatotoxins and increasing inter-rater variability. Nevertheless, retrospective but careful RUCAM-based analyses of well documented DILI and HILI cases can provide high causality degrees [5, 43, 44].

3.7 Transparency

To be reliable, causality assessment needs to be transparent. For each suspected drug/herb, the data elements should be listed with their scores. This is easily achievable with RUCAM as shown in a HILI case [32] and other reports that can be re-assessed by peers or regulatory agencies [33, 34, 45, 46].

3.8 Products (Drugs, Herbs and Dietary Supplements)

RUCAM was used in reports involving drugs [4,5,6,7, 13,14,15, 18, 30, 41, 42, 44, 47], herbs [31, 46] including herbs of TCM (Traditional Chinese Medicine) [5, 31, 43, 48] or Indian Ayurveda herbs [32], and dietary supplements [21, 31, 33, 34, 45]. Causality was often established [4,5,6, 15, 32, 46, 47], excluded [21, 33, 34, 45, 46], or subject to debate [49], confirming the need to provide the detailed results with RUCAM. Interestingly, when different product types are taken concomitantly, RUCAM allows identification of the most likely offending product [32,33,34, 45].

3.9 Studies

Many studies used RUCAM to detect clinical hepatotoxicity of drugs in regulatory evaluations, clinical studies, epidemiological studies, genotyping studies, case reports and case series, referenced in [6, 15]. Likewise, RUCAM can also be used in phase I/II/III clinical trials (Table 1) to detect hepatotoxicity of the new compounds as early as possible.

Table 1 Answers to frequent comments on RUCAM

3.10 Global Usage

With the exception of the US, where the DILI network (DILIN) applies global introspection with several rounds between assessors, an approach limited to this country [8, 39], the worldwide usage of RUCAM is confirmed in several studies [1, 4, 5, 21, 23, 30, 43]. Because liver centers can gather only a small number of cases, DILI registries were established across several countries, aiming at studying clinical features of DILI cases by using robust CAMs such as RUCAM. These include registries in Sweden [41], Spain [44], Iceland [47], Serbia [53, 54] and Latin America [55]. It is important for public health to facilitate the decision-making process on suspected hepatotoxins by regulatory agencies and therefore to maintain an internationally harmonized causality assessment approach such as RUCAM. It is also critical for editors of scientific journals to rely on an objective approach to causality assessment when making the decision to publish studies on DILI and HILI. Published data across countries and registries can be harmonized, easily interpreted and compared [44, 46, 55,56,57]. Moreover, RUCAM can identify DILI and HILI cases early in clinical development, enabling companies and regulatory agencies to propose measures to minimize the risk of severe hepatic reactions.

4 Alternative Approaches of Causality Assessment

Following RUCAM publication, other CAMs incorporated some RUCAM elements and their scores [6, 15], but due to shortcomings none were recommended for use [15]. The global introspection method used by DILIN [8, 39, 58] considers some RUCAM items but without a formal algorithm. This results in a subjective causality grading expressed as arbitrary percentage ranges and leaves questions as to how key elements and missing data were taken into consideration. In case of several suspected products, the global introspection does not specify the reasons for which one product is the most likely cause [50, 51]. An important feature is that the DILIN method results in higher causality levels as compared with RUCAM [26], which would lead to over-reporting of DILI. This could also impact the reliability of the NIH LiverTox website [39, 52]. Finally, due to the absence of item definition and scores in the global introspection method, it is not easy or even possible to re-assess the cases independently. Descriptions and shortcomings of all CAMs have been discussed elsewhere in detail [6, 15, 49].

RUCAM was designed to be a user-friendly method with a simple form and recommendations to users [6, 13] (See Electronic Supplementary Material 1, Tables 1 and 2). Case management with RUCAM is quick, effective and cost saving, as no network and no rounds are needed. RUCAM cannot compensate for poor quality data in medical records [33, 34, 44, 45]. The problem of missing data in case reports, not specific to DILI, will remain unless steps are taken to improve case documentation on an ongoing basis, as illustrated in several examples [6, 37, 45, 46]. Table 1 provides answers to challenges and comments frequently raised by RUCAM users. In addition, suggestions made to include potential risk factors such as ethnicity, gender, diabetes mellitus, metabolic syndrome or body mass index were not followed because these elements are not validated, although epidemiological studies showed weak association with some of them.

5 Biomarkers

No valid diagnostic or prognostic biomarker currently exists for idiosyncratic DILI or HILI, and several studies failed to show good performance indicators for candidates [7, 48, 58]. The main reasons would be that (i) idiosyncratic DILI is typically a human disease that is hardly reproducible in animals and (ii) DILI cases used for testing new biomarkers are not correctly assessed for causality, which would substantially decrease the power of the tested biomarker. Here too, RUCAM-based assessment will ensure homogeneity of the cases tested with a new biomarker.

6 DILI signatures

Drugs that tend to cause DILI have a characteristic clinical and biochemical presentation or ‘signature’. This profile cannot be recognized in early clinical trials but is established for some drugs and easily found in the LiverTox website [59]. For instance, amoxicillin-clavulanate-induced liver injury appears many days after discontinuing treatment and tends to be hepatocellular in the young but mixed in the elderly. These signatures may help in causality assessment, particularly when there are multiple suspect drugs with similar start and stop dates, but diagnostic tools do not include such profiles in the assessment. In RUCAM, this point is taken into account in the item “Known hepatotoxicity of the product”. One can also consider that assigning additional points to signatures would give high scores to known hepatotoxins and prevent detection of new hepatotoxins. In the future, these signatures may need to be specifically and quantitatively incorporated into causality assessment tools such as RUCAM, along with genetic risk factors and biomarkers. However, such a new RUCAM would require a new validation using DILI cases with positive rechallenge [14].

7 Conclusions

After 25 years of RUCAM use, the strengths and the challenges of the method are clear. Firstly, the worldwide use of this structured and validated method by hepatologists, epidemiologists, and clinicians working in research or in daily practice shows its robustness and the confidence in its results. Secondly, RUCAM should be used prospectively for timely collection of the relevant data. Thirdly, there is a need to search actively for biomarkers to ease the diagnosis of DILI and complement RUCAM. Fourthly, with the wide use of RUCAM, including an electronic version [60], its performance is expected to be improved by adding, modifying, or deleting elements or changing scores but the validation of the new version should follow the approach taken in the original version [14]. Finally, despite criticisms, RUCAM remains the main reference for causality assessment methods when a drug/herb-induced liver injury is suspected. The future will certainly bring solutions with artificial intelligence applied to complex expert systems that are expanding throughout all areas of medicine.