Introduction

Hepatic encephalopathy (HE) is cerebral dysfunction caused by liver failure, and it affects 30-40% of patients with liver cirrhosis during their disease course (Vilstrup et al. 2014). In the most severe cases, HE results in a precipitated, often reversible, stuporous or comatose state that severely impacts the patient’s autonomy and requires hospital admission. Clinically manifest HE is often preceded by minimal HE (MHE), a condition of cognitive impairment without other clinical signs of brain involvement, which is closely associated with loss of quality of life (Hartmann et al. 2000; Patidar et al. 2014; Flud and Duarte-Rojo 2019; Schomerus and Hamster 2001; Marchesini et al. 2001; Ridola et al. 2018; Montagnese and Bajaj 2019). Accordingly, detecting and treating MHE could prevent clinically manifest HE and improve the patients’ daily functioning. Treating MHE is straightforward but, owing to the subclinical nature of MHE, its detection requires psychometric testing (Vilstrup et al. 2014). Most psychometric testing is also straightforward, but because there are many views on which tests to use and no gold standard, many liver centres do not systematically validate and use psychometric tests in cirrhosis patients (Lauridsen and Bajaj 2015; Labenz et al. 2020a; Sharma and Sharma 2014; Bajaj et al. 2007a). This situation hampers clinical decision-making and is to the detriment of our patients. The scope of this review is to provide an overview of the validation level and usage of psychometric tests used for the diagnosis of MHE. Our work is aimed at clinicians and scientists as guidance on which psychometric test would best fit their clinic, cohort, or study. We introduce validation requirements and then systematically present the validation level and clinical use of each psychometric test.

In our literature review, we have chosen to focus on the most widely used point-of-care psychometric/psychophysical tests used for HE diagnosis and prognostication: the portosystemic hepatic encephalopathy (PSE) test, continuous reaction time (CRT) measurements, Stroop EncephalApp, the animal naming test (ANT), critical flicker frequency (CFF), and the inhibitory control test (ICT). We have not included electroencephalography (EEG) because it is neurophysiological in nature, requires highly specialized expertise, and as such is not a point-of-care modality. We have further chosen not to address the Scan test, the Psychomotor Vigilance Task, and a few other psychometric tests for which references were very few (Montagnese et al. 2012; Formentin et al. 2019).

Psychometric test validation obstacles and requirements

In patients with cirrhosis, psychometric tests are used for MHE diagnosis and for prognosticating clinically manifest HE. The validation requirements differ depending on the intended test usage.

Validating tests for MHE

When validating diagnostic tests for MHE, the inherent obstacles are the lack of a gold standard test and the confounding of test results by cognitive deficits caused by factors other than the liver disease. Such factors may be metabolic, organic, modifiable, or permanent, and they challenge any test modality used to diagnose and quantify MHE. To overcome the lack of a gold standard, the International Society for Hepatic Encephalopathy and Nitrogen Metabolism (ISHEN) recommends that the PSE test serve as a surrogate gold standard, or common comparator test, in validation studies. In practice, this implies that even the best diagnostic validation studies are in fact agreement studies (Umemneku Chikere et al. 2019). The validation of psychometric tests for MHE diagnosis is subject to the same requirements as any other clinical test validation procedure, and validation studies should follow the STARD checklist (Cohen et al. 2016). In studies on HE, it is important that the effect of gender, age, and educational level on psychometric test results is examined, as most psychometric tests are influenced by these factors to some extent. Proper validation thus requires the establishment of normal values, which entails applying the test of interest to a large number of healthy persons from the target region/country. Moreover, if a psychometric test is used for monitoring, the test precision should be evaluated thoroughly. This includes measures of random, interobserver, and time-of-day test-retest variation in the target population. If these test metrics are not obtained, a treatment response could be masked or falsely enhanced by these factors. A further important validation point is to assess the test’s ability to identify patients who are at high risk of developing clinically manifest HE.
Validation of a test’s prognostic abilities obviously requires a longitudinal study with clinically manifest HE as primary endpoint and should include patients stratified for prior clinically manifest HE across all liver cirrhosis severities.

Methods

This review was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. A protocol was developed to outline research questions, search strategy, study selection criteria, and data extraction method.

Search strategy

A literature search strategy was developed to identify studies evaluating the relevant psychometric tests for diagnosing and monitoring MHE in patients with cirrhosis. Embase (Ovid), PubMed (Ovid), and Scopus (Elsevier) databases were searched from inception until April 2021. The search strategy was designed in PubMed using search terms such as "liver cirrhosis" followed by an analysis of the words contained in the title and abstract of relevant citations, and of the index terms used to describe the citations. The same search strategy was used in all databases and searches were restricted to original research published in English focusing on adult patients with cirrhosis and HE. Search strategies are available in Supp. Table 1-6.

Table 1 MHE detection and monitoring abilities of psychometric tests claimed to be validated for these purposes. Colors indicate if the question is answered in > 20 studies (dark green), 15-19 studies (light green), 10-14 studies (yellow), 5-9 studies (orange), less than 5 studies (light red), no study (red).

Study selection and data extraction

Search results were downloaded from the databases into Endnote X9 (Clarivate Analytics, PA, USA). All citations were subsequently screened by title and abstract. Studies were included or excluded based on the selection criteria stated in Supp. Table 1. Reasons for excluding any full-text articles were recorded and are presented in supplementary table 2. Finally, the reference lists of the final included articles were also manually searched to ensure all eligible articles were captured. Any disagreements were resolved by consensus. Results of the screening process are reported in a flow diagram in Supp. Figure 1-6.

Methodological assessment

No systematic methodological assessment of the studies was performed. The purpose of the review was to include all studies on psychometric tests for diagnosing and monitoring MHE if they clearly answered our 5 questions, regardless of methodological quality.

Validation level and practical use of common psychometric tests for HE

We asked the following questions regarding the clinical validation of the PSE test, CRT, Stroop EncephalApp, ANT, CFF, and ICT:

  1. Which percentage of patients with cirrhosis does the test deem as having MHE?

  2. Is the test able to predict clinically manifest HE?

    2.1. The percentage of patients with an abnormal test result who develop clinically manifest HE

    2.2. Post-test clinically manifest HE probability (pre-test lifetime probability is set at 30%)

  3. Are test-retest variation and inter-observer variation described?

  4. Is the test able to detect a treatment response?

  5. Is the test result affected by age, educational level, gender, or comorbidities?

The answers are presented in Table 1.
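
Question 2.2 can be illustrated with a small calculation. The Python sketch below converts a pre-test probability into a post-test probability using Bayes' rule in odds form. The 30% pre-test probability is the value stated in question 2.2; the function name and the sensitivity and specificity used in the usage note are hypothetical and chosen only for illustration. The review does not describe how its post-test probabilities were derived, so this is one plausible approach rather than the authors' method.

```python
def post_test_probability(pre_test: float,
                          sensitivity: float,
                          specificity: float,
                          abnormal: bool = True) -> float:
    """Post-test probability of clinically manifest HE via likelihood ratios.

    All inputs are proportions strictly between 0 and 1.
    abnormal=True  -> probability after an abnormal (positive) test
    abnormal=False -> probability after a normal (negative) test
    """
    for name, value in (("pre_test", pre_test),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")

    if abnormal:
        # Positive likelihood ratio: true-positive rate / false-positive rate
        likelihood_ratio = sensitivity / (1.0 - specificity)
    else:
        # Negative likelihood ratio: false-negative rate / true-negative rate
        likelihood_ratio = (1.0 - sensitivity) / specificity

    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Hypothetical test characteristics, NOT taken from any included study.
    print(round(post_test_probability(0.30, 0.80, 0.60), 3))
    print(round(post_test_probability(0.30, 0.80, 0.60, abnormal=False), 3))
```

With the hypothetical sensitivity of 80% and specificity of 60%, the 30% pre-test probability rises to about 46% after an abnormal test and falls to 12.5% after a normal test.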

Portosystemic hepatic encephalopathy (PSE) test

The PSE test is the most widely used tool for the assessment of MHE (Morgan et al. 2016). It is a paper-and-pencil test battery consisting of five well-established psychometric tests taken from the field of neuropsychology, assessing a range of cognitive domains (Weissenborn et al. 2001): the Digit Symbol Substitution Test (DST), the Number Connection Tests A and B (NCT-A/B), the Serial Dotting Test (SDT), and the Line Tracing Test (LTT). Using the official PSE test instruction manual, which has been translated into several languages, a trained health professional guides the patient through the five subtests. Each test result is converted to an age-adjusted score ranging from -3 to +1 based on normative values from healthy individuals, and the scores are finally summed into the total Portosystemic Hepatic Encephalopathy Score (PHES), which ranges from -18 to +6. Although it originated in Germany, the PSE test has now been validated in several countries (Table 2), and the noticeable variation in normative values between different populations demands country-specific data sets from healthy individuals. The country-specific PHES cut-off varies from < -4 to < -3.
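
The summation into PHES can be sketched in a few lines of Python. The sketch assumes six component scores, with the LTT contributing one score for time and one for errors; this assumption is what makes a per-score range of -3 to +1 consistent with the stated total range of -18 to +6. The component labels, function names, and the default cut-off are hypothetical, and the conversion of raw results into age-adjusted scores from country-specific normative data is deliberately left out.

```python
from typing import Mapping

# Assumed component scores (the two LTT entries are an assumption, see lead-in).
COMPONENTS = ("DST", "NCT-A", "NCT-B", "SDT", "LTT-time", "LTT-errors")


def phes_total(scores: Mapping[str, int]) -> int:
    """Sum already age-adjusted component scores (-3..+1 each) into PHES."""
    missing = [name for name in COMPONENTS if name not in scores]
    if missing:
        raise ValueError(f"missing component scores: {missing}")
    for name in COMPONENTS:
        if not -3 <= scores[name] <= 1:
            raise ValueError(f"{name} score must lie between -3 and +1")
    return sum(scores[name] for name in COMPONENTS)


def phes_is_abnormal(total: int, cutoff: int = -4) -> bool:
    """Abnormal if PHES is below the country-specific cut-off (e.g. < -4)."""
    return total < cutoff


if __name__ == "__main__":
    # Hypothetical patient, for illustration only.
    patient = {"DST": -2, "NCT-A": -1, "NCT-B": -1,
               "SDT": 0, "LTT-time": -1, "LTT-errors": 0}
    total = phes_total(patient)
    print(total, phes_is_abnormal(total))
```

The cut-off is a parameter because, as noted above, it differs between countries.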

Table 2 Logistic and practical specifics of tests validated for MHE diagnosis.

MHE prevalence as defined by abnormal PHES has been reported in numerous studies from various populations; this systematic review identified more than 90 studies (Ahn et al. 2017; Acharya et al. 2021; Agarwal et al. 2020; Allampati et al. 2016; Amodio et al. 2008; Amodio et al. 2017; Ampuero et al. 2015; Ampuero et al. 2018; Bajaj et al. 2013; Bajaj et al. 2017b; Bajaj et al. 2019; Bale et al. 2018; Burkard et al. 2013; H. J. Chen et al. 2015; Q. F. Chen et al. 2020; Corrias et al. 2014; Dhiman et al. 2010; Duarte-Rojo et al. 2011; Duarte-Rojo et al. 2019; Formentin et al. 2021; Gabriel et al. 2020; Goldbecker et al. 2013; Giménez-Garzó et al. 2017; Greinert et al. 2016; A. Gupta et al. 2010; Irimia et al. 2013; Jackson et al. 2016; Jagtap et al. 2021; Jeong et al. 2017; Khodadoostan et al. 2018; Kircheis et al. 2007; Maldonado-Garza et al. 2011; Kimer et al. 2021; Labenz et al. 2018; Kircheis et al. 2014; Labenz et al. 2019d; Labenz et al. 2019b; Labenz et al. 2019c; Labenz et al. 2019a; Lauridsen et al. 2017; Lauridsen et al. 2020; S. W. Li et al. 2013; W. Li et al. 2015; Lunia et al. 2013; Luo et al. 2020; Machado Júnior et al. 2020; Mardini et al. 2008; Metwally et al. 2019; Mina et al. 2014; Montoliu et al. 2009; Montoliu et al. 2011; Moratalla et al. 2017; Nardelli et al. 2016; Moscucci et al. 2011; Nardelli et al. 2017; Nardelli et al. 2019b; Olesen et al. 2019; Osman et al. 2016; Özel Coşkun and Özen 2017; Pflugrad et al. 2015; Philonenko et al. 2019; Rathi et al. 2019; Rega et al. 2021; Riggio et al. 2011; Riggio et al. 2015; Rai et al. 2015; Romero-Gomez et al. 2007; Román et al. 2011; Román et al. 2013; San Martín-Valenzuela et al. 2020; Santana Vargas et al. 2021; Schmid et al. 2010; Seo et al. 2012; Shrestha et al. 2020; Singh et al. 2016; Soriano et al. 2012; Strebel et al. 2020; Sunil et al. 2012; Taneja et al. 2012; Stawicka et al. 2020; Tryc et al. 2014; Tsai et al. 2015; Tsai et al. 2016; Tsai et al. 2019; Umapathy et al. 
2014; Varakanahalli et al. 2018; J. Y. Wang et al. 2019; Wuensch et al. 2017; Wunsch et al. 2011; Yoon et al. 2019; Zeng et al. 2019; Badea et al. 2016; Formentin et al. 2019; Lunia et al. 2014; Lauridsen et al. 2015; Montagnese et al. 2011; Pawar et al. 2019; Wernberg et al. 2019). In mixed study populations of both compensated and decompensated cirrhosis, the MHE prevalence, using region-specific cut-offs, ranged from 6% to 60% (52 studies). Of note, the MHE prevalence was between 25% and 45% in 35 of the 52 studies. In study populations including only patients without prior clinically manifest HE, the MHE prevalence ranged from 21% to 56% (25 studies). In study populations of decompensated cirrhosis and candidates for liver transplantation, the MHE prevalence ranged from 37% to 75% (8 studies). Eleven studies described MHE prevalence for cirrhosis patients classified as Child-Pugh A, B, and C, which was 4-48%, 35-73%, and 40-100%, respectively.

Twenty studies with follow-up data on the PSE test and HE-related hospitalizations were identified (Ahn et al. 2017; Dhiman et al. 2010; Duarte-Rojo et al. 2019; Formentin et al. 2019; Formentin et al. 2021; Gabriel et al. 2020; Irimia et al. 2013; Kircheis et al. 2014; Labenz et al. 2019c; Labenz et al. 2020b; Nardelli et al. 2016; Nardelli et al. 2019a; Riggio et al. 2011; Riggio et al. 2015; Romero-Gomez et al. 2007; Varakanahalli et al. 2018; A. J. Wang et al. 2017; Wernberg et al. 2019; Goldbecker et al. 2013; Montagnese et al. 2011). The prognostic value of PHES was estimated in four studies, revealing a high NPV of 69-94%, while the PPV was less reliable, ranging from 29% to 75% (Labenz et al. 2020b; Nardelli et al. 2016; Wernberg et al. 2019; Goldbecker et al. 2013). Interestingly, the evidence for abnormal PHES as an independent risk factor for the development of clinically manifest HE is contradictory; PHES was associated with incident HE hospitalizations after adjustment for additional risk factors in multivariate analysis in 5 studies (Labenz et al. 2019c; Labenz et al. 2020b; Nardelli et al. 2016; Nardelli et al. 2019a; Montagnese et al. 2011), but was associated with incident clinically manifest HE only in univariate analysis in 6 studies, all of which seemed to include an appropriate number of patients (Ahn et al. 2017; Formentin et al. 2019; Formentin et al. 2021; Romero-Gomez et al. 2007; Varakanahalli et al. 2018).

Given its status and application, the PSE test is frequently used to measure cognitive function as a main outcome when testing anti-HE treatment in clinical trials. The PSE test consistently detects clinical and cognitive improvement from 6 months and up to 16 months following liver transplantation (Acharya et al. 2021; Bajaj et al. 2017a; Mardini et al. 2008; Osman et al. 2016; Tryc et al. 2014), and 31-78% of patients with abnormal PHES have normalized PHES at 6 months post-transplantation (Acharya et al. 2021; Bajaj et al. 2017b; Osman et al. 2016; Tryc et al. 2014). A treatment response following anti-HE treatment with lactulose and/or rifaximin is also consistently detected by the PSE test in randomized controlled trials, which demonstrate a normalization of PHES in 57-85% of patients depending on the treatment (Lauridsen et al. 2017; Pawar et al. 2019; Rai et al. 2015; Singh et al. 2016). With regard to experimental treatments, the results are inconsistent (Burkard et al. 2013; Grover et al. 2017; Lunia et al. 2014; Schmid et al. 2010; Strebel et al. 2020; Varakanahalli et al. 2018).

A total of 18 studies have described how age, education, sex, and comorbidities affect PHES in healthy individuals and/or cirrhosis patients (Amodio et al. 2008; Badea et al. 2016; Bale et al. 2018; H. J. Chen et al. 2015; Dhiman et al. 2010; Duarte-Rojo et al. 2011; A. Gupta et al. 2010; S. W. Li et al. 2013; Machado Júnior et al. 2020; Mardini et al. 2008; Rathi et al. 2019; Seo et al. 2012; Tsai et al. 2015; Wunsch et al. 2011; Goldbecker et al. 2013; Jagtap et al. 2021; Maldonado-Garza et al. 2011), and normative datasets now exist for 11 countries (see Table 2). Across these studies, years of education is the most consistent predictor of PSE test results and is independently associated with performance in the five subtests in most studies, but with some discrepancy for the LTT (Amodio et al. 2008; Dhiman et al. 2010; Wunsch et al. 2013) and the SDT (Wunsch et al. 2013; Seo et al. 2012). Moreover, higher age is independently associated with worse performance in the five subtests, also with discrepancy for the LTT (Dhiman et al. 2010; Wunsch et al. 2013; Maldonado-Garza et al. 2011). Of note, a large multicenter study from India reports that age and education are not associated with the total PHES, but normal values from almost all other regions are stratified by age and/or educational level (Rathi et al. 2019). Few studies have investigated how comorbidities influence PSE test performance, but results from two studies showed no association between subtest results and diabetes (Rathi et al. 2019; A. Gupta et al. 2010). Although four parallel versions of the PSE subtests are available, several studies have consistently demonstrated some degree of learning effect in both healthy individuals and patients with cirrhosis (Amodio et al. 2008; Jeong et al. 2017; Duarte-Rojo et al. 2011; Tsai et al. 2015; Umapathy et al. 2014), thus limiting their use for repeated measurements. 
The learning effect is evident at re-testing after a few days, persists for several months, and has been observed up to 2-3 years after the first test (Amodio et al. 2008). This sustained learning effect is, however, contradicted by Goldbecker et al., who did not find any learning effect in 53 healthy controls after 7 months (Goldbecker et al. 2013). Learning can improve PHES by approx. 1-1.5 points. Interestingly, Umapathy et al. found that the learning effect on PHES did not apply to patients with incident clinically manifest HE within the past 12 weeks (Umapathy et al. 2014).

CRT measurements

The CRT method measures the ability to complete a motor reaction to an auditory stimulus adequately and repeatedly and as such gives a measure of attention and psychomotor accuracy (Lauridsen et al. 2012). CRT is a computerized test in which the test person is equipped with a pair of headphones and a trigger button in the dominant hand and then instructed to respond as fast as possible to a total of 100-150 high-pitched sound signals, delivered at random intervals of 2 to 6 seconds. CRT measurement has been used for decades in various brain disorders in which lag time is a dominant feature. The test was first validated as a diagnostic tool for HE in the 1980s by Elsass et al., who found that patients with HE specifically exhibited a greater than normal intraindividual variance in reaction times during this 10-minute test. A reaction time stability index, calculated as the 50th percentile of the reaction times divided by the difference between the 90th and 10th percentiles (50th percentile / (90th percentile - 10th percentile)), was able to distinguish patients with HE from patients with other brain disorders and from controls (Elsass 1984).
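
The reaction time stability index described above can be computed directly from a list of reaction times. The following Python sketch uses the standard library's linear-interpolation percentiles; the percentile estimation method used by the actual CRT software is not stated in this review, so that choice, like the function name, is an assumption.

```python
import statistics
from typing import Sequence


def crt_index(reaction_times: Sequence[float]) -> float:
    """Reaction time stability index: P50 / (P90 - P10).

    Percentiles are estimated with statistics.quantiles (method="inclusive"),
    which is an assumption, not necessarily what CRT software does.
    """
    if len(reaction_times) < 2:
        raise ValueError("at least two reaction times are required")
    # quantiles(n=100) returns the 99 cut points P1..P99 (index 0 is P1).
    percentiles = statistics.quantiles(reaction_times, n=100, method="inclusive")
    p10, p50, p90 = percentiles[9], percentiles[49], percentiles[89]
    spread = p90 - p10
    if spread <= 0:
        raise ValueError("reaction times show no spread; index is undefined")
    return p50 / spread


if __name__ == "__main__":
    # Hypothetical reaction times in milliseconds, evenly spaced 200..300.
    times_ms = list(range(200, 301))
    print(crt_index(times_ms))
```

Because the index is a ratio of percentiles, it is unit-free: a wider spread of reaction times relative to the median gives a lower index.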

The CRT as a tool for diagnosis of MHE has almost exclusively been studied in Danish centers, and the MHE prevalence in patients with liver cirrhosis estimated across 5 studies ranges from 38% to 59% using a standardized CRT index cut-off of < 1.9 (Kraglund et al. 2019; Lauridsen et al. 2012; Lauridsen et al. 2014; Lauridsen et al. 2015; Wernberg et al. 2019). The ability of CRT to predict future clinically manifest HE was compared head-to-head with the PSE test in one outpatient cohort with long follow-up, where CRT proved equal to the PSE test in PPV and NPV. In that study, the CRT test was characterized by low specificity (46%), whereas the PSE test had low sensitivity (42%); however, considerable variability has been observed across many studies (Wernberg et al. 2019). CRT was used to detect reversibility of MHE following anti-HE treatment in two smaller studies. In the first, of 54 patients with decompensated cirrhosis, treatment with rifaximin for 4 weeks improved the CRT index from 1.7 to 2.1, whereas PHES did not change significantly compared with placebo (Kimer et al. 2017). In the other study, of 44 patients of whom half had an abnormal CRT index at baseline, 3 months of treatment with lactulose, rifaximin, and branched-chain amino acids (BCAA) in combination improved the CRT index from 1.7 to 2.2 (Lauridsen et al. 2017).

In contrast to crude reaction times, the CRT index is unaffected by age, sex, and education in both healthy individuals and patients with cirrhosis (Lauridsen et al. 2012). Moreover, the test exhibits no learning effect and no interobserver or time-of-day variation. It does, however, show some test-retest variation, and only changes in CRT index larger than 0.68 can be taken as a sign of a clinically relevant change (Lauridsen et al. 2017). One study has examined the effect of common medical comorbidities on the CRT index and found that moderate to severe heart failure and chronic obstructive pulmonary disease can, to some extent, increase reaction time variability (Lauridsen et al. 2016).

Stroop EncephalApp

Stroop EncephalApp is a digitalized version of a classic psychometric test that has been used in slightly differing forms since 1890 (Jensen and Rohwer Jr. 1966). The test utilizes the Stroop effect to test vigilance and impulse inhibition and, in its digital form, has an “Off state”, where the color of a hashtag (#) displayed on the screen must be indicated, and an “On state”, where the words “red”, “blue”, and “green” are displayed in an incongruent color of red, blue, or green. The difficult part is then to keep naming the color of the letters and not the color that the letters spell. Stroop EncephalApp was first introduced as a diagnostic test for MHE by Amodio et al. (Amodio et al. 2005), was later taken up by Bajaj et al. (Bajaj et al. 2013), and was subsequently validated in a comprehensive multicenter study for use in the US population (Allampati et al. 2016). Studies justifying its use have since been performed in China (Zeng et al. 2020), Japan (Kondo et al. 2021), and Brazil, although only the study from China included a healthy reference group. In the US and Chinese validation studies, 51-71% of patients with liver cirrhosis have MHE as defined by the validated cut-off for OffTime+OnTime, which is 190 seconds (US) and 189 seconds (China) (Allampati et al. 2016; Zeng et al. 2020; Machado Júnior et al. 2020). When results are compared with the PSE test, sensitivity ranges from 72% to 97% and specificity from 49% to 90% (Kondo et al. 2021; Zeng et al. 2020; Luo et al. 2020; Duarte-Rojo et al. 2019; Zeng et al. 2019; Acharya et al. 2017; Allampati et al. 2016; Bajaj et al. 2015; Bajaj et al. 2013; Machado Júnior et al. 2020). The US multicenter study is the only one to examine the ability of Stroop EncephalApp to predict clinically manifest HE; it found a hazard ratio of 2.5 (no confidence interval reported) for the development of manifest HE in patients performing abnormally on Stroop. 
Data extracted from the available studies showed a random test-retest variation of approx. 10 seconds, whereas treatment by liver transplantation or correction of hyponatremia, and worsening after TIPS, induced changes of 20-30 seconds in OffTime+OnTime (Allampati et al. 2016; Acharya et al. 2017). The test is influenced by age, gender, and education, and normal values are adjusted accordingly (US norms available here: encephalapp.com/test1.html).
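
How the OffTime+OnTime figures above fit together can be sketched as follows. The cut-offs (190 seconds for the US, 189 seconds for China) and the approximate 10-second test-retest variation are the values quoted in the text. The function names are hypothetical, and two simplifications are assumptions: a total above the cut-off is treated as abnormal, and the age-, gender-, and education-adjusted norms are ignored.

```python
# Cut-offs for OffTime+OnTime in seconds, as quoted in the text.
CUTOFFS_S = {"US": 190.0, "China": 189.0}


def stroop_total(off_time_s: float, on_time_s: float) -> float:
    """Total completion time, OffTime + OnTime, in seconds."""
    if off_time_s < 0 or on_time_s < 0:
        raise ValueError("times must be non-negative")
    return off_time_s + on_time_s


def exceeds_cutoff(total_s: float, region: str = "US") -> bool:
    """Assumes a total ABOVE the regional cut-off counts as abnormal."""
    return total_s > CUTOFFS_S[region]


def change_beyond_noise(before_s: float, after_s: float,
                        noise_s: float = 10.0) -> bool:
    """True if the change exceeds the approx. 10 s random test-retest variation."""
    return abs(after_s - before_s) > noise_s


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    total = stroop_total(85.0, 110.5)
    print(total, exceeds_cutoff(total, "US"))
    print(change_beyond_noise(200.0, 175.0))
```

In this simplified reading, a change of 20-30 seconds, as reported after transplantation or TIPS, would exceed the random variation, whereas a change of a few seconds would not.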

Animal naming test

ANT is the most recently introduced test for the diagnosis of MHE but is, within other research fields, a longstanding test of semantic fluency. It is a very simple test during which the test person is asked to name as many animals as possible in one minute. ANT is part of the Repeatable Battery for Assessment of Neuropsychological Status (RBANS) but has been proposed as a stand-alone, fast screening test to identify patients with MHE (Campagna et al. 2017). In validation studies, it is proposed to be used as a rule-in and rule-out test where anyone in the “grey zone” is referred for further psychometric testing, while immediate treatment for HE is initiated in patients in whom MHE is ruled in (Qu et al. 2021; Agarwal et al. 2020; Labenz et al. 2019a; Campagna et al. 2017). Standardized ANT normal values in healthy cohorts already exist in several countries (references are listed in the Campagna article in Supp. Table 7). The different cirrhotic study populations show MHE frequencies ranging from 35% to 42% using country-specific ANT cut-offs (Labenz et al. 2019a; Campagna et al. 2017; Qu et al. 2021; Labenz et al. 2020b; Agarwal et al. 2020). ANT’s ability to predict clinically manifest HE was examined in two studies, with HRs of 2.1 and 2.5, respectively, and an AUROC of 0.63. The studies further report that the fewer animals a patient can name, the greater the risk of developing clinically manifest HE (Labenz et al. 2020b; Campagna et al. 2017). The ability of ANT to detect a response to MHE treatment has not been investigated, but in the Italian cohort, ANT clearly detected resolution of manifest HE by standard anti-HE treatment (Campagna et al. 2017). ANT is influenced by age and education, with ceiling effects, but not by gender (Campagna et al. 2017). To overcome this issue, Campagna et al. proposed adding 3 animals if the tested person has less than 8 years of education and adding 3 animals if the person is over 80 years of age. 
There is a small learning effect from the first to the second ANT, but any change larger than 2 animals per 10 named animals can be taken as a sign of real change (Campagna et al. 2017). Anyone can conduct an ANT, and patients can be evaluated by telephone. Therefore, ANT seems a promising tool in MHE management around the world.
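
The education and age adjustment proposed by Campagna et al., as summarized above, can be written as a short Python sketch. It reads the rule literally: 3 animals are added for less than 8 years of education and, independently, 3 animals for an age above 80 years. Whether the original publication applies the two additions independently is an assumption here, and the function name is hypothetical.

```python
def adjusted_ant(animals_named: int,
                 education_years: float,
                 age_years: float) -> int:
    """Adjust a raw one-minute animal count for education and age.

    Literal reading of the rule in the text (assumed to be cumulative):
      +3 animals if education is less than 8 years
      +3 animals if age is over 80 years
    """
    if animals_named < 0:
        raise ValueError("animals_named must be non-negative")
    adjusted = animals_named
    if education_years < 8:
        adjusted += 3
    if age_years > 80:
        adjusted += 3
    return adjusted


if __name__ == "__main__":
    # Hypothetical patients, for illustration only.
    print(adjusted_ant(12, education_years=6, age_years=82))
    print(adjusted_ant(12, education_years=12, age_years=60))
```

The adjusted count would then be compared with the country-specific cut-off, which is not reproduced here.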

Critical flicker frequency

CFF is a simple, computerized test that measures hepatic retinopathy, which is believed to reflect hepatic encephalopathy, although no firm evidence exists to confirm this. As such, the CFF is not a test of cognition but a neurophysiological test (Goldbecker et al. 2013). We have chosen to include it here because it fits the description of a point-of-care test, is easy to perform, requires only a short introduction, and is used for MHE diagnosis, primarily in some regions of Germany. The test uses a red light that flickers at a frequency of 60 Hz. At this frequency, the flicker cannot be seen and the light is instead perceived as a constant red dot. The frequency is then slowly decreased until the flickering becomes visible to the patient (Lauridsen et al. 2011). According to the test manual, at least 8 pre-measurements must be performed to accustom the patient to the CFF procedure (R&R Medi-Business Freiburg, Freiburg, Germany). Once stable CFF values are achieved, 10 repeated measurements must be completed. The examination takes roughly 10 min. Patients with MHE have a lower threshold for perceiving the flickering light compared with healthy individuals. The different study populations of patients with liver cirrhosis show MHE frequencies ranging from 18% to 61% (Labenz et al. 2020b; Jagtap et al. 2021). The sensitivity and specificity, using the PSE test as the gold standard, also vary widely between studies, from 37% and 39%, respectively, to almost 100% (Özel Coşkun and Özen 2017; Sharma et al. 2007; Metwally et al. 2019; Kircheis et al. 2014). Only a few studies have investigated the ability of CFF to predict clinically manifest HE, again with high variation, from no prognostic value to an HR of 18.45 (95% CI 7.36-46.3) (Barone et al. 2018; Labenz et al. 2020b). The test-retest and interobserver variation has been reported to be negligible in studies by Kircheis et al. and Gencdal et al. (Kircheis et al. 2002; Gencdal et al. 2014), but a recent study reported problems with achieving acceptable test quality in half of the test subjects, due to high variability between the 10 test runs and to indication of flicker at unrealistically high frequencies (Gabriel et al. 2020). CFF measurement is limited by colour blindness, problems with eye movement coordination, and retinopathy. The test is not affected by gender or education, and only a weak correlation with age is observed in some of the studies, with the threshold decreasing by 1.25 Hz per decade of life in patients with cirrhosis (Dhiman et al. 2010; Özel Coşkun and Özen 2017; Romero-Gomez et al. 2007; Maldonado-Garza et al. 2011).

Inhibitory control test

The inhibitory control test (ICT) is a computerized test of attention and response inhibition. Patients are presented with a sequence of letters on a computer screen and are instructed to respond every time an X precedes a Y (a target) and to inhibit their response to non-alternating patterns (a lure) (Bajaj et al. 2007b). MHE can be predicted based on the number of lures a patient responds to, with a sensitivity ranging from 65% to 93% and a specificity from 57% to 90% compared with the PSE test (Stawicka et al. 2020; Bajaj et al. 2007b; D. Gupta et al. 2015). The characteristics of study populations vary, and the studies reporting high sensitivities are often small and highly selected. The prevalence of MHE ranges from 35% to 82% (Duarte-Rojo et al. 2019; Tapper et al. 2020). The reported cut-off values vary between regions/countries (Taneja et al. 2012). Because of the reported variations in the number of lures and targets used to define MHE, it has been proposed to use weighted lures to define MHE (Amodio et al. 2010). ICT can be used to predict the development of clinically manifest HE with varying sensitivity (45-86%) and specificity (67-98%) (Goldbecker et al. 2013; Duarte-Rojo et al. 2019). The highest sensitivity and specificity have been achieved after adjusting for age and education, as both have been shown to affect the test result (Goldbecker et al. 2013; Bisiacchi et al. 2014; Burroughs et al. 2018). However, the test seems to be independent of gender (Tapper et al. 2019). A few studies have used ICT to report treatment responses. They show that patients respond to an increasing number of ICT lures with worsening manifest HE and to a decreasing number of lures when anti-HE treatment is administered (Bajaj et al. 2008; Pawar et al. 2019). However, some studies have reported significant test-retest differences, indicating a learning potential in the ICT (Goldbecker et al. 2013; Nardelli et al. 2017; Amodio et al. 2010), but this is more often found in persons without cognitive impairment and cannot be replicated in all studies (Bajaj et al. 2010; Nardelli et al. 2017). The test is easy to administer and can be performed by medical assistants after a short introduction in less than 15 minutes (Bajaj et al. 2007b; Bajaj et al. 2008).

Discussion

Our literature review demonstrates that 6 quite well-validated tests are available for use in patients with liver cirrhosis (not including EEG, which is also well validated but not a focus of this review). Overall, the PSE test, ANT, and Stroop EncephalApp seem to be the favored tests when considering the geography of validation studies and reporting sites. The PSE test, CRT, and ANT seem to be validated on all relevant parameters, although CRT only in a single geographical region. The PSE test is, overall, the best validated and most used test. Time consumption, duration of staff training, and equipment needs are quite uniform between the PSE test, CRT, Stroop EncephalApp, CFF, and ICT, although Stroop EncephalApp requires only a smartphone or tablet. ANT stands out by being the fastest and simplest, requiring no more than 1 minute and a willing patient.

For each of the tests, MHE prevalence varied considerably across studies. This is not surprising and is likely owing to the varying cohorts (lower prevalence in HE-naïve cohorts and higher in transplantation candidates) and to the fact that the tests measure different cognitive domains and have differing sensitivities. An example of the latter is that CRT, ICT, and Stroop results are much more labile and will pick up even very small changes in performance, while PHES, which operates within standard deviations, will only change if performance is markedly changed in one or more of the 5 subtests. Nonetheless, the MHE prevalence found across studies (approx. 20-80%) underlines that cognitive problems are very frequent in patients with liver cirrhosis. Bearing in mind the strong correlation between cognition and quality of life, this is obviously something we need to address by initiating treatment (Ahluwalia et al. 2013; Amodio et al. 2004; Bajaj 2008; Bajaj et al. 2020; Bale et al. 2018; Bao et al. 2007; Greinert et al. 2018; Kappus and Bajaj 2012; Labenz et al. 2018; Labenz et al. 2019d; Lauridsen et al. 2020; Marchesini et al. 2001; Mina et al. 2014; Montagnese and Bajaj 2019; Ridola et al. 2018; Román et al. 2011; Schomerus and Hamster 2001). In a treatment context, test sensitivity should be weighted higher than specificity because of the few side effects of anti-HE treatment and the huge impact of manifest HE on the patients’ autonomy and disease course. In other words, it is better to treat too many than too few.

There is still some reluctance to initiate anti-HE treatment based on the detection of MHE by one or two psychometric tests, as assessed by clinicians’ responses to questionnaires on their clinical practice (Bajaj et al. 2007a; Labenz et al. 2020a; Lauridsen and Bajaj 2015; Sharma and Sharma 2014). This reluctance may be due to doubts among clinicians about the psychometric tests’ ability to predict clinically manifest HE. However, an increasing number of studies now substantiate test robustness by examining the tests’ ability to predict clinically manifest HE. In that connection, we found that the PHES has a high NPV (up to 94%) and, as such, performs well in ruling out future clinically manifest HE, although some inconsistencies exist among studies. CRT also seems to be able to rule out future clinically manifest HE (NPV 78%), although this has only been shown in a single study. Similarly, in a single study, the Stroop seems to identify a group of HE-naïve patients who will develop clinically manifest HE at more than double the rate (HR > 2). For ANT, CFF, and ICT, AUROC values are around 0.60, which indicates that they seem to miss many patients who later develop clinically manifest HE, although ANT and CFF have been reported to identify patients who will develop clinically manifest HE at more than double the rate (HR > 2). The HR for manifest HE in patients with abnormal ICT is close to 1, stressing its limited predictive ability.
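
For readers less familiar with the accuracy measures quoted above, the following minimal sketch shows how NPV, sensitivity, and specificity follow from a 2x2 table of baseline test result against later clinically manifest HE. The class name `TwoByTwo` and the patient counts are hypothetical; the counts were chosen only so that the NPV equals 0.94, mirroring the upper figure quoted above, and they are not data from any cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical 2x2 table: baseline test result vs. later manifest HE."""
    tp: int  # abnormal test, later developed manifest HE
    fp: int  # abnormal test, did not develop manifest HE
    fn: int  # normal test, later developed manifest HE
    tn: int  # normal test, did not develop manifest HE

    def sensitivity(self) -> float:
        # Share of patients with later HE who had an abnormal test
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of patients without later HE who had a normal test
        return self.tn / (self.tn + self.fp)

    def npv(self) -> float:
        # Share of patients with a normal test who stayed free of HE
        return self.tn / (self.tn + self.fn)

    def ppv(self) -> float:
        # Share of patients with an abnormal test who developed HE
        return self.tp / (self.tp + self.fp)


# Illustrative counts only (assumption, not study data)
example = TwoByTwo(tp=18, fp=32, fn=3, tn=47)

print(f"NPV         = {example.npv():.2f}")
print(f"Sensitivity = {example.sensitivity():.2f}")
print(f"Specificity = {example.specificity():.2f}")
print(f"PPV         = {example.ppv():.2f}")
```

In this invented cohort a normal test rules out later manifest HE in 47 of 50 patients, while specificity and PPV remain modest. This is the pattern one would accept when sensitivity is prioritized over specificity. Note that NPV and PPV depend on the prevalence of the outcome in the cohort, so values are not directly comparable across studies with different patient populations.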

All the psychometric tests are affected by age, gender, and education level to some extent, and normal values are adjusted accordingly. One exception is the reaction time variability measured by the CRT index, which remains stable with varying age, intellect, and gender. A learning effect is present in all tests, but it is minimized as far as possible in the PSE test, CRT, Stroop, CFF, and ICT by using practice runs, and the PSE test further uses parallel versions. Confounding of test results by other mild cognitive deficits caused by common comorbidities has not been thoroughly examined for most psychometric tests for MHE. This is a concern and a limitation in test accuracy but should not hinder our willingness to attempt treatment for MHE. After all, a significant improvement in psychometry after anti-HE treatment in a patient with comorbidities will indicate that HE plays a role in the cognitive dysfunction and that the treatment is worth continuing.

Our review of the literature stresses the need for more studies on the tests’ ability to predict clinically manifest HE, which is the most important clinical outcome. A poor performance on a psychometric test validated for this specific purpose is a solid indicator that anti-HE treatment should be initiated. To this end, the PSE test seems to be the best choice because of its robust validation and NPVs of 69-94%.

In summary, in line with international guidelines, the PSE test is the most widely used and validated test for MHE. The remaining tests are validated to an acceptable extent but only in certain geographical areas. All but the ICT can aid in the identification of patients at high(er) risk of OHE, and all but CFF have been shown to detect a response to treatment. So, which psychometric test is the best choice? For research purposes, the PSE test is almost mandatory in studies where cognition in liver cirrhosis is an endpoint. For clinical use, when implementing routine psychometric testing to diagnose MHE and predict manifest HE, the PSE test seems the wise choice, not least in regions where the PSE test is already validated. In geographical regions where local tests are well-validated, these can of course be used. In clinics with few resources, in primary care, and for home monitoring, ANT may be a good choice due to its simplicity. In the end, the most important issue is not which test is chosen but that clinics make a choice and start using the test routinely, so that more patients can be offered treatment to improve MHE and prevent clinically manifest HE.