The prevalence of HIV-associated neurocognitive disorders (HAND) in adults is estimated to be between 30% and 60%, despite the use of combined anti-retroviral therapy (ART; Simioni et al., 2010; Heaton et al., 2010; Heaton et al., 2011; Ciccarelli et al., 2013; Chan & Brew, 2014). Epidemiological studies currently show that HIV-associated dementia (HAD) is rare (2–4%; McArthur, 2004). Most patients present with less severe forms of HAND, including asymptomatic neurocognitive impairment (ANI) and mild neurocognitive disorder (MND; Heaton et al., 2010; Munoz-Moreno et al., 2014; Sacktor et al., 2016).

At the beginning of the HIV epidemic, many patients developed severe neurological impairment in the final months of their illness. The clinical syndrome, comprising cognitive, behavioral and motor symptoms, was termed AIDS dementia complex (ADC). The cognitive impairment predominantly consisted of mental slowing and attention and memory deficits. With the introduction of ART, ADC incidence decreased, but treated patients with long-term infection now present with milder cognitive symptoms. In addition, a shift has occurred in certain demographic variables and risk factors, such as increased age and cardiovascular risk factors. As a result, the phenotype of HAND has expanded, with a broadening of the neuropsychological profile (Cysique, Maruff, & Brew, 2004; Woods, Moore, Weber, & Grant, 2009). Patients with HAND present a subcortical profile of cognitive impairment, with the core deficits being mental slowness, attention and memory deficits and impaired executive functions (Woods et al., 2009). Decreased speed of information processing is one of the most frequent cognitive abnormalities in HAND (Cysique et al., 2004; Woods et al., 2009; Schouten, Cinque, Gisslen, Reiss, & Portegies, 2011). Because mental speed facilitates most cognitive and motor processes, some authors even consider it the key deficit, which in turn leads to impairments in other cognitive domains (Hardy & Hinkin, 2002). Patients with HAND also present attention and working memory impairments. These cognitive functions are closely related, because the ability to create a temporary memory for processing and to store the information depends on attention functions. Consequently, the two deficits occur simultaneously (Woods et al., 2009; Schouten et al., 2011). Patients with HAND have also been reported to present impaired executive functions, with deficits in reasoning, planning, problem solving and shifting between tasks (Dawes et al., 2008; Schouten et al., 2011).
In the memory domain, patients mainly present deficits in learning new information and in prospective episodic memory, that is, an impaired ability to execute a future intention or “remembering to remember” (Schouten et al., 2011). The most frequent language problem in HAND is impaired fluency, although this could also be due to mental slowness or executive dysfunction (Dawes et al., 2008; Schouten et al., 2011). Less frequently, patients with HAND may present sensory-perceptual impairments, with disturbances in the interpretation and integration of auditory, visual or other sensory stimuli (Schouten et al., 2011).

Currently, HAND can be classified according to the Frascati criteria (Antinori et al., 2007), with different degrees of cognitive impairment that are separately diagnosed. In patients with ANI, the neuropsychological test performance is at least one standard deviation (SD) below the normative data in at least two domains with intact daily functioning. On the other hand, MND is characterized by similar neuropsychological test results, with impaired daily functioning. HAD is characterized by severe deficits in at least two cognitive domains, typically two SDs below normative data, and more severe daily functioning impairment (Antinori et al., 2007).
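The decision logic described above can be sketched in a few lines of Python. This is only an illustrative simplification of the thresholds quoted in the text: the function name, the domain labels, and the three-level functional rating ("none", "mild", "marked") are assumptions made for the example, and the full Frascati criteria include further requirements (for example, exclusion of confounding conditions) that are not modelled here.

```python
from typing import Mapping


def classify_hand(domain_z: Mapping[str, float], functional_impairment: str) -> str:
    """Return 'HAD', 'MND', 'ANI' or 'no HAND' for one patient.

    domain_z maps a cognitive domain name to a z-score relative to
    normative data (negative = worse than the norm).
    functional_impairment is a hypothetical three-level rating of daily
    functioning: 'none', 'mild' or 'marked'.
    """
    if functional_impairment not in ("none", "mild", "marked"):
        raise ValueError("functional_impairment must be 'none', 'mild' or 'marked'")

    # Number of domains at least 1 SD and at least 2 SD below the norm.
    domains_1sd = sum(1 for z in domain_z.values() if z <= -1.0)
    domains_2sd = sum(1 for z in domain_z.values() if z <= -2.0)

    if domains_2sd >= 2 and functional_impairment == "marked":
        return "HAD"
    if domains_1sd >= 2 and functional_impairment in ("mild", "marked"):
        return "MND"
    if domains_1sd >= 2:
        return "ANI"
    return "no HAND"


# Hypothetical patient: two domains more than 1 SD below the norm, intact daily functioning.
print(classify_hand({"attention": -1.2, "memory": -1.5, "motor": 0.1}, "none"))
```

The only difference between ANI and MND in this sketch is the daily-functioning rating, which mirrors the distinction drawn in the text.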

Recently, the validity of the Frascati criteria has been challenged (Gisslen, Price, & Nilsson, 2011; Nightingale et al., 2014). Researchers disagree over the validity of neuropsychological testing in characterizing ANI. Some researchers have argued against testing for ANI since there are no screening tools with high sensitivity and specificity that can be used in all clinical settings and there is no consensus on the therapeutic management of asymptomatic patients. In addition, screening can lead to unnecessary and expensive diagnostic procedures and a positive result might cause distress to some people living with HIV (Nightingale et al., 2014). Furthermore, some observational studies have not shown an association between ART with estimated high central nervous system (CNS) effectiveness and neurocognitive function (Simioni et al., 2010; Giancola et al., 2006; Smurzynski et al., 2011).

On the other hand, there are some arguments that support screening for ANI. Several studies have shown that when the prevalence of ANI is high, patients may present poor adherence to medication, as well as high unemployment rates (Gorman, Foley, Ettenhofer, Hinkin, & van Gorp, 2009). Additionally, ANI might be associated with an increased risk of progressive neurocognitive disease (Grant et al., 2014). Some studies have reported that ART with high CNS effectiveness is associated with improved cognitive function (Cysique, Waters, & Brew, 2011) and that, after changes to ART on the basis of estimated CNS effectiveness, the levels of HIV RNA in the cerebrospinal fluid (CSF) declined, leading to improved cognitive functions (Smit et al., 2004). Furthermore, some antiretroviral drugs have proven to be neurotoxic (Robertson, Liner, & Meeker, 2012). Although it has been argued that, because ANI is “asymptomatic”, its diagnosis may have little clinical significance, recent research has demonstrated that ANI patients present grey and white matter abnormalities (Haziot, Barbosa Junior, Vidal, de Oliveira, & de Oliveira, 2015) as well as abnormal blood plasma biomarkers (e.g., nadir CD4 count, neopterin, neurofilament light chains; Chan & Brew, 2014).

In general, there is agreement between international guidelines regarding the diagnosis of HAND (for a review see Underwood & Winston, 2016). These guidelines, which have a specific section regarding cognitive impairment, recommend a comprehensive assessment including a thorough medical history and examination, screening for depression, neuropsychological testing, magnetic resonance imaging (MRI) of the brain and lumbar puncture (European AIDS Clinical Society, 2018; Mind Exchange Working Group, 2013; HIV/AIDS Italian Expert Panel, 2017). However, there is no clear consensus regarding the specific tests that should be used as part of the neuropsychological assessment. All guidelines refer to the Frascati criteria, which list several preferred tests for each cognitive domain and recommend a complex neuropsychological assessment covering several cognitive domains (Antinori et al., 2007). Furthermore, the Mind Exchange Working Group suggests that the tests selected should be validated in the language and culture of the population and interpreted against appropriate normative data (Mind Exchange Working Group, 2013).

However, such tests are not available in many centers and require highly trained personnel (Antinori et al., 2007). Therefore, brief screening tests that are sensitive, easily accessible, and can be administered by clinical staff across a range of settings would be useful. Nonetheless, most HIV treatment guidelines do not make any specific recommendations about screening for neurocognitive impairment. Among the guidelines that do make recommendations, there is considerable variation in guidance, reflecting the uncertainties in the literature (Underwood & Winston, 2016). The European AIDS Clinical Society (EACS) guidelines (version 9.1; EACS, 2018) recommend screening all HIV-positive individuals without highly confounding conditions (such as severe psychiatric diseases, abuse of psychotropic drugs or alcohol, current CNS opportunistic infections or other neurological diseases, or sequelae of CNS disorders) at HIV diagnosis, before ART initiation and then later as indicated based on symptoms. The EACS screening method involves asking three questions: “Do you experience frequent memory loss?” “Do you feel that you are slower when reasoning, planning activities, or solving problems?” and “Do you have difficulties paying attention?”. Answering “Yes” to at least one of these questions constitutes a positive screening test requiring further assessment. This approach differs from the guidance given by the consensus report of the Mind Exchange Program, which recommends screening within six months of diagnosis, before ART initiation, every 6–12 months if there is a high risk, every 12–24 months if there is low risk, and immediately if there is any clinical deterioration (Mind Exchange Working Group, 2013).
The recommended screening tool depends on the following aspects: availability of a clinician suitably trained to administer and interpret each instrument, whether the clinician wants to screen for HAD only or for the milder forms of HAND, the financial and time cost of testing, and the characteristics of the population in which the tool will be used. Because neuropsychological resources are limited in many settings, a probable clinical diagnosis of HAND could be based on symptom questionnaires, screening tools, functional assessments, and limited neuropsychological testing. Patients with particular characteristics could then be targeted for full neuropsychological assessments (Mind Exchange Working Group, 2013). However, some preferred screening tests are mentioned, such as the HIV Dementia Scale (HDS) and the International HIV Dementia Scale (IHDS). The British HIV Association (BHIVA) recommends that HIV-positive patients should have access to screening for cognitive difficulties within the first three months of receiving an HIV diagnosis and that all HIV-positive patients should have access to repeated screening following events that are known to trigger or exacerbate cognitive difficulties, and otherwise on an annual basis (Angus et al., 2016). These recommendations are similar to those in the guide published by the Infectious Diseases Society of America (Aberg et al., 2014). The World Health Organization (WHO) recommends that routine screening and management for mental health disorders should be provided for people from key populations living with HIV in order to optimize health outcomes and improve adherence to ART. However, the screening method and frequency have not been specified (World Health Organization, 2016). The Italian Society for Infectious and Tropical Diseases recommends screening all people living with HIV if the patient presents cognitive complaints.
Among the suggested tests, they recommend the Montreal Cognitive Assessment (MoCA; HIV/AIDS Italian Expert Panel, 2017).
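As an illustration of how simple the EACS three-question rule is, it can be written as a short Python sketch. The question wording is taken from the text above; the function name and the list-of-booleans input format are assumptions made for the example, and the sketch does not model the exclusion of highly confounding conditions that the guideline requires before screening.

```python
# The three EACS screening questions, as quoted in the text.
EACS_QUESTIONS = (
    "Do you experience frequent memory loss?",
    "Do you feel that you are slower when reasoning, planning activities, "
    "or solving problems?",
    "Do you have difficulties paying attention?",
)


def eacs_screen_positive(answers):
    """Return True if the screen is positive.

    answers is a sequence of three booleans (True = "Yes"), one per
    question in EACS_QUESTIONS. A single "Yes" makes the screen positive.
    """
    if len(answers) != len(EACS_QUESTIONS):
        raise ValueError("exactly one answer per question is required")
    return any(answers)


# Hypothetical patient who reports only attention difficulties.
print(eacs_screen_positive([False, False, True]))
```

Because one affirmative answer is sufficient, a rule of this kind favours sensitivity over specificity, which is consistent with its role as a trigger for further assessment rather than as a diagnosis.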

The guidelines recommend a neurological examination, brain MRI, and CSF examination in order to exclude other pathologies, if the neuropsychological impairment detected on screening is confirmed by tests exploring multiple cognitive domains, including: verbal fluency, executive functions, speed of information processing, attention and working memory, verbal and visual learning, verbal and visual memory, motor skills, and assessment of daily functioning. In addition, these guidelines recommend an assessment of CSF HIV viral load level and, where appropriate, evidence for genotypic drug resistance (GDR) in a paired CSF and plasma sample (EACS, 2018). After additional causes of cognitive impairment are excluded and a diagnosis of HAND is made, the clinician must take specific treatment and care measures (EACS, 2018).

Few screening tools have been developed and validated, including the HDS and its derivative form, IHDS (Sacktor et al., 2005; Bottiggi et al., 2007). Both instruments are relatively insensitive to the milder cognitive symptoms that predominate in the combination ART era (Skinner, Adewale, DeBlock, Gill, & Power, 2009). Although they are recommended as screening tools by expert HIV guidelines (Mind Exchange Working Group, 2013), a recent systematic review concluded that their accuracy is low. Summary estimates for the HDS as a test for HAND presented sensitivity and specificity of 42% and 91% respectively. On the other hand, when using IHDS as a test for all symptomatic HAND, the sensitivity and the specificity of this tool were 64% and 66% respectively (Haddow, Floyd, Copas, & Gilson, 2013).

Other screening tests, such as the Mini-Mental State Examination (MMSE), have been used in clinical practice to detect cognitive impairment in a variety of neurological disorders. Clinicians are familiar with the MMSE, and it is widely used as the first-choice tool for screening HAND. However, studies have indicated that this instrument is not very reliable in detecting HAND (Kami-Onaga et al., 2018; Milanini et al., 2016; Skinner et al., 2009).

The MoCA has been used in people living with HIV as another screening instrument, with variable results. It was developed in 2005 for detecting mild cognitive impairment (MCI) and has been shown to be highly sensitive and specific in older adults (Nasreddine et al., 2005). The MoCA is a brief bedside test assessing short-term memory, attention and working memory and frontal-executive functions, which are commonly affected in patients with HIV infection. Scores on the MoCA range from zero to 30 points, with a score of 25 or lower indicating cognitive impairment. This cut-point is now widely used as a threshold for detecting cognitive impairment and possible dementia. In order to minimize practice effects, three versions of the MoCA have been developed in English, which test the same domains with different task content. The alternative versions of the MoCA present reliability comparable to that of the original test (Costa et al., 2012). Translations into multiple languages are also available, and the administration time is typically 10 min.

The first item of the MoCA, the modified trail making test, requires visuomotor and visuoperceptual skills and the mental flexibility to shift between numbers and letters (Crowe, 1998; Sánchez-Cubillo et al., 2009). To perform the second sub-test of the MoCA and copy a cube, an individual has to initially convert the two-dimensional contour to a three-dimensional figure, an ability that is enhanced by learning abilities (Sinha & Poggio, 1996). After spatial planning, visuomotor coordination and integration of visual and fine motor sequences are also necessary. In Alzheimer’s disease, poor performance in drawing-to-command and copying conditions was reported in less educated, older, female and depressed participants (Gaestel, Amieva, Letenneur, Dartigues, & Fabrigoule, 2006). The clock drawing test has been extensively studied for detection of cognitive impairment. It evaluates visuoconstructive skills. In addition, in order to draw the clock face and to place the numbers correctly, participants need to have intact planning, conceptualization and symbolic representation (Pinto & Peters, 2009). When placing the hands of the clock to draw “ten past eleven”, an inhibitory response is also necessary. Although the scoring criteria for the clock drawing test in the MoCA have been simplified to decrease scoring complexity and scoring time and to minimize inter-rater variability, suboptimal inter- and intra-rater reliability for this item was reported (Price et al., 2011). Also, the task may be influenced by literacy status and education level (Nitrini, Caramelli, Herrera, Porto, et al., 2004). For the naming sub-test of the MoCA, the patient has to name three animals that are presented visually. If the subject is unable to name an animal but can give contextual information about it, this is probably due to word finding difficulties or impaired semantic memory. If the patient can provide neither the name nor the context, they probably present impaired visuoperceptual skills or semantic memory.
Cultural exposure and low education can also determine errors on this task. Attention is assessed by the digit span items. The digit span forward implies retention of auditory stimuli and articulatory rehearsal. The digit span backward necessitates executive processing, working memory, the ability to transform the numbers into reverse order, and language functions. For the letter A tapping sub-test, the participants need sustained and focused attention.

In the MoCA validation study, MCI participants had performance comparable to that of normal controls, but the patients with Alzheimer’s disease were significantly more impaired on this task (Nasreddine et al., 2005). The serial 7 subtractions sub-test evaluates calculation abilities. The sentence repetition tasks of the MoCA evaluate language skills, attention and working memory (Small, Kemper, & Lyons, 2000). The performance on this sub-test is also influenced by education. The letter fluency item of the MoCA requires language abilities and intact executive functions with coordination of lexical and semantic knowledge, shifting from word to word, working memory, searching strategy and inhibition of irrelevant words (Troyer, Moscovitch, Winocur, Alexander, & Stuss, 1998; Henry & Crawford, 2004; Larsson, Almkvist, Luszcz, & Wahlin, 2008). The abstraction subtest, where the patient has to find the similarities between objects, evaluates semantic knowledge and conceptual thinking. On the memory sub-test, the participants have to recall 5 words, with 2 learning trials, and with 5 min between immediate recall and delayed recall. The category and multiple-choice cues can also provide information that helps to distinguish an encoding memory impairment, which does not improve with cueing, from a retrieval memory impairment, which does improve with cueing. The last subtest assesses the patient’s orientation in space and time. These items were demonstrated to have a low value for detecting MCI (Nasreddine et al., 2005), but temporal orientation was reported to have high sensitivity in the detection of dementia (O’Keeffe, Mukhtar, & O’Keeffe, 2011). Furthermore, patients with temporal disorientation have also been demonstrated to present impaired verbal memory (Ryan, Glass, Bartels, Bergner, & Paolo, 2009).

The MoCA has a widespread international use, being recognized as one of the best screening tests (Ismail, Rajji, & Shulman, 2010; Jacova, Kertesz, Blair, Fisk, & Feldman, 2007) as several previous studies have consistently reported that it has good overall psychometric properties and a good sensitivity in accurately identifying milder forms of cognitive impairment in many clinical conditions. For example, in MCI, the internal consistency of MoCA was reported to be excellent, with a Cronbach’s alpha of 0.83 on the standardized items (Nasreddine et al., 2005). The test-retest reliability was also good, with a mean change in MoCA scores from the first to second evaluation of 0.9 points (Nasreddine et al., 2005). In addition, in studies that applied Rasch analysis techniques, the researchers found that scores on the MoCA can be used to quantify the amount of cognitive ability a person has and can be used to track changes in cognitive ability over time (Koski, Xie, & Finch, 2009). Also, in patients with a subcortical type of cognitive impairment like Parkinson’s disease, Cronbach’s alpha was reported to be 0.66 to 0.79 (Ozdilek & Kenangil, 2014; Nie et al., 2012), with a good inter-rater reliability of 0.81 and a test-retest reliability of 0.79 (Gill, Freshman, Blender, & Ravina, 2008).

Validation studies of the MoCA have been conducted in different types of neurological disorders, such as MCI (Freitas, Simoes, Alves, & Santana, 2013), Alzheimer’s disease (Freitas et al., 2013), Parkinson’s disease (Hoops et al., 2009), and Huntington’s disease (Bezdicek et al., 2013). In a recent systematic review, the MoCA as a screening test for dementia and multidomain cognitive impairment in stroke patients, at the usual threshold of 26, presented a high sensitivity (0.95) but at the cost of specificity (0.45). An adjusted cutoff of 22 improved its specificity (0.84) at a lower sensitivity (0.78; Lees et al., 2014). In the diagnosis of dementia (including Alzheimer’s disease, vascular dementia, Lewy body dementia and frontotemporal dementia), the recommended threshold of 26 presented a high sensitivity of 0.94 or above, but a low specificity of 0.60 or below. The systematic review pointed out that cut-off scores lower than 26 were likely to be more useful for optimal diagnostic accuracy of the MoCA concerning dementia (Davis et al., 2015). A recent systematic review of the literature which evaluated the diagnostic accuracy of the MoCA in differentiating healthy cognitive aging from possible MCI found that the optimal score, which maximized true positives while minimizing false positives, was a cutoff of 23. Although sensitivity was lower at 23 (0.83) than at 26 (0.94), specificity was higher (0.88 vs. 0.66) and the balance between true positive and false positive results was better (Carson, Leach, & Murphy, 2018).
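To make the trade-off between the two cutoffs more concrete, the following Python sketch converts the sensitivity and specificity values quoted above (Carson et al., 2018) into expected numbers of true and false positives. The cohort size of 1,000 and the prevalence of 20% are purely hypothetical values chosen for illustration; they do not come from any of the cited studies.

```python
def expected_positives(sensitivity, specificity, prevalence, n=1000):
    """Expected (true positives, false positives) in a cohort of n people."""
    with_condition = n * prevalence
    without_condition = n - with_condition
    true_positives = round(with_condition * sensitivity)
    false_positives = round(without_condition * (1 - specificity))
    return true_positives, false_positives


# Hypothetical cohort: 1,000 people, 20% of whom have cognitive impairment.
for cutoff, sens, spec in ((26, 0.94, 0.66), (23, 0.83, 0.88)):
    tp, fp = expected_positives(sens, spec, prevalence=0.20)
    print(f"cutoff {cutoff}: {tp} true positives, {fp} false positives")
```

Under these assumptions, lowering the cutoff from 26 to 23 would miss 22 additional cases (188 vs. 166 true positives) but avoid 176 false positives (272 vs. 96). The exact balance depends strongly on the assumed prevalence.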

To summarize, early diagnosis and specific treatment and care of HAND are essential. Although all the guidelines recommend the Frascati criteria for diagnosis, with an extensive battery of neuropsychological tests, this approach is time consuming and expensive and requires trained personnel. Thus, screening for HAND and identifying the patients who should be further investigated is essential, but the available guidelines on screening for HAND reflect the uncertainties in the literature, and clinicians are faced with a difficult choice, namely, which screening test they should use. The MoCA fulfils very important feasibility criteria for use in clinical practice: it has a short administration time, is freely available in multiple translations, and has minimal training requirements. Furthermore, online training and certification are available on the MoCA website. In addition, it has proven to have good psychometric properties in other populations and assesses a broad range of cognitive domains.

With regard to HIV-positive patients, several researchers have explored the utility of the MoCA to detect cognitive impairment, but the sensitivity and specificity values and the cut-off scores have differed across studies. Although the diagnostic assessment pathways may vary across different countries, HAND is often screened for in specialized infectious disease clinics during outpatient visits. The MoCA may help identify people living with HIV who require further assessments and specific care, facilitating access to appropriate services. Nonetheless, a false positive result implies significant costs and harm due to unnecessary further investigations and psychological distress. Therefore, there is considerable value in determining the strength of the empirical evidence that supports the use of the MoCA as a screening test for HAND. We aim to collate evidence from different studies, integrating the existing information and providing data for rational decision making, and to highlight possible answers that are easily accessible to clinicians, health care providers and policy makers.

The objective of this systematic review is to evaluate research regarding the accuracy of the MoCA test for diagnosing HAND against a concurrently applied reference standard and to highlight the quality and quantity of evidence available in this regard. Also, we aim to identify the gaps in the literature regarding this short neuropsychological test battery.

Methods

This meta-analysis was performed following the recommendations described in the Cochrane Handbook for Diagnostic Test Accuracy Reviews (Handbook for Diagnostic Test Accuracy Review n.d.) and a Cochrane generic protocol for cross-sectional and delayed-verification studies (Davis et al., 2013). Results were reported according to the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Moher et al., 2009).

Search Strategy and Selection Criteria

Supplemental Figure 1 shows the search strategy followed in the meta-analysis. A computerized bibliographic search was performed from database inception to January 2019 on the following databases: MEDLINE/PubMed, Scopus, EMBASE, Cochrane Library, Latin American and Caribbean Health Sciences Literature (LILACS) and PsycINFO. In addition, a complementary manual search was performed on the MoCA website as well as by checking reference lists of all relevant research papers in order to identify possible additional studies. We also searched grey literature through Google Scholar, but only papers from peer-reviewed and indexed journals were included.

The following key words were used: “Montreal cognitive assessment” or the acronym “MoCA”, and “HIV infection” [MeSH] and “acquired immunodeficiency syndrome” [MeSH]. These search terms were for PubMed, the primary source of citations. Searches in other data sources used similar versions of these terms, appropriate for each database. We did not utilize search filters (collection of terms aimed at reducing the number needed to be screened) because our aim was to generate a comprehensive list of studies which would be suitable for answering the research question. Even the most sensitive filters have been found to miss relevant studies and perform inconsistently across subject areas and study designs, while at the same time have not significantly reduced the number of studies that need to be assessed for inclusion (Cochrane Handbook for DTA Reviews, Davis et al., 2013). In addition, we did not apply any restrictions on language.

Two authors reviewed the title, abstract and full text (when needed) of all retrieved research papers and assessed whether the study met the inclusion criteria. During the abstract review stage, in order not to miss any potentially eligible studies, we did not exclude papers for which we were unsure whether there was an appropriate reference standard or a full version of the MoCA, or whether the article was a diagnostic test accuracy study. We evaluated all these articles in full text. All initial reviews were rated equally, and participation of a third rater was not needed to address discrepancies.

Eligible studies were cross-sectional studies in which participants received the index test and the reference standard diagnostic assessment. Case-control studies were excluded owing to a high possibility of bias. We included studies reporting adults (over 18 years old) with confirmed HIV infection in which the association between MoCA score and HAND was assessed, with the MoCA being used as an index test. The index test was any full version of the MoCA. Although we expected to find the recommended cut-off score of 25 or below to differentiate normal (26 and above) from impaired cognition, we also included studies using other thresholds (22–27). The target condition was HAND, including ANI, MND, and HAD, as classified by the Frascati criteria (Antinori et al., 2007). We used as a reference standard for HAND a complex neuropsychological assessment, evaluating at least five neurocognitive domains (including verbal and language, attention and working memory, abstraction and executive function, learning and recall, speed of information processing, and motor skills), with consensus recommendations on appropriate tests. In this study, as endorsed by international guidelines, a neurocognitive impairment was defined as an impairment in cognitive function on the above neuropsychological tests in which performance is considered clinically significant compared to appropriate controls matched by age and educational level (Antinori et al., 2007; European AIDS Clinical Society, 2018; Mind Exchange Working Group, 2013). We excluded studies of participants with confounding factors such as neurological disorders (e.g., recent traumatic brain injury, CNS infections, stroke, neurodegenerative disorders, and brain tumors), active psychosis, significant substance abuse, including alcohol and recreational drugs, and active infections.

Disagreements were resolved through discussion and a third rater was not needed to address differences. The methodological quality of the studies included was assessed by two authors independently (ECR, MS) according to the Cochrane Collaboration’s tool for assessing the risk of bias (Handbook for Diagnostic Test Accuracy Reviews) using the unmodified Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool (Whiting et al., 2011).

Statistical Analysis

Pairwise meta-analysis was performed to estimate the sensitivities and specificities, with 95% confidence intervals. Regarding studies that reported more than one threshold, we extracted data on diagnostic accuracy for thresholds ranging from 22 to 27. We calculated for each cut-off score the following parameters: sensitivity (proportion of individuals diagnosed with HAND who tested positive on the MoCA), specificity (proportion of patients indicated as normal who tested negative on the MoCA), the positive predictive value (PPV - proportion of individuals with a positive MoCA test who were diagnosed with HAND), and negative predictive value (NPV - proportion of patients who tested negative on the MoCA, without HAND). In addition, we calculated the likelihood ratios for positive results (LR+), representing the increase in likelihood of a diagnosis of HAND after a positive test result on the MoCA, as well as the likelihood ratios for negative results (LR-), indicating the decrease in likelihood of a diagnosis of HAND after a negative MoCA result. The larger the LR+, the more informative the test. Findings with LR+ greater than 1 indicate an increase in the odds of having a particular condition in a patient with a positive result. The larger the LR+, the more convincingly the finding suggests the presence of disease. If LR+ does not differ from 1, it argues against the diagnostic value of the test and nothing has been learned by ordering the test (Straus, Glasziou, Richardson, & Haynes, 2019).
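The parameters defined above all follow from the four cells of a 2×2 table that cross-classifies the MoCA result against the reference standard. The Python sketch below shows the standard formulas; the class and function names are illustrative, the counts are hypothetical, and the sketch assumes that no row or column total is zero (and that specificity is below 1, so that LR+ is defined). It is not the code used for the analyses in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # MoCA positive, HAND present
    fp: int  # MoCA positive, HAND absent
    fn: int  # MoCA negative, HAND present
    tn: int  # MoCA negative, HAND absent


def accuracy_metrics(t: TwoByTwo) -> dict:
    """Standard diagnostic accuracy measures for one cut-off score."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Hypothetical counts for a single study at a single threshold.
metrics = accuracy_metrics(TwoByTwo(tp=40, fp=20, fn=10, tn=80))
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

For these hypothetical counts, sensitivity and specificity are both 0.80, giving LR+ = 4 and LR- = 0.25.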

On the other hand, the LR- indicates the decrease in odds of having a disease in patients after obtaining a negative test result. If the LR- is smaller than 1, then the post-test probability of the disease being present decreases. The smaller the LR-, the more informative the test result. An LR- of 1 means that the test is useless, because the odds of having the condition have not changed after the test administration (Straus et al., 2019). The change in post-test probability applies only to a specified pre-test probability.
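The dependence on the pre-test probability can be made explicit with the odds form of Bayes' theorem: post-test odds equal pre-test odds multiplied by the likelihood ratio. The following Python sketch shows the conversion; the function name and the numerical values (a pre-test probability of 0.30 and likelihood ratios of 4 and 0.25) are hypothetical and chosen only for illustration.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability into a post-test probability.

    Uses post-test odds = pre-test odds * likelihood ratio.
    pre_test_probability must lie strictly between 0 and 1.
    """
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre_test_probability must be between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical example: pre-test probability of HAND of 0.30.
print(round(post_test_probability(0.30, 4.0), 3))   # after a positive result, LR+ = 4
print(round(post_test_probability(0.30, 0.25), 3))  # after a negative result, LR- = 0.25
print(round(post_test_probability(0.30, 1.0), 3))   # LR = 1 leaves the probability unchanged
```

With the same likelihood ratios but a different pre-test probability, the post-test probabilities would differ, which is why a likelihood ratio cannot be interpreted without reference to the setting in which the test is applied.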

Further calculations in our study included Phi, a measure of binomial effect size, and Youden’s index, which can be used to estimate the optimal threshold at which sensitivity (proportion of true positive patients) is maximized and false positive results are minimized. Youden’s index is a global measure of test performance, used for evaluating the overall discriminative power of a diagnostic procedure. It is calculated by subtracting 1 from the sum of sensitivity and specificity, both expressed as proportions rather than percentages: (sensitivity + specificity) – 1. Youden’s index equals 0 in a test with poor diagnostic accuracy, and in a perfect test, Youden’s index equals 1 (Šimundić, 2009).
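Both quantities are straightforward to compute, as the Python sketch below shows. The Youden example reuses the sensitivity and specificity values reported by Carson et al. (2018) for cutoffs of 26 and 23, as quoted in the introduction. For Phi, the sketch assumes the usual phi coefficient of a 2×2 table, and the counts passed to it are hypothetical; the function names are illustrative.

```python
import math


def youden_index(sensitivity, specificity):
    """Youden's index J = sensitivity + specificity - 1 (inputs as proportions)."""
    return sensitivity + specificity - 1


def phi_coefficient(tp, fp, fn, tn):
    """Phi coefficient of a 2x2 table (assumes no row or column total is zero)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator


# Values quoted from Carson et al. (2018) for MCI screening.
print(round(youden_index(0.94, 0.66), 2))  # cutoff 26
print(round(youden_index(0.83, 0.88), 2))  # cutoff 23

# Hypothetical 2x2 counts.
print(round(phi_coefficient(tp=40, fp=20, fn=10, tn=80), 3))
```

The index is 0.60 at a cutoff of 26 and 0.71 at a cutoff of 23, which agrees with the conclusion of that review that 23 offered the better balance.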

We used the bivariate model to perform meta-analysis of sensitivity and specificity and added cut-off values as a covariate in the model (Reitsma et al., 2005). In addition, we compared the recommended threshold of 26 with the cutoff score of 27 and analyzed sensitivity and specificity for each cutoff using the hierarchical summary receiver-operating characteristic (HSROC) model (Rutter & Gatsonis, 2001), which allows for the possibility of variation in cutoffs between studies. We computed the accuracy of the MoCA with 95% confidence and prediction intervals. Confidence intervals (CI) indicate the range of the likely true value within the population, whereas prediction intervals account for both the uncertainty in knowing the value of the population mean and the scatter of the presented data (and are thus usually wider than the CI). Whilst the confidence region depicts uncertainty in the overall average value caused by sampling variability, the prediction region depicts variation arising from between-study heterogeneity.
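The distinction between confidence and prediction intervals can be illustrated with a deliberately simplified Python sketch. It is not the bivariate model of Reitsma et al. (2005) and not the mada implementation: it pools sensitivity alone on the logit scale with a univariate DerSimonian-Laird random-effects estimator, ignores the correlation between sensitivity and specificity, and uses the normal quantile 1.96 for both intervals (prediction intervals are more commonly based on a t distribution). The study counts are hypothetical.

```python
import math


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def pool_sensitivity(studies, z=1.96):
    """Pool sensitivity across studies given as (tp, fn) pairs (at least two).

    Returns the pooled estimate, its confidence interval and a prediction
    interval, all back-transformed to the probability scale.
    """
    y, v = [], []
    for tp, fn in studies:
        tp_c, fn_c = tp + 0.5, fn + 0.5      # continuity correction
        y.append(math.log(tp_c / fn_c))      # logit of sensitivity
        v.append(1 / tp_c + 1 / fn_c)        # approximate within-study variance

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)

    # Random-effects pooled estimate and its standard error.
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    ci_half = z * se                           # uncertainty about the mean only
    pi_half = z * math.sqrt(tau2 + se ** 2)    # adds between-study scatter
    return {
        "pooled": inv_logit(mu),
        "ci": (inv_logit(mu - ci_half), inv_logit(mu + ci_half)),
        "pi": (inv_logit(mu - pi_half), inv_logit(mu + pi_half)),
        "tau2": tau2,
    }


# Hypothetical (true positive, false negative) counts from four studies.
result = pool_sensitivity([(45, 5), (30, 20), (60, 15), (20, 20)])
print(result)
```

Whenever the estimated between-study variance is greater than zero, the prediction interval is wider than the confidence interval, because it describes where the sensitivity of a new study might fall rather than where the average lies.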

The systematic assessment of study quality used unmodified QUADAS-2 to determine the overall risk of bias for each study. All calculations were performed using Review Manager, version 5.3 (Cochrane Collaboration, Copenhagen, Denmark) and R software, version 3.0.2 (R Foundation for Statistical Computing, Vienna, Austria), with the mada package.

Results

Included Studies

From a total of 37 unique studies identified using the search strategy and assessed in full text, we included eight studies, whose characteristics are summarized in Supplemental Table 1. The PRISMA diagram describing the study selection process is presented in Supplemental Figure 1. Twenty-nine studies were excluded for the following reasons: duplicated data (3), inadequate reference standard (11), insufficient data reported (3), the MoCA test was modified (4) or its subtests were chosen specifically to optimize testing in individuals with limited educational level (1), the MoCA was not the index test (6), or the research paper was not a diagnostic test accuracy study (1).

Methodological Quality of Included Studies

The QUADAS-2 scores for each domain are presented in Figs. 1 and 2. In the Patient Selection domain, the risk of bias was reduced by selecting only cross-sectional studies. A random or consecutive sample of patients was reported in four studies (Janssen, Bosch, Koopmans, & Kessels, 2015; Ku et al., 2014; Joska et al., 2016; Milanini et al., 2016). Two studies, which included only patients aged 60 and above (Milanini et al., 2014) or adults aged 50 years and older (Fazeli et al., 2017), were considered to have a high risk of bias. Inappropriate exclusions were avoided in five studies (Janssen et al., 2015; Ku et al., 2014; Joska et al., 2016; Milanini et al., 2016; Overton et al., 2013). On the other hand, one study excluded eight patients from the initial selection because of their low educational level (< grade 7) and one owing to incomplete testing (Koenig, Fujiwara, Gill, & Power, 2016). All patients were recruited in outpatient clinics from urban areas.

Fig. 1 Risk of bias and applicability concerns graph: review author’s judgements about each domain presented as percentages across included studies

Fig. 2 Risk of bias and applicability concerns summary: review author’s judgements about each domain for each included study

Regarding the Index Test domain, seven studies were considered as presenting an unclear risk of bias (Janssen et al., 2015; Joska et al., 2016; Koenig et al., 2016; Ku et al., 2014; Milanini et al., 2014; Overton et al., 2013). In all studies except one (Milanini et al., 2016), it was unclear whether the index test results were interpreted without knowledge of the results of the reference standard. In addition, the pre-specification of the threshold was absent or unclear in four studies (Ku et al., 2014; Fazeli et al., 2017; Koenig et al., 2016; Milanini et al., 2016), and this finding was also considered to present an unclear risk of bias.

With regard to the Reference Standard domain, all studies used a reference standard that would correctly diagnose HAND. However, only one study specified that the reference standard was interpreted without knowledge of the index test results (Joska et al., 2016) and therefore all the other studies were classified as having unclear risk of bias.

Only three studies reported the flow and timing of the cognitive tests (Joska et al., 2016; Milanini et al., 2016; Overton et al., 2013) and one study spanned eight months. In this last case, we assumed that the interval between the index test and the reference standard might be inappropriate (Ku et al., 2014). There were no exclusions from the analysis in four studies (Joska et al., 2016; Milanini et al., 2014; Milanini et al., 2016; Overton et al., 2013). Generally, the studies had a low risk of bias, and no study had more than one QUADAS-2 item assessed as having a high risk of bias.

Findings

In total, eight studies that assessed 1014 patients were included. There was an overlap of participants because of the use of patients across several studies where multiple cut-off points were examined. The recruitment period was between 2009 and 2015. The study samples were selected from six different countries (USA, South Africa, South Korea, Canada, Netherlands and Italy). Samples varied in size (from 67 to 200 participants), gender (37.2% to 96% males), median age (40 to 64 years), educational level, CD4 values, and viral load. All patients were on ART except in two studies, where only 89.70% (Ku et al., 2014) and 98% (Fazeli et al., 2017) of patients were on antiretroviral medication. In all the studies, the reference standard used the Frascati criteria with extensive neuropsychological batteries measuring multiple cognitive domains. The characteristics of the included studies are presented in Supplemental Table 1.

Table 1 and Supplemental Table 2 show data related to the full meta-analysis. Supplemental Table 3 shows the detailed analysis of the Youden index and Phi for each threshold. In addition, the forest plots of MoCA at different thresholds are presented in Fig. 3.

Table 1 Evaluation of MoCA at different thresholds

Fig. 3 Forest plots of MoCA at different thresholds

Data from four studies with 556 patients were pooled for analysis at a threshold of 27, revealing an overall sensitivity of 0.77, a specificity of 0.50 and a low LR+ of 1.54 (see Table 1 and Supplemental Table 2). For a cutoff score of 26, we extracted data from six studies with 784 patients; the overall sensitivity of MoCA was 0.73, with a specificity of 0.54 and a LR+ of 1.58 (see Table 1 and Supplemental Table 2). Using the LR, we found no statistically significant differences in the sensitivity and specificity of MoCA between the recommended cutoff score of 26 and the cutoff score of 27. The summary receiver operating characteristic (SROC) curve of both thresholds with 95% confidence and prediction intervals is presented in Fig. 4, where the 95% confidence region is a measure of within-study uncertainty (the precision of the test accuracy estimate) and the prediction region is a measure of between-study variability and defines the area in receiver operating characteristic (ROC) space where we are confident that a test performs within a stated degree of uncertainty.

Fig. 4 SROC curve for the cut-off scores of 27 and 26

Three studies, including 461 patients, contributed data at a threshold of 25; the sensitivity and specificity of MoCA were 0.61 and 0.72, respectively, and the LR+ was 2.19 (see Table 1 and Supplemental Table 2). For a threshold of 24, 461 individuals from three studies were included; the overall sensitivity of MoCA was 0.53, with a specificity of 0.82 and a LR+ of 2.87 (see Table 1 and Supplemental Table 2).

Data from four studies, which included 586 participants, were extracted for a cutoff score of 23. The current analysis indicated that a threshold of 23 offered the best diagnostic accuracy (see Table 1 and Supplemental Table 2). Although MoCA sensitivity was lower at 23 than at the recommended threshold of 26 (0.44 vs. 0.73), its specificity was higher (0.79 vs. 0.54), with a better balance between true positive and false positive results (Youden’s index mean 0.384 vs. 0.293). However, the cut-off score of 23 presented a modest LR+ of 2.11.

For a cutoff score of 22, we could extract data from only two studies (see Table 1 and Supplemental Table 2), which included 287 HIV-positive individuals. The sensitivity of MoCA was 0.23, with high specificity (0.90), but the Youden index was lower than for the threshold of 23 (0.283 vs. 0.384). The LR+ was 2.27.

Owing to differences in the characteristics of the studies and the small number of studies, we considered that a study of heterogeneity and advanced statistical analysis were not suitable.

Discussion

The present meta-analysis allowed us to make several key observations. Although the MoCA seemed to be a promising screening test for patients infected with HIV, our data revealed that it may not be the best discriminating tool for this specific population. At the original recommended cutoff score of 26, the test failed to adequately distinguish HIV-infected patients with cognitive impairment from those with normal cognition. The sensitivity decreased as the threshold was lowered, but the optimal cutoff score for diagnosing HAND was 23, offering the best balance between true positive and false positive results (see Table 1 and Supplemental Table 2). Although the LR+ was 2.11, indicating a modest increase in the odds of having HAND in a patient with a positive result, the impact of a LR depends strongly on the baseline probability of having the condition (the prevalence of the disease or pre-test probability). In our study, at a threshold of 23, the prevalence or pre-test probability of HAND was 0.39 (39%) and the probability of cognitive impairment after a positive MoCA result was 0.57 (57%). The same LR would result in a post-test probability of 76% if the prevalence of HAND in the study population was 60%. Therefore, the LR should always be interpreted in the context of the pre-test probability or the prevalence of the outcome (Straus et al., 2019).
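The post-test probabilities quoted above follow from the usual odds form of Bayes' theorem: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back to a probability. The sketch below reproduces that arithmetic with the values reported in this section (LR+ of 2.11 and pre-test probabilities of 0.39 and 0.60); the function name is an illustrative assumption.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability and a likelihood ratio to a post-test probability."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


if __name__ == "__main__":
    lr_positive = 2.11  # LR+ reported for the threshold of 23
    for prevalence in (0.39, 0.60):
        probability = post_test_probability(prevalence, lr_positive)
        print(f"pre-test {prevalence:.2f} -> post-test {probability:.2f}")
```

With these inputs the function returns approximately 0.57 and 0.76, matching the 57% and 76% given in the text.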

While clinicians may prefer a test with high sensitivity, this could increase the number of patients referred for formal cognitive testing and further assessment. On the other hand, a higher specificity may reduce unnecessary referrals, but many true cases could be missed. Alternatively, a lower cutoff score could provide a better balance between true positives and false positive results and could be used to identify individuals that should be repeatedly monitored (Overton et al., 2013).

The original cutoff score of 26 demonstrated satisfactory sensitivity and specificity in general populations (Nasreddine et al., 2005). However, later systematic reviews revealed that MoCA, at the usual threshold, presented a high sensitivity of 0.95 in stroke patients (Lees et al., 2014) and of 0.94 or above in dementias, including Alzheimer’s disease, vascular dementia, Lewy body dementia and frontotemporal dementia (Davis et al., 2015). Sensitivity regarding MCI was 0.94 (Carson et al., 2018), but at the cost of low specificity. The MoCA offered a better diagnostic accuracy for stroke patients at an adjusted cutoff score of 22 (Lees et al., 2014). In MCI, the best diagnostic accuracy was offered by the threshold of 23 (Carson et al., 2018). This finding is consistent with the results of the present study (see Table 1). However, in our study, the sensitivity and specificity of the test were lower than in other disorders: at the cut-off of 26, the sensitivity was 0.73 and the specificity was 0.54. The threshold of 23, although associated with the best Youden index, offered a sensitivity of 0.44 and a specificity of 0.79. These differences could be due to several factors. A possible explanation could be that MoCA evaluates abstraction, object naming, clock drawing and language, which are domains that are not frequently affected by HIV infection (Woods et al., 2009). Nonetheless, these items could be useful in older people living with HIV, who can present multiple comorbidities such as increased vulnerability to Alzheimer’s disease, cardiovascular risk factors and cerebrovascular disease (Milanini et al., 2014; Fazeli et al., 2017; Devlin & Giovannetti, 2017). Furthermore, the differences in sensitivity and specificity could be caused by the reference standard used.

The current criteria for diagnosing MCI include a subjective complaint and functional independence, and there is no analog for ANI in DSM-5. The Frascati criteria identify three severity levels for HAND, ANI being defined as neurocognitive impairment demonstrated by performance falling one standard deviation below the mean of demographically adjusted normative scores in two out of at least five measured domains. In this case, asymptomatic means that the patient has no clinically significant difficulties in everyday functioning (Antinori et al., 2007). Nevertheless, studies of neuropsychological batteries used for HAND in normal HIV-uninfected populations have suggested that between 15 and 22% of individuals from an HIV-uninfected control group and 20% of a simulated normal population will score below the threshold for HAND, yielding false positive results. These errors are caused by two common practices intended to increase sensitivity to milder neurocognitive abnormalities. First, extensive test batteries will have higher false-positive rates than individual tests because they involve multiple comparisons. The probability of an abnormal score (i.e., of diagnosing a normal individual as impaired) increases as the number of tests performed per domain and the number of assessed domains increase. Second, the high cutoff scores (z scores with a threshold of 1 SD) will increase the overlap between critical portions of test score distributions in individuals with and without disease (Gisslen et al., 2011; Meyer, Boscardin, Kwasa, & Price, 2013). The result of increased sensitivity is necessarily a reduction in specificity. Therefore, false-positive cases will lead to biased prevalence estimates and reductions in power for analytical estimates (Meyer et al., 2013; Tierney et al., 2017). However, the Frascati criteria are the most widely used criteria for diagnosing HAND in clinical settings and research.
Direct validation of the criteria for ANI and MND relies on neuropsychological testing, as there are no reliable longitudinal clinical-pathological correlation studies, nor a gold standard antemortem biomarker or imaging finding.
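The multiple-comparisons argument above can be made concrete with a rough calculation. The sketch below is not the simulation method of the cited studies: it assumes one normally distributed, demographically adjusted score per domain and, unrealistically, that domains are statistically independent. Under those assumptions it computes the probability that an unimpaired person falls at least 1 SD below the mean in at least two of the assessed domains. The function names and default parameters are illustrative.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chance_impairment_probability(n_domains: int = 5,
                                  min_impaired_domains: int = 2,
                                  z_cutoff: float = -1.0) -> float:
    """Probability that a normal individual meets the impairment rule by chance.

    Assumes independent domains, each scored as a standard normal z score.
    """
    p_low = normal_cdf(z_cutoff)  # chance of a low score in a single domain
    # Probability of fewer than `min_impaired_domains` low scores (binomial).
    p_fewer = sum(
        math.comb(n_domains, k) * p_low ** k * (1.0 - p_low) ** (n_domains - k)
        for k in range(min_impaired_domains)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    for domains in (5, 6, 7):
        probability = chance_impairment_probability(n_domains=domains)
        print(f"{domains} domains: {probability:.3f}")
```

With five domains this gives a false-positive probability of roughly 18%, and the value rises as more domains are assessed. Real cognitive domains are correlated, so this figure is only an order-of-magnitude illustration.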

Our results confirm that the main potential benefit of MoCA is as a test that promises to decrease cognitive assessment time and costs. However, the optimal threshold for this tool should be lower than 26 (see Table 1 and Supplemental Table 2). Overall, we recommend against approaches that use MoCA in isolation. A possible solution could consist of a short battery of tests, including the MoCA, requiring 10 to 30 min to complete, which could enhance both sensitivity and specificity and could be used in settings with limited resources (Joska et al., 2016). Future studies could compare multiple brief screening tests with a full neuropsychological battery in order to optimize a screening tool that can reliably detect HAND. Additionally, further cross-sectional studies are required to examine the optimum cut-off score for HAND. In this regard, different thresholds should be tested in individuals with diverse cultural and educational backgrounds who speak different languages. Researchers should also consider the value of MoCA in a diagnostic workup so that clinicians understand how to use this screening test to attain relevant outcomes for patients, such as the benefits of earlier diagnosis and the harms of unnecessary testing.

The objective results from a screening test like MoCA are still likely to be more reliable than the information provided by patients or self-reports (De Francesco et al., 2016; Obermeit et al., 2017). Patients with abnormal screening results should be further assessed for the underlying causes of cognitive impairments such as mood disorders, cognition-impairing effects of ART, thyroid disease, syphilis and B12 deficiency. These abnormalities should be correctly identified before referring patients for a full neuropsychological assessment (Hakkers et al., 2017). A stepwise protocol including cognitive screening would be easy to implement in routine clinical practice showing physicians how to deal with this complex problem (Hakkers et al., 2017).

Our study has certain limitations. First, the relatively low number of studies requires particular caution when interpreting our results, especially in the case of the threshold score of 22, where no more than 287 patients could be included in the analysis. Second, there was significant heterogeneity among the studies with regard to demographic differences, language, and cultural and educational background. Differences in cultural and educational experiences may result in lower performance on neuropsychological tests. Normative corrections (i.e., for age, gender, education, and ethnicity) are not always available for all populations of people living with HIV, or they are based on a limited set of demographic factors. This can induce bias when evaluating cognitive impairment (Devlin & Giovannetti, 2017). For example, although MoCA was demonstrated to be influenced by age, educational level and cultural background, its norms have been published and stratified by age and educational level only in the following languages: English, Quebec French, Italian, Portuguese, Japanese and Czech (Carson et al., 2018). In addition, other factors may introduce unrecognized biases, such as the total central nervous system penetration-effectiveness (CPE) score, polypharmacy, medication side effects, CD4 count and viral loads (Koenig et al., 2016). However, heterogeneity is reasonably assumed in diagnostic test accuracy studies and most approaches to pooling test accuracy data consider this aspect in the analysis.

In conclusion, despite the limitations mentioned above, our meta-analysis represents the first systematic review of the literature published in this field and provides an accurate comparison between the MoCA thresholds in patients infected with HIV. The MoCA test appears to be a reasonable screening tool for HIV-infected patients, especially when our recommended threshold score of 23 is used, as it offers the best balance between true positive and false positive results (with a sensitivity of 0.44 and a specificity of 0.79). Nonetheless, our findings indicate that the optimal threshold for MoCA always comes with a sensitivity-specificity trade-off, the preferred cut point depending on whether sensitivity or specificity is more valuable in a given context.