Introduction

Central nervous system (CNS) complications in HIV infection persist despite effective antiretroviral therapy (ART) and impact HIV care including ART adherence. One of the most common CNS complications is neurocognitive impairment (NCI). The prevalence of NCI ranges from 12 to 88% (median = 40.5%) with some of the highest rates being in low- and middle-income countries including Uganda (Rubin and Maki 2019). Diagnostic categories of HIV-associated neurocognitive disorder (HAND) (Antinori et al. 2007) are also prevalent, with asymptomatic NCI (ANI) ranging from 10 to 55% (median = 30.5%), mild neurocognitive disorder (MND) from 5 to 31% (median = 11%), and HIV-associated dementia (HAD) from 0 to 31% (median = 3%) (Rubin and Maki 2019). Despite the persistence and severity of NCI in people with HIV (PWH), its underlying pathophysiology remains elusive.

There is ample evidence for the heterogeneous nature of NCI in PWH, including the degree and pattern of NCI, yet few studies focus on revealing the inherent heterogeneity of these HIV-associated CNS complications. For diagnostic purposes in the neuroHIV field, NCI has been thought of as a unidimensional construct. Performance on a neuropsychological (NP) test battery traditionally has been used to determine the type and level of impairment in HAND (Antinori et al. 2007). However, based on HAND diagnostic criteria, two individuals can receive the same diagnosis with very different patterns of impairment on an NP test battery. Recent studies demonstrate that heterogeneity in NP test battery performance is the rule and not the exception when understanding milder levels of NCI, as opposed to dementia, in PWH (Brouillette et al. 2016; Gomez et al. 2018; Molsberry et al. 2018; Rubin et al. 2017; Woods et al. 2005). For example, cross-sectional studies indicate distinct neurocognitive profiles that vary in the degree and pattern of performance, particularly in memory (Gomez et al. 2018; Molsberry et al. 2018) and executive function (Gomez et al. 2018). Deficits in memory, which is key for encoding, storing, and retrieving facts and events, may suggest alterations in hippocampal and prefrontal circuitry (Eichenbaum 2017; Jin and Maren 2015), whereas executive function deficits, depending on the type (e.g., behavioral inhibition), may indicate prefrontal (dorsolateral prefrontal cortex, inferior frontal gyrus), anterior cingulate, and posterior parietal cortex alterations (Nee et al. 2007). Thus, disentangling heterogeneity in NCI may greatly contribute to our understanding of underlying brain phenotypes and provide a starting point for understanding the underlying pathophysiology.

Here, we examined heterogeneity in neurocognitive change trajectories during a critical window—from newly diagnosed HIV infections prior to ART initiation to a 2-year follow-up visit approximately 1–2 years after ART initiation—in Rakai, Uganda, a low-income country in Sub-Saharan Africa, where ~ 1.5 million PWH reside. In contrast to the United States (US) cohort studies (e.g., Women’s Interagency HIV Study (WIHS)), this cohort had minimal HIV-associated non-AIDS conditions including diabetes, hypertension, hypercholesterolemia, obesity, hepatitis C co-infection, illicit substance use, and the use of non-ART medications. Thus, Rakai provides a unique location to disentangle heterogeneity in ART effects on NCI in PWH with minimal confounding factors. We aimed to identify homogenous subgroups demonstrating similar neurocognitive change trajectories during this critical time period. This approach may in part facilitate our understanding as to why some ART-initiation studies demonstrate neurocognitive benefits over time, whereas others demonstrate neurocognitive deterioration (Al-Khindi et al. 2011; Cysique et al. 2009; Mora-Peris et al. 2016; Winston et al. 2015; Winston et al. 2012; Zhuang et al. 2017). Given the high frequency of learning and memory impairment in PWH (Heaton et al. 2011; Rubin et al. 2017) and US cross-sectional analyses demonstrating different performance patterns on these two domains (Gomez et al. 2018; Molsberry et al. 2018), we expected that these two domains would be important in distinguishing subgroups. To test the biological plausibility of ART-related neurocognitive phenotyping, we explored subgroup differences on a panel of pre-ART CSF cytokines and neurodegenerative biomarkers.

Methods

Participants

Three hundred twelve PWH > 20 years old were enrolled from local Rakai Health Sciences Program (RHSP)-supported HIV clinics and the Rakai Community Cohort Study, an open, community-based cohort of participants residing in 40 communities in Rakai District, Uganda, which are representative of rural Uganda. Eligible participants were ART-naïve PWH with advanced immunosuppression (n = 147 CD4 < 200 cells/μL) or moderate immunosuppression (n = 165 CD4 350–500 cells/μL). Participants who initiated ART and returned for a 2-year follow-up were included in analyses. Exclusion criteria included severe neurocognitive (severe HAND) or psychiatric impairment precluding written consent, physical disability preventing travel to the RHSP clinic for study procedures, known CNS opportunistic infection or prior CNS disease.

Study procedures

Participants were enrolled between August 2013 and July 2015 and completed a sociodemographic and behavioral interview, mental health screen (Center for Epidemiologic Studies Depression Scale [CES-D]) (Radloff 1977), functional assessments (Patient Assessment of Own Functioning Inventory [PAOFI] (Richardson-Vejlgaard et al. 2009), Instrumental Activities of Daily living [IADL] (Fieo et al. 2010), Karnofsky functional status score (Karnofsky et al. 1948), Bolton Functional assessment (Bolton et al. 2004)), and a dementia screener (International HIV Dementia Scale [IHDS] (Sacktor et al. 2005)). Participants underwent a neuromedical evaluation and peripheral blood draw to confirm HIV status, CD4 cell count, HIV RNA, fasting glucose and lipids, and a metabolic panel. A subset of participants consented to receive an optional lumbar puncture at enrollment and CSF was used to measure 17 cytokines/chemokines via multiplex profiling with the Luminex Platform (Human 17-Plex Panel, Bio-Rad, Hercules, CA) and 20 biomarkers of neurodegenerative breakdown products (Milliplex Catalog: HMMP1-55 K-03; HMMP2-55 K-05; HNDG4MAG-36 K-05; HND1MAG-39 K-07; HNDG3MAG-36 K-10) (Abassi et al. 2017).

Standard protocol approvals, registrations, and patient consents

Participants gave written informed consent for study participation. The study was approved by the Western Institutional Review Board, the Uganda Virus Research Institute Research and Ethics Committee, and the Uganda National Council for Science and Technology.

NP outcomes

The Uganda NP test battery (Dugbartey et al. 2000; Klove 1963; Maj et al. 1994; Robertson 2006; Robertson et al. 2007; Sacktor et al. 2005; Smith 1982) was implemented for each participant pre-ART and re-administered 2 years later (~ 1–2 years post-ART initiation). The battery included eight tests commonly affected by HIV infection (Antinori et al. 2007; Heaton et al. 2010). Specific tests and outcomes included the following: (1) WHO/UCLA Auditory Verbal Learning Test [AVLT] (outcomes = total words recalled on trial 1, learning slope across trials 1–5, total learning across trials 1–5, short and long delay free recall, recognition); (2) digit span [DSPAN] forward and backward (outcomes = total correct on each); (3) Color Trails (outcomes = time to complete trial 1 and 2); (4) category fluency (outcome = total correct words generated); (5) Symbol Digit Modalities Test [SDMT] (outcome = total correct); (6) Grooved Pegboard [GPEG] (outcomes = time to complete dominant and non-dominant hands); (7) Finger Tapping [FTAP] (outcome = total taps); and (8) Timed Gait (outcome = average time to complete across 3 trials).

In the absence of published cognitive norms for low- and middle-income countries, we followed standard methods (Heaton et al. 1991; Maki et al. 2015; Rubin et al. 2017) to create demographically adjusted Z-scores for each individual outcome (16 outcomes) for the overall sample based on scores of the 400 HIV-uninfected age-, community-, and gender-matched adults living in Rakai District. Specifically, we regressed each cognitive outcome on biological sex, age, and years of education. The resulting unstandardized beta weights, constants, and standard errors were used to compute a predicted score for each outcome, which was then subtracted from each person’s actual score; the difference was transformed to a Z-score (mean of 0 and standard deviation of 1) that could be easily compared across all outcomes. Based on the pattern of correlations between the 16 outcomes among HIV-uninfected individuals (Supplemental Table 1) and PWH (Supplemental Table 2), we opted to treat the 16 outcomes separately rather than averaging them into cognitive domain scores.
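
The norming step above can be sketched as follows. This is a minimal illustration only, assuming that "standard errors" refers to the residual standard deviation of each control-group regression (the text does not spell this out) and that each outcome is scored so that higher is better. The function names and the toy control data are hypothetical and are not study data.

```python
from math import sqrt

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (small problems only).

    X: predictor rows without an intercept column; y: outcome values.
    Returns (coefficients with the intercept first, residual standard deviation).
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sd = sqrt(sum(e * e for e in resid) / (len(y) - p))
    return beta, sd

def adjusted_z(beta, sd, predictors, score):
    """Z-score of an observed score relative to its demographically predicted value."""
    predicted = beta[0] + sum(b * v for b, v in zip(beta[1:], predictors))
    return (score - predicted) / sd

# Hypothetical controls: (sex coded 0/1, age in years, years of education) and raw scores.
controls_X = [(0, 25, 4), (1, 30, 7), (0, 35, 2), (1, 40, 10),
              (0, 45, 6), (1, 50, 3), (0, 28, 8), (1, 33, 5)]
controls_y = [12, 14, 9, 15, 10, 8, 13, 11]
beta, sd = fit_ols(controls_X, controls_y)

# Z-score for a hypothetical participant with a raw score of 9.
z_participant = adjusted_z(beta, sd, (1, 38, 6), 9)
```

In this sketch the regression is fit once per outcome on the control group, and the same coefficients are then applied to every participant's pre- and post-ART scores.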

Using the pre- and post-ART Z-scores, we then calculated a reliable change index (RCI) for each outcome and participant (Busch et al. 2011) using the standard error of the difference formula from Jacobson and Truax (Jacobson and Truax 1991) and the test-retest reliability from PWH (Supplemental Table 3). The RCI is a standardized difference score that allows us to set a threshold for change we consider to be reliable, meaning not due to measurement error alone. An RCI cutoff equivalent to an effect size of 0.5 was considered clinically meaningful change (improvement or decline) (Brouillette et al. 2016; Norman et al. 2003).
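
As an illustration of the RCI computation, the sketch below applies the Jacobson and Truax standard error of the difference to a pair of Z-scores. It assumes a baseline standard deviation of 1 (because the inputs are Z-scores), assumes scores are oriented so that higher is better, and treats values exactly at the cutoff as meaningful change; these choices and the function names are assumptions rather than details given in the text.

```python
from math import sqrt

def reliable_change_index(pre, post, retest_r, sd=1.0):
    """Jacobson-Truax RCI: change divided by the standard error of the difference.

    SEM    = sd * sqrt(1 - r)   (standard error of measurement)
    S_diff = sqrt(2 * SEM**2)   (standard error of the difference)
    """
    sem = sd * sqrt(1.0 - retest_r)
    s_diff = sqrt(2.0 * sem ** 2)
    return (post - pre) / s_diff

def classify_change(rci, cutoff=0.5):
    """Label change using the 0.5 cutoff described in the text."""
    if rci >= cutoff:
        return "improvement"
    if rci <= -cutoff:
        return "decline"
    return "stable"

# Hypothetical participant: Z = -1.0 pre-ART, Z = 0.0 post-ART, test-retest r = 0.5.
example_rci = reliable_change_index(-1.0, 0.0, 0.5)
example_label = classify_change(example_rci)
```

With r = 0.5 the standard error of the difference is 1, so a one-unit gain in Z-score gives an RCI of 1 in this toy case.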

Impairment on any outcome at the initial visit and follow-up time points was defined as performance > 1 SD below the demographically adjusted mean. HAND stage was determined using the Frascati criteria (Antinori et al. 2007) consistent with our previous publications in Uganda (Abassi et al. 2017; Sacktor et al. 2009, 2013; Saylor et al. 2019). In brief, ANI was defined as > 1 SD below the mean on at least 2 different NP tests and absence of functional complaints; MND was defined as > 1 SD below the mean on at least 2 different NP tests and functional complaints; HAD was defined as > 2 SD below the mean on at least 2 different NP tests and functional complaints. Functional impairment was defined as Karnofsky score < 80, Bolton score > 0 on any question, and PAOFI score < 4 on the functional status component or an IADL current performance abnormality versus previous best performance.
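
The staging rules above can be written as a small decision function. This is a sketch under stated assumptions: Z-scores are oriented so that lower is worse, a test counts as impaired when any of its outcomes falls below the threshold, and functional impairment has already been reduced to a single yes/no flag. The function name and example values are hypothetical.

```python
def hand_stage(z_by_test, functional_impairment):
    """Assign a HAND stage from per-test Z-scores and a functional-impairment flag.

    z_by_test: dict mapping an NP test name to the list of Z-scores for its outcomes.
    functional_impairment: True if the participant meets the functional criteria.
    """
    def n_tests_below(threshold):
        # Count tests with at least one outcome below the threshold (an assumption).
        return sum(1 for zs in z_by_test.values() if min(zs) < threshold)

    if n_tests_below(-1.0) < 2:
        return "unimpaired"
    if not functional_impairment:
        return "ANI"
    return "HAD" if n_tests_below(-2.0) >= 2 else "MND"

# Hypothetical participant impaired on two tests, with functional complaints.
stage = hand_stage({"AVLT": [-1.5, -0.2], "SDMT": [-1.2], "FTAP": [0.3]}, True)
```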

Statistical analysis

All NP test outcomes (16) were used in a latent profile analysis (LPA) to identify homogenous subgroups of PWH with similar ART-related neurocognitive trajectories. LPA is a latent variable statistical technique used to identify a few mutually exclusive latent (unobservable) classes of individuals (Hagenaars and McCutcheon 2002). These classes are based on the individuals’ responses to a set of measured (observed) continuous variables, in this case performance on an NP test battery. Advantages of LPA include the following: (1) use of an objective approach (model-based, relies on probabilities, and model fit statistics), (2) consideration of all variables jointly (e.g., ART-related changes in all 16 outcomes simultaneously; Supplemental Table 4), (3) ability to capture complex patterns of variables, and (4) ability to handle a high degree of multicollinearity (Stanley et al. 2017). Separate LPA solutions with 2 to 4 classes were examined. To determine the best fitting model in terms of balancing fit and parsimony, we used a number of standard model fit statistics including Akaike’s information criterion (AIC; lower indicates better model fit), the Bayesian information criterion (BIC; lower indicates better model fit), entropy (higher suggests better class separation, ideally > 0.80), and interpretability of the models. After identifying the best fitting model, we extracted the posterior probability of an individual’s membership in each latent class and assigned them to the latent class in which their posterior probability of membership was highest. Group membership was then used as the primary predictor variable for all subsequent analyses to probe differences among classes. Analysis of variance (ANOVA) and chi-square tests were used for continuous and categorical socio-demographic, behavioral, and clinical variables, respectively. 
A single generalized estimating equation (GEE) model was used to examine group membership differences in the pre-ART CSF biomarker panel. LPA analyses were done with Mplus statistical software (version 7.4). All other analyses were conducted in SAS version 9.4 (Cary, NC). Significance was set at p < 0.05; trends were noted at 0.05 ≤ p < 0.10.
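
For readers unfamiliar with the fit statistics named above, the sketch below shows how AIC, BIC, a relative entropy index, and modal class assignment are commonly computed from a fitted mixture model's log-likelihood and posterior probabilities. It is not the Mplus implementation; the entropy formula is the commonly used relative entropy and is assumed here to match what was reported. All function names and example numbers are hypothetical.

```python
from math import log

def aic(log_lik, n_params):
    """Akaike's information criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; lower values indicate better fit."""
    return n_params * log(n_obs) - 2 * log_lik

def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 indicate well-separated classes.

    posteriors: one row per individual, one posterior probability per class.
    """
    n, k = len(posteriors), len(posteriors[0])
    total = -sum(p * log(p) for row in posteriors for p in row if p > 0)
    return 1 - total / (n * log(k))

def modal_class(row):
    """Index of the class with the highest posterior probability."""
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical posterior probabilities for three individuals and four classes.
post = [[0.90, 0.05, 0.03, 0.02],
        [0.10, 0.70, 0.15, 0.05],
        [0.02, 0.03, 0.05, 0.90]]
assigned = [modal_class(row) for row in post]
separation = relative_entropy(post)
```

Modal assignment discards the uncertainty in class membership, which is one reason high entropy is desirable before using class labels as a predictor in later analyses.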

Results

Sample

Participants included 312 PWH (51% male) whose ages at enrollment (pre-ART) ranged from 20 to 68 (mean [M] = 35.6, standard deviation [SD] = 8.5; median = 35; interquartile range [IQR] = 12; 75% < 40 years; 94% < 50 years) (Table 1). Pre-ART, 47% (n = 147) had a CD4 count < 200, 53% (n = 165) between 350 and 500, and HIV RNA ranged from 0 to 6.6 log copies [cp]/ml (median = 4.6 log cp/ml, IQR = 1.2). Cardiovascular comorbidities were minimal, and participants did not report using common medications with known adverse neurocognitive effects (Radtke et al. 2018; Rubin et al. 2018) including antipsychotics, antidepressants, and anticholinergics. Fifteen percent smoked (n = 46), and narcotic use was rare (n = 7, 2.2%). Post-enrollment, the most common ART regimen was efavirenz, lamivudine, and tenofovir disoproxil fumarate (n = 260, 83%). At 2-year follow-up, the median CD4 count was 394 (IQR = 254), and 85% (n = 262) had undetectable viral loads. The subset of participants who had consented to give CSF samples was similar to the overall sample (P’s > 0.25; Supplemental Table 5).

Table 1 Socio-demographic, behavioral, and clinical characteristics in the overall sample and as a function of latent profile class pre-antiretrovirals

NP change profiles

The four-class solution provided the best fit to the observed data in terms of entropy (4-class = 0.834 vs. 3-class = 0.80 and 2-class = 0.68), AIC (4-class = 12,844 vs. 3-class = 12,912 and 2-class = 12,983), and BIC (4-class = 12,918 vs. 3-class = 13,279 and 2-class = 13,227) and was clinically interpretable. To facilitate interpretation of each of the four subgroups, Fig. 1 illustrates the raw RCI data distribution (reflecting pre- to post-ART change) for each NP test outcome from the four-class LPA. Figure 2 provides the actual percent impairment for (A) each NP test outcome and (B) HAND by subgroup as a way of understanding the degree of impairment pre-ART initiation and the degree of change post-ART. Each of the four classes is examined separately. The first subgroup (decline-only, n = 48; 15%) demonstrated pre- to post-ART declines (> 0.5) in WHO/UCLA AVLT total learning, short and long delay recall, fluency, SDMT, and DSPAN forward and near clinically significant declines on color trails trial 2 and DSPAN backward (RCI estimates = − 0.48) (Fig. 1). Performance remained relatively stable on WHO/UCLA AVLT recognition and on all motor tests (GPEG, FTAP, timed gait). The overall pattern of NP change (Fig. 2A) was reflected in a 21% increase in any HAND diagnosis from pre- to post-ART (44 to 65%), which was driven by an increased number of individuals diagnosed with MND post-ART (22.9 to 48%) (Fig. 2B).

Fig. 1

Raw data distribution for the reliable change index (RCI) for each neuropsychological test outcome for each of the four latent classes. AVLT, Auditory Verbal Learning Test; SDMT, Symbol Digit Modalities Test; Trails, Color Trails; DSPAN, Digit Span

Fig. 2

Percent impairment pre- and post-antiretroviral (ART) for (A) neuropsychological (NP) outcomes (B) HIV-associated neurocognitive disorders by latent class. Green corresponds to improvement and magenta to decline; AVLT, Auditory Verbal Learning Test; SDMT, Symbol Digit Modalities Test; Trails, Color Trails; ANI, asymptomatic neurocognitive impairment; MND, mild neurocognitive disorder; HAD, HIV associated dementia

The second subgroup (mixed, n = 32, 10%) demonstrated domain-specific improvement in WHO/UCLA AVLT total learning and recognition yet declines in DSPAN forward and backward and color trails trial 2 (Fig. 1). The remaining outcomes remained relatively stable including all motor tests (GPEG, FTAP, timed gait) and WHO/UCLA AVLT short and long delay recall. The overall pattern of domain-specific increases and decreases (Fig. 2A) resulted in a similar rate of HAND pre- (46.9%) and post-ART (55.2%) (Fig. 2B).

The third subgroup (no-change, n = 169; 54%) demonstrated no significant changes in performance from pre- to post-ART as indicated by an RCI within ± 0.5 across all outcomes (Fig. 1). Impairment was higher in SDMT, color trails (trials 1 and 2), DSPAN forward and backward, and FTAP (Fig. 2A). Impairment was present but to a lesser extent on WHO/UCLA AVLT outcome measures (total learning, short and long delay recall, recognition). The prevalence of HAND remained high (57.4% pre- and 49% post-ART) (Fig. 2B).

The fourth subgroup (improvement-only, n = 63; 20%) demonstrated domain-specific improvements in color trails (trial 1), WHO/UCLA AVLT total learning and short and long delay recall, and recognition (Fig. 1). Performance on all other outcomes remained relatively stable (Fig. 1A) resulting in a 20% decrease in any HAND diagnosis from pre- to post-ART (66.7 to 46.7%) which was driven by a decreased number of individuals with HAD post-ART (25.4 to 6.5%) (Fig. 2B).

Factors that differed across the subgroups pre-ART

Table 1 provides socio-demographic, behavioral, and clinical characteristics as well as laboratory profiles (CD4 count, HIV RNA) by the four subgroups. Only four factors differed significantly across classes: the proportion of individuals who were underweight, tobacco use, Septrin use, and IHDS scores. The improvement-only subgroup had a greater proportion of PWH who were underweight versus the decline-only and mixed subgroups (P’s < 0.05), a greater proportion of PWH using Septrin versus the no-change subgroup (P = 0.005), and lower IHDS scores versus the decline-only subgroup (P’s < 0.05). Tobacco use was higher in the decline-only subgroup versus the other subgroups (P’s < 0.05). There was a trend toward a higher proportion of HIV subtype D infections within the improvement-only subgroup (87%) versus the other subgroups (~ 50%) (P = 0.078). There was also a trend for the improvement-only subgroup to have a higher percentage of PWH with HAND pre-ART (67%) versus the decline-only (44%), mixed (47%), and no-change (57%) subgroups (P = 0.07).

Factors that differed across the subgroups from pre- to post-ART or post-ART

Table 2 provides behavioral and clinical characteristics as a function of the four latent class groups from pre- to post-ART. There were no differences in ART medication regimen, CNS penetration effectiveness, or monocyte efficacy scores of the drug combinations (P’s > 0.24). The only factor differing across subgroups was the change in IHDS scores. The decline-only subgroup showed the least improvement in IHDS scores versus the improvement-only subgroup (P = 0.04).

Table 2 Behavioral and clinical characteristics as a function of latent profile class from pre- to post-antiretrovirals (ART) or post-ART

Pre-ART CSF biomarker levels differ across subgroups

Following a false discovery rate (FDR) correction (Benjamini-Hochberg procedure), significant group differences emerged on 8 biomarkers when comparing the decline-only, mixed, and improvement-only subgroups versus the no-change subgroup (Fig. 3). No biomarker level differences between the decline-only and no-change subgroups survived FDR correction. The mixed subgroup versus the no-change subgroup demonstrated lower levels on a number of biomarkers after FDR correction including interleukin (IL)-1β, IL-6, IL-13, interferon (IFN)-γ, macrophage inflammatory protein (MIP)-1β (CCL4), and matrix metalloproteinase (MMP)-3 (P’s < 0.01). Finally, the improvement-only subgroup versus the no-change subgroup demonstrated lower levels of MMP-10 and platelet-derived growth factor (PDGF)-AA (P’s < 0.01) after FDR correction.
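
The Benjamini-Hochberg step-up procedure named above can be sketched as follows. The FDR level is not stated in the text, so q = 0.05 is an assumption, and the p-values in the example are invented for illustration and are not study results.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level q.

    Step-up rule: find the largest rank k such that p_(k) <= (k / m) * q,
    then reject the hypotheses with the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected

# Invented p-values for eight hypothetical biomarker comparisons.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
example_rejected = benjamini_hochberg(example_p)
```

Because the rule is step-up, a p-value that misses its own rank threshold can still be rejected when a larger p-value meets a later threshold.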

Fig. 3

Pre-antiretroviral CSF estimated mean (standard error) biomarker levels (log transformed and z-scored) showing significant differences between the decline-only subgroup, the mixed subgroup, and the improvement-only subgroup compared to the no-change subgroup. Shaded regions meet statistical significance following a false discovery rate correction

Discussion

The primary finding was evidence of heterogeneity of neurocognitive change trajectories in people starting ART in Rakai, Uganda. The heterogeneity cannot be accounted for by HIV-associated non-AIDS conditions, co-infections, illicit substance use, or non-ART medications with known adverse neurocognitive effects, which are common among large cohorts in the US (e.g., Women’s Interagency HIV Study, Multicenter AIDS Cohort Study) but uncommon in this cohort in rural sub-Saharan Africa. Notably, patterns of domain-specific neurocognitive change during this critical window demonstrate that ART does not have the same effects for all individuals. This is not surprising given that any medication can be effective in some patients, ineffective in others, and even harmful in others. Consideration of heterogeneity and taking an agnostic approach to the neurocognitive outcomes may in part explain why some ART-initiation studies demonstrate neurocognitive benefits over time whereas others demonstrate neurocognitive deterioration (Al-Khindi et al. 2011; Cysique et al. 2009; Mora-Peris et al. 2016; Winston et al. 2015; Winston et al. 2012; Zhuang et al. 2017).

Numerous factors could be responsible for heterogeneity in cognitive response to ART initiation including host, virus, and even environmental factors. Such factors may include background ART resistance in individuals who are not ART-naïve or transmitted drug-resistant virus in those who are. Clade A or D or some combination may not be as responsive to the ART regimen. Although it did not reach statistical significance in the present study, there was a higher proportion of people with subtype D in the improvement-only group, a group that had higher levels of pre-ART NCI. Additional factors include poor ART adherence leading to incomplete suppression and resistance, and host systemic factors such as altered genetic metabolism of ART leading to fast clearance and sub-therapeutic ART levels. Environmental factors may also be involved, including poor nutrition that may alter ART metabolism or storage temperatures that alter drug efficacy. In the present study, being underweight was one factor that differed between the subgroups, with the improvement-only subgroup having the greatest proportion of underweight individuals (16%), followed by the mixed (9%), decline-only (2%), and no-change (0%) subgroups.

In the present study, 85% of participants were on efavirenz, and interindividual variations in the pharmacokinetics of efavirenz (Burger et al. 2006; Dhoro et al. 2015) could be responsible for some differences in CNS-related side effects including neurocognition. Single nucleotide polymorphisms (SNPs) in CYP2B6 are particularly relevant (Gallien et al. 2017; Gross et al. 2017; Swart et al. 2013), with the frequency of the most significant allelic variant of CYP2B6 being higher in Africans (Wang et al. 2006). Efavirenz discontinuation because of reported CNS symptoms is more likely in individuals with higher versus lower genetic risk scores for slow efavirenz metabolism (via loss or decrease of function SNPs in CYP2B6, CYP2A6, and CYP3A4) (Cummins et al. 2015). Slow efavirenz metabolism is associated with increased plasma concentrations, and those with higher plasma concentrations demonstrated more efavirenz-related CNS side effects (Marzolini et al. 2001). Thus, safety and efficacy of medications, including ART, are likely impacted by a number of factors including genetics (e.g., intestinal and hepatic CYP450 enzymes which impact drug metabolism (Zanger and Schwab 2013)), drug-drug interactions, biological sex, age, and food intake (impacts drug bioavailability).

Of particular clinical relevance was that some ART-naive PWH treated with ART showed domain-specific improvement whereas others showed domain-specific declines, a combination of improvement and decline, or no neurocognitive changes. NP tests that appeared most malleable to ART treatment were ones assessing learning, memory, and attention (seen in 3 out of 4 classes) followed by executive function (seen in 2 classes for each domain). Tests measuring speed of information processing (SDMT) and fluency only changed in one subgroup. These domains were not the domains that were most impaired at baseline. Interestingly, the tests assessing learning, memory, and attention either improved or declined with ART whereas tests assessing executive function only declined with ART. Conversely, tests measuring motor skills were resistant to ART-related improvement or decline in any subgroup.

Another point of clinical relevance relates to the pattern of ART-related changes in learning and memory. The decline-only subgroup demonstrated a pattern of decline as evidenced by memory retrieval deficits but relatively normal memory storage and retention (impaired learning and recall but intact recognition), the typical pattern of NCI in the ART era (Milanini et al. 2016; Scott et al. 2011). The mixed subgroup only demonstrated learning and recognition improvements (not retrieval) with ART. Overall learning improvements were not due to the rate of learning across trials. Finally, the improvement-only subgroup demonstrated a pattern of improvement as evidenced by improvements in encoding, storage, and retrieval.

Gait speed, gross motor function (e.g., FTAP), as well as fine motor skills (GPEG) were generally resistant to change with ART using the RCI in the present cohort. It is unlikely that the lack of change in these outcome measures is due to low test-retest reliability of these outcomes. The test-retest reliabilities for motor and non-motor outcome measures were comparable, suggesting that the lack of motor change was not differentially compromised by this measurement issue compared with other outcome measures. Similar findings were recently reported in a separate Ugandan cohort in which neurocognitive function improved after ART initiation in all assessments except timed gait, which worsened, again suggesting motor impairment may be less malleable to improvement on ART (Spies et al. 2018). In healthy individuals, functional magnetic resonance imaging (fMRI) studies demonstrate that bilateral pre- and post-central gyri, the thalamus, putamen, and cerebellum are invoked during finger tapping tasks (Lehericy et al. 2006; Wesley and Bickel 2014). Among PWH, fMRI and structural MRI demonstrate HIV-related alterations in these same, often subcortical, brain regions (Paul et al. 2002; Plessis et al. 2014; Sanford et al. 2017; Tucker et al. 2004; von Giesen et al. 2000; Wright et al. 2016; Zhou et al. 2017), which harbor viral reservoirs in the simian immunodeficiency virus (SIV)/macaque model for HIV (Avalos et al. 2017; Gama et al. 2017). Notably, volume loss in subcortical regions such as the putamen is associated with poorer behavioral performance on finger-tapping and timed gait tests early during infection (Wright et al. 2016), and these volumetric reductions persist with ART (Ances et al. 2012). Thus, initial damage including dopaminergic dysfunction (Lee et al. 2014) caused by HIV in these motor-related regions (Sailasuta et al. 2012; Valcour et al. 2012) may not normalize after viral suppression with ART. 
This may in part be due to ongoing low levels of neuroinflammation particularly in pathways mediated by cells of the monocyte/macrophage lineage (Anderson et al. 2002; Burdo et al. 2013a; Gonzalez-Scarano and Martin-Garcia 2005; Hong and Banks 2015; Kaul et al. 2001; Lawrence and Major 2002; Lindl et al. 2010; Valcour et al. 2010). For example, studies demonstrate that neurocognitive vulnerabilities including motor skills in virally suppressed PWH are associated with microglial activation in vivo (Coughlin et al. 2014; Garvey et al. 2014) and soluble markers of monocyte-driven inflammatory markers (e.g. sCD163, sCD14)(Burdo et al. 2013b; Imp et al. 2017; Royal 3rd et al. 2016). Moreover, these monocyte-driven inflammatory markers are increased during acute infection, normalize after ART, and correlate with motor function (D’Antoni et al. 2018).

Neuropsychological outcomes measuring learning and memory, attention, and executive function were more likely to change with ART compared with motor skills. One possibility is that these higher-order cognitive processes may be reliant on brain structures that have a greater ability to develop compensatory networks. The WHO/UCLA AVLT is a behavioral measure of the declarative memory system (reliant on extrinsic and intrinsic hippocampal circuitry including connections with the prefrontal and posterior parietal cortex (Insel et al. 2010; Woody and Gibb 2015)) whereas digit span forward and backward and color trails trial 2 are cognitive control system (Votruba and Langenecker 2013) correlates (regulated by the anterior cingulate cortex and dorsolateral prefrontal cortex (Insel et al. 2010; Woody and Gibb 2015)). Mathematical modeling of the human brain connectome (Ye et al. 2015) provides some evidence to suggest that the brain networks subserved by these cognitive systems may be able to circumvent the brain structures initially damaged by HIV (e.g., basal ganglia circuitry (Haber 2016)) (Valcour et al. 2012; Valcour et al. 2015) and develop compensatory networks. Alternatively, it is equally likely that with only two time points, the dynamic improvements on select tests reflect differential sensitivity to practice effects or sensitivity to change.

Despite small sample sizes, a number of cytokines and neurodegenerative markers measured pre-ART differed between the neurocognitive trajectory change phenotypes. The biomarkers distinguishing subgroups showing domain-specific change (improvement-only, decline-only, or mixed) compared with the no-change subgroup included IL-1β, IL-6, IL-13, IFN-γ, MIP-1β (CCL4), MMP-3, MMP-10, and PDGF-AA. Thus, cytokines (IL-6, IL-13, IL-1β, IFN-γ) and chemokines (MIP-1β), markers relating to blood-brain barrier permeability (MMP-3, MMP-10), and a marker related to wound repair (PDGF-AA) distinguished the groups. Subgroup-specific patterns provide preliminary evidence that disentangling the heterogeneity in ART-related neurocognitive changes in PWH may provide insight into the underlying pathophysiology.

Although the study had a number of methodological strengths (e.g., longitudinal design, comprehensive NP test battery, minimal confounding of cardiovascular risk factors, illicit substance use, non-ART medications), there were also a number of limitations. Although we had an appropriate control group (HIV-uninfected adults from the Rakai district), NP data was only available for this group at a single time point. Thus, we were limited in our ability to handle “normal” neurocognitive change. We opted to create demographically adjusted norms based on the control data at a single time followed by RCI calculations using test-retest reliabilities from PWH. While the RCI method applied here surpasses the simple difference score method (which does not account for test-retest reliability), our method also did not account for practice effects, which cognitive measures often show with repeated testing to varying degrees (Beglinger et al. 2005; Benedict and Zgaljardic 1998; Wagner et al. 2011). While RCI methods adjusting for practice have been developed (Chelune et al. 1993; Sawrie et al. 1996) and outperform the simple RCI method (Temkin et al. 1999), the adjustment is based on the mean practice effect for the normative group, which was not available for this study. Also, practice effects would bias the scores toward positive change, which means that the declines observed in the decline-only subgroup are unlikely to be an artifact of practice. It is important to note that there was a long duration between testing sessions (2 years) in the present study. The practice effect over this time interval is minimal (Lamar et al. 2003) compared with frequent testing over weeks or months. To properly correct for practice effects and multiple confounding factors simultaneously, future studies are needed where the control group completes NP tests at two or more assessments (e.g., practice-adjusted RCI (Busch et al. 2015; Heaton et al. 2001), standard regression-based [SRB] approach (Cysique et al. 2011); see for review (Duff 2012)). 
Another limitation was the test-retest reliabilities of each NP outcome, which are expected to be lower than those that would result from the creation of cognitive domain scores. Test-retest reliability is incorporated into the RCI computation and is therefore important for reliable trajectory estimates. However, we felt the advantage of using the 16 test outcomes in our LPA versus creating cognitive domain scores outweighed the potentially lower test-retest reliability. Our decision was supported by the pattern of correlations between the 16 outcome measures in both the HIV-uninfected controls and PWH, which did not align with how the NP outcomes are typically grouped into domain scores (see Supplemental Tables 1 and 2). Third, we acknowledge that ART-related neurocognitive changes are often characterized by a fluctuating pattern of change rather than decline over time (Mora-Peris et al. 2016). Unfortunately, only two time points of NP data were available. Fourth, several psychosocial determinants that could contribute to the heterogeneity in neurocognitive change trajectories were not assessed, and duration of HIV infection was not available in the current study. For example, domestic violence (Koenig et al. 2003), coercive sex (Koenig et al. 2004; Pilgrim et al. 2013), intimate partner violence (Wagman et al. 2016; Zablotska et al. 2009), divorce (Nalugoda et al. 2014; Porter et al. 2004; Wagman et al. 2016), abandonment (Mullinax et al. 2013), and stigma (Nakigozi et al. 2013) are common in Rakai and are known risk factors for NCI. With respect to HIV disease duration, while this information was not available as many participants were sourced from different facilities outside of RHSP, we do know that the guidelines for ART initiation changed (initiate ART when CD4 < 500) as our study began enrollment. As a result, the vast majority of PWH were most likely enrolled at the time of diagnosis. 
Fifth, although there was an equal proportion of males and females in each of the latent profiles, additional work with larger samples is warranted to determine if the impact of ART on neurocognitive change is the same for males and females. Sixth, only a small number of individuals in the overall analyses had pre-ART CSF levels to examine differences in biomarkers by subgroups; however, this small group of individuals was comparable with the larger group of PWH without CSF. Replication and larger samples are warranted.

In summary, there is considerable heterogeneity in neurocognitive change trajectories in people starting ART in Rakai, Uganda. Identifying cognitive change phenotypes in relation to ART may hold promise for a better understanding of pathophysiology and possibly treatment implications. Future studies would benefit from assessing ART levels (particularly efavirenz) in biospecimens (blood or CSF) in order to more directly examine ART-induced CNS symptoms that could be related to ART penetrance and availability.