Introduction

Precision medicine is increasingly central to healthcare (Insel, 2014; Morere, 2012), though it remains far from the standard of care in psychiatry, where predictive assessments are rarely part of routine care. Neuroimaging is an attractive technology for identifying pre-clinical prognostic mechanisms that could eventually be used in routine clinical settings. Its use in this capacity, however, is tempered by empirical considerations such as variability in response prediction (Fonseka et al., 2018), data collection, and analytic methods (Langenecker et al., 2018). The current study examined whether the literature, taken together, contains fMRI predictors that are ready for clinical translation and specific to common intervention types for major depressive disorder (MDD). Treatments for MDD yield response rates of roughly 40–60% (Hollon et al., 2002), and neuroimaging studies have pointed to potential neural prognostic indicators of treatment outcome in MDD (Fonseka et al., 2018; Fu et al., 2013; Harmer, 2014; Marwood et al., 2018). Prior meta-analyses have generally either not distinguished between treatment types or examined change-based (pre-post) neural signatures rather than baseline neural predictors (Nord et al., 2021). Both limitations reduce their applicability to precision medicine.

Findings of previous prognostic neuroimaging studies vary substantially (Fonseka et al., 2018). Functional magnetic resonance imaging (fMRI) meta-analyses have implicated multiple candidate brain networks, comprising regions such as the amygdala, prefrontal cortex, insula, and anterior cingulate cortex (ACC). These meta-analyses have examined emotion reactivity (Hamilton et al., 2012; Janiri et al., 2019; Makovac et al., 2020), emotion regulation (Picó-Pérez et al., 2017), and depression and depression treatment outcome (Fonseka et al., 2018; Fu et al., 2013; Langenecker et al., 2018; Pizzagalli, 2011; Sankar et al., 2018; Sundermann et al., 2014). Treatment may help address functional abnormalities. Cognitive behavioral therapy (CBT) increases regulatory control of emotions, whereas selective serotonin reuptake inhibitors (SSRIs) target emotional reactivity more directly (DeRubeis et al., 2008). Because different treatment modalities appear to act directly on different neural processes, the neural reactivity that is prognostic of treatment response may also differ between modalities.

Of the multiple regions described in the literature, the perigenual and subgenual regions of the ACC (pgACC and sgACC) are among the most frequently cited. They therefore served as our a priori candidates. Both regions have been associated with emotion monitoring and regulation (DeRubeis et al., 2008) and with treatment outcome in depression (Fu et al., 2013; Pizzagalli, 2011). Some psychotherapy studies suggest that lower pre-treatment sgACC reactivity to negative information is prognostic of greater symptom improvement following CBT (Siegle et al., 2006, 2012). These results do not appear to reflect therapy correcting a deficit in cingulate reactivity, because participants whose cingulate reactivity normalized did not recover (Siegle et al., 2012). Rather, decreased sgACC reactivity appears to serve as a treatment-facilitating mechanism. If this region is involved in monitoring the amygdala, decreased monitoring may allow greater focus on regulatory control and distraction.

In contrast, greater pre-treatment pgACC and sgACC reactivity has been associated with better antidepressant medication outcomes. This pattern holds both in response to negative emotional stimuli (Chen et al., 2007; Godlewska et al., 2018; Roy et al., 2010; Victor et al., 2013) and at rest (Mayberg et al., 1997; Pizzagalli et al., 2001). Resting-state functional connectivity data suggest that sgACC function is more strongly associated with response to CBT than to an SSRI (Dunlop et al., 2017). Cingulate reactivity may be specific to treatment modality. If so, these data could suggest that SSRIs work better for individuals with the most adaptive pre-treatment, stimulus-related recruitment of brain mechanisms that proximally regulate limbic regions, particularly the amygdala. These data also suggest that different treatment modalities may be associated with different effect directions, unique brain regions, or both. Prior meta-analyses, however, have not tested for effects unique to each treatment modality. Thus, the current study revisited the question of whether predictive brain regions differ by treatment modality.

Current study

The current study had two principal aims, both geared toward evaluating whether the current neuroimaging literature is ready for clinical translation. Aim 1 used meta-analysis (activation likelihood estimation) to characterize how far the MDD literature supports hypothesized brain regions (sgACC and pgACC) as prognostic indicators of acute response to treatment (CBT or SSRI). Aim 2 was to assess whether the meta-analytically identified regions would predict treatment response in a verification sample of individuals with MDD (N = 79). These individuals completed treatment (CBT or an SSRI) as part of prior clinical trials.

For Aim 1, we hypothesized that the meta-analyses would identify ACC subregions (e.g., sgACC and pgACC) as well as potential limbic regions (e.g., amygdala). Prior work has indicated that greater pgACC reactivity is prognostic of response to SSRIs and that less sgACC reactivity is prognostic of response to CBT. These directional effects were the specific hypotheses for Aim 2. More broadly, we hypothesized that neural reactivity to negative emotional stimuli within the meta-analytically derived regions would predict response to CBT and SSRI treatment in the verification samples. Our contention is that a positive result for Aim 1 would indicate the theoretical utility of these regions. Strong prediction in the verification sample would suggest that the derived regions are well suited for clinical translation. Without that verification, we suggest that the field needs more methodological work before trying to apply findings in real-world assessments.

Methods

Study design

Meta-analytic methodology

To obtain studies for the meta-analysis, we reviewed all articles included in recent reviews of the literature on neuroimaging prediction of depression treatment response (Cohen et al., 2021; Fonseka et al., 2018; Fu et al., 2013; Kang & Cho, 2020; Langenecker et al., 2018; Pizzagalli, 2011). Articles were included if they reported pre-treatment neural activation (fMRI) as prognostic or predictive of treatment outcome in depression. The meta-analysis was further limited to fMRI studies that met three criteria. First, they assessed treatment response to an SSRI or CBT (including variants such as behavioral activation). Second, they provided coordinates for their regional results, either in the original articles or via email exchange with the authors. Third, they used emotion-related tasks (affective stimuli and reward-based paradigms) and reported activation (not exclusively connectivity), consistent with the study's theoretical framework of emotion regulation processes. To assess the comprehensiveness of the reviews from which primary articles were gathered, we also conducted supplemental literature searches in November 2021. These searches used PubMed and Google Scholar with the terms “depression,” “neural predictors,” “treatment,” and “cognitive behavioral therapy,” and yielded no additional articles. We included one article (Young et al., 2020) that was not in the reviews because it was published after their literature searches. We were aware of it because it was written by authors of the current manuscript. Table 1 lists included articles; Supplement 1 lists all reviewed articles that were excluded and the reasons for their exclusion.

Table 1 List of papers included in meta-analyses

All coordinates from the included articles were submitted to GingerALE (version 3.0.2), a software package that performs activation likelihood estimation (ALE) on coordinate-based data. Separate meta-analyses were conducted for CBT and SSRI studies. One study reported a region that predicted response to both treatments, and its coordinates were included in both meta-analyses. For both meta-analyses, the cluster-level family-wise error rate was set to 0.05, with a voxelwise cluster-forming threshold of p < 0.005. This threshold is consistent with prior work (Carter et al., 2016) suggesting that a voxelwise threshold of p < 0.001 may be too conservative for neuroimaging studies in clinical samples. Thresholding used 1,000 permutations and GingerALE's 2 mm mask size parameter.
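The core ALE statistic can be illustrated with a short sketch. Each experiment's foci are blurred into a modeled activation (MA) map, taking at every voxel the maximum Gaussian probability across that experiment's foci. ALE is then the probability that at least one experiment activates the voxel, 1 − ∏(1 − MA_i). This is a simplified illustration under stated assumptions, not GingerALE's implementation:

- It uses a fixed Gaussian FWHM, whereas GingerALE derives each experiment's kernel width from its sample size.
- It evaluates a short list of voxel coordinates rather than a brain mask.
- It omits the permutation-based cluster-level FWE thresholding described above.

The function names and example foci are hypothetical.

```python
import numpy as np


def modeled_activation(foci, grid, fwhm_mm, voxel_volume_mm3):
    """MA map for one experiment: per-voxel max over foci of a 3D Gaussian probability."""
    sigma = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # convert FWHM to SD
    # Squared distance from every voxel centre (rows) to every focus (columns)
    d2 = ((grid[:, None, :] - foci[None, :, :]) ** 2).sum(axis=-1)
    density = np.exp(-d2 / (2.0 * sigma**2)) / ((2.0 * np.pi) ** 1.5 * sigma**3)
    # Approximate the probability mass falling in each voxel; cap at 1
    prob = np.clip(density * voxel_volume_mm3, 0.0, 1.0)
    # Max (not sum) so nearby foci from the same experiment do not add up
    return prob.max(axis=1)


def ale(experiments, grid, fwhm_mm=10.0, voxel_volume_mm3=8.0):
    """ALE value per voxel: probability that at least one experiment activates it."""
    ma = np.stack([
        modeled_activation(np.asarray(f, dtype=float), grid, fwhm_mm, voxel_volume_mm3)
        for f in experiments
    ])
    return 1.0 - np.prod(1.0 - ma, axis=0)


# Hypothetical foci (MNI mm) for three experiments; the first two converge near one site
experiments = [[[7, 23, -12]], [[9, 21, -10]], [[40, -60, 30]]]
grid = np.array([[7, 23, -12], [40, -60, 30], [-40, 0, 0]], dtype=float)
values = ale(experiments, grid)
```

The voxel where two experiments converge receives a higher ALE value than a voxel supported by one experiment. A voxel far from all foci stays near zero, which is the convergence property the significance test then evaluates against a null distribution.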

Verification sample

We tested the predictive value of the meta-analytic regions in a verification sample of clinical trial data from participants with MDD who underwent either CBT or SSRI treatment. The verification sample consisted of 79 individuals with MDD from two sources: an expanded version of a prior fMRI CBT outcome prediction study (Siegle et al., 2012) and a parallel preference-based SSRI fMRI study (Young et al., 2020). Consistent with previous work (Siegle et al., 2012), the verification sample included only the subset of participants who completed treatment (CBT n = 60, SSRI n = 19) from the larger sample (N = 96; CONSORT diagram in Supplement 2). The original CBT publication (Siegle et al., 2012) was included in the meta-analysis. The present CBT sample is larger than that of the original publication because additional data were collected after the initial article submission and missing clinical response variables were imputed. Participants thus came from two clinical trials (ClinicalTrials.gov: NCT00183664; NCT00787501). They were treated either with an SSRI prescribed by a psychiatrist or with CBT delivered by six community clinicians (Ph.D.s, M.D.s, M.Ed.s, and LCSWs) who ranged widely in CBT experience (described fully in Siegle et al., 2012). Participants reported no health problems, eye problems, or psychoactive drug abuse in the past six months and no history of psychosis or manic or hypomanic episodes. Participants had not used antidepressants within two weeks of the baseline fMRI assessment (six weeks for fluoxetine), either because they were medication-naive or because of supervised withdrawal from unsuccessful medication treatment. Participants reported no excessive alcohol use in the two weeks prior to testing and scored in the normal range on a cognitive screen (Owen, 1992; verbal IQ equivalent > 85).

Protocol and treatment procedures

After participants provided IRB-approved written informed consent, we conducted diagnostic interviews (Structured Clinical Interview for DSM-IV [SCID]; First et al., 1995), a vision test, and an unrelated physiological assessment. On a separate day, participants completed a battery of fMRI tasks (in counterbalanced order), fully described in Siegle et al. (2012). Participants then received CBT or SSRI treatment. The first MDD participants (n = 16) were part of a CBT trial (Jarrett et al., 2013). Subsequent participants selected CBT (n = 56) or medication (n = 24) by preference. CBT and medication were delivered at the same visit frequency, depending on early response. "Early responders" showed a reduction of at least 40% in Hamilton Rating Scale for Depression (HRSD; M. Hamilton, 1960) scores at session 9. They attended 2 sessions/week for the first 4 weeks followed by 8 weekly sessions (16 total sessions). Non-early responders attended 2 sessions/week for the first 8 weeks followed by 4 weekly sessions (20 total sessions). CBT followed the guidelines of Beck et al. (1979), as described in Siegle et al. (2012). Pharmacotherapy sessions lasted 30–45 min and were conducted by a psychiatric nurse. The nurse asked about general mood status, administered the HRSD, and provided psychoeducation about medication effects and adverse effects. A psychiatrist consulted with the nurse and participant for the final 5–10 min of the session. The medication protocol is described in a prior publication (Young et al., 2020).

Clinical response

The primary outcome measure was the Beck Depression Inventory (BDI; Beck et al., 1996). The BDI is a widely used 21-item self-report measure of depression that rates each symptom from 0 (not present) to 3 (severe) and has strong psychometric properties (Dozois et al., 1998; Wang & Gorenstein, 2013). Consistent with prior work (Siegle et al., 2012), improvement was assessed in two ways: as BDI change (post minus pre) and as residual severity, defined as post-treatment BDI controlling for pre-treatment BDI. Six participants were missing pre-treatment BDI scores, and 13 were missing post-treatment BDI scores. All 6 missing pre-treatment scores and 11 of the missing post-treatment scores resulted from licensing issues that temporarily prevented BDI collection. Reasons for the remaining 2 missing post-treatment scores were not recorded. Missing data were imputed with the SPSS multiple imputation procedure, with results averaged across 5 imputations. The imputation model used related measures:

- the HRSD (M. Hamilton, 1960)
- the State-Trait Anxiety Inventory (Spielberger, 2010)
- the General Distress: Depressive Symptoms and Anhedonic Depression subscales of the Mood and Anxiety Symptom Questionnaire (Watson et al., 1995)
- the Rumination subscale of the Response Styles Questionnaire (Nolen-Hoeksema, 1991)
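The two outcome definitions can be made concrete with a minimal sketch. It assumes complete (already imputed) pre- and post-treatment BDI vectors. Change is post minus pre, so negative values indicate improvement. Residual severity is the part of post-treatment BDI not predicted by pre-treatment BDI in an ordinary least-squares fit within the sample. The function name and example scores are hypothetical.

```python
import numpy as np


def bdi_outcomes(pre, post):
    """Return (change, residual) outcome scores for each participant."""
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    change = post - pre  # negative values = symptom reduction
    # Fit post = b0 + b1 * pre by least squares, then keep what pre does not explain
    X = np.column_stack([np.ones_like(pre), pre])
    coef, *_ = np.linalg.lstsq(X, post, rcond=None)
    residual = post - X @ coef
    return change, residual


# Hypothetical pre/post BDI scores for six participants
pre = [30, 25, 28, 35, 22, 31]
post = [12, 20, 9, 25, 15, 10]
change, residual = bdi_outcomes(pre, post)
```

By construction, the residual scores are uncorrelated with pre-treatment severity. This is why residual severity is a useful complement to raw change, which can be driven by baseline severity alone.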

fMRI task used in the verification sample

Apparatus

Twenty-nine 3.2 mm slices were acquired parallel to the AC-PC line (3T Siemens Trio; T2*-weighted images depicting BOLD contrast; posterior-to-anterior; TR = 1,500 ms, TE = 27 ms, FOV = 24 cm, flip angle = 80°), yielding 8 whole-brain images per 12 s trial. Stimuli were displayed in black on a white background via a back-projection screen (0.88° visual angle). Responses were recorded using a Psychology Software Tools™ glove.

Personal relevance rating task (PRRT)

The task is described fully in a previous publication on this sample (Siegle et al., 2012). Each of 60 slow event-related trials had three parts:

- a fixation cue (1 s; a row of X's with prongs around the center X)
- a positive, negative, or neutral word (200 ms; only negative words were analyzed in the present study)
- a mask (a row of X's; 10.8 s)

Participants pressed a button to indicate, as quickly and accurately as they could, whether the word was relevant, somewhat relevant, or not relevant to them or their lives. Button order was counterbalanced across participants. Consistent with previous studies (Siegle et al., 2006), both participant-generated and normed words were used.

fMRI data preparation

Standard preprocessing, described fully in Siegle et al. (2012), included:

- slice-time correction
- motion correction
- linear detrending
- voxelwise outlier rescaling
- conversion to percent signal change
- temporal smoothing (5-point middle-peaked filter)
- 32-parameter nonlinear warping to the Montreal Neurological Institute Colin-27 brain
- spatial smoothing (6 mm FWHM)
- normalization of response time-series variability across scanners

Consistent with previous work (Siegle et al., 2006, 2012), reactivity to negative words was calculated as the mean of the 4th–7th images ("scans") of each negative-word trial minus that trial's first (pre-stimulus) scan. The first scan was acquired while the fixation cue was on screen. The analysis window corresponds to 6–10.5 s after trial (fixation cue) onset, or 5–9.5 s after word onset. To identify outliers in neural reactivity, we constructed box-and-whisker plots for reactivity in each meta-analytic region and reviewed each value's position relative to the interquartile range (IQR). One data point lying more than 3 × IQR beyond the quartiles was removed from the regressions using the CBT-derived region. Remaining reactivity values lying more than 1.5 × IQR beyond the quartiles were winsorized. Each was replaced with the most extreme value in the same direction that fell within that range.
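The reactivity score and the outlier rules can be sketched as follows. This is a minimal sketch under stated assumptions:

- Each negative-word trial is stored as 8 preprocessed percent-change values (TR = 1.5 s, 12 s trial), with scans numbered from 1 as in the text.
- A participant's score is the mean across that participant's negative-word trials.
- Quartiles are computed once on all values, with NumPy's default interpolation, which may differ slightly from the software used in the study.
- Winsorized values are set to the most extreme retained value inside the 1.5 × IQR fence on the same side. This is our reading of the winsorizing rule, not a confirmed detail.

The function names and data are hypothetical.

```python
import numpy as np


def trial_reactivity(trial):
    """Mean of scans 4-7 minus scan 1 (scans numbered from 1) for one 8-scan trial."""
    trial = np.asarray(trial, dtype=float)
    if trial.shape != (8,):
        raise ValueError("expected 8 scans per 12 s trial at TR = 1.5 s")
    # Zero-based slice 3:7 selects scans 4, 5, 6 and 7; index 0 is the pre-stimulus scan
    return trial[3:7].mean() - trial[0]


def clean_reactivity(values, exclude_k=3.0, winsor_k=1.5):
    """Drop extreme outliers, then winsorize milder ones using quartile-based fences.

    Quartiles are computed once, on the full set of values.
    Returns the cleaned values and a boolean mask of which inputs were kept.
    """
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    # Exclude points more than exclude_k * IQR beyond the quartiles
    keep = (x >= q1 - exclude_k * iqr) & (x <= q3 + exclude_k * iqr)
    x = x[keep]
    lo_fence, hi_fence = q1 - winsor_k * iqr, q3 + winsor_k * iqr
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    # Replace remaining outliers with the most extreme non-outlying value on the same side
    x = np.where(x > hi_fence, inside.max(), x)
    x = np.where(x < lo_fence, inside.min(), x)
    return x, keep


# Hypothetical data: two negative-word trials for one participant (percent signal change)
trials = np.array([[0.0, 0.1, 0.2, 0.5, 0.6, 0.7, 0.4, 0.1],
                   [0.1, 0.0, 0.1, 0.3, 0.4, 0.5, 0.2, 0.0]])
subject_reactivity = float(np.mean([trial_reactivity(t) for t in trials]))

# Hypothetical region-level reactivity across ten participants
region_values = [0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.18, 0.22, 0.5, 5.0]
cleaned, kept = clean_reactivity(region_values)
```

In this example, 5.0 lies beyond the 3 × IQR fence and is dropped. The value 0.5 lies beyond the 1.5 × IQR fence but within 3 × IQR, so it is pulled in to 0.3, the largest non-outlying value.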

Verification sample statistical analyses

For Aim 2, we used regressions to examine whether reactivity averaged across each gray-matter-masked, meta-analytically derived region (details in Supplement 3) predicted treatment response in the CBT and SSRI verification samples. In other words, we tested whether the meta-analytic regions could serve as pre-clinical markers. We limited regressions to the a priori candidate regions, namely ACC subregions. We considered ACC subregions optimal for verification testing because prior reviews and meta-analyses have most consistently found them to be prognostic of treatment outcome (Fonseka et al., 2018; Fu et al., 2013; Langenecker et al., 2018; Pizzagalli, 2011). Moreover, the prior publication that includes a subset of the current sample (Siegle et al., 2012) reported that an ACC subregion (sgACC) was prognostic of treatment outcome. Examining ACC prediction in this expanded sample therefore provides a more direct test of replication. Failure to replicate the prior finding in the current sample would be particularly concerning evidence against readiness for clinical translation.
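The region-level analysis can be sketched as follows. The sketch assumes that each participant's voxelwise reactivity map and the gray-matter-masked meta-analytic region are already in the same space and stored as NumPy arrays, with the region as a boolean mask over voxels. Reactivity is averaged over in-mask voxels. A simple linear regression of the outcome on that average then gives R², F(1, n − 2), and p. With a single predictor, F equals the squared t statistic of the slope. The function names and example data are hypothetical; this is not necessarily how the study's own regressions were run.

```python
import numpy as np
from scipy import stats


def roi_mean(reactivity_maps, roi_mask):
    """Average each participant's reactivity over the voxels inside a boolean ROI mask.

    reactivity_maps: array of shape (n_participants, *volume_shape)
    roi_mask: boolean array of shape volume_shape (e.g., a gray-matter-masked cluster)
    """
    maps = np.asarray(reactivity_maps, dtype=float)
    mask = np.asarray(roi_mask, dtype=bool)
    return maps[:, mask].mean(axis=1)


def predict_outcome(predictor, outcome):
    """Regress outcome on a single predictor; report R^2, F(1, n - 2), p and slope."""
    res = stats.linregress(predictor, outcome)
    n = len(predictor)
    r2 = res.rvalue ** 2
    f_stat = r2 / (1.0 - r2) * (n - 2)  # equals the squared t statistic of the slope
    return {"r2": r2, "F": f_stat, "df": (1, n - 2), "p": res.pvalue, "slope": res.slope}


# Hypothetical example: 20 participants, tiny 4 x 4 x 4 "volumes", an 8-voxel ROI
rng = np.random.default_rng(0)
maps = rng.normal(size=(20, 4, 4, 4))
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True
reactivity = roi_mean(maps, mask)
outcome = 0.5 * reactivity + rng.normal(scale=0.5, size=20)  # stand-in for residual BDI
result = predict_outcome(reactivity, outcome)
```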

Power analyses

We conducted power analyses to estimate whether the verification sample was large enough to test the anticipated effects. Effect sizes for the associated regions in prior papers using this task (Siegle et al., 2006, 2012) suggest that effects would be large, ranging from R² = 0.29 (Siegle et al., 2012) to R² = 0.65 (Siegle et al., 2006). With α = 0.05 and power (1 − β) = 0.80, G*Power estimated that 7 (R² = 0.65) to 22 (R² = 0.29) participants would be needed. These estimates suggest that the CBT sample (n = 60) is sufficient to test the anticipated effects. The SSRI sample (n = 19), by contrast, provides power of 0.80 only for effects of R² = 0.32 or larger, and power of 0.66 for effects as small as R² = 0.25. Even these estimates are likely optimistic. The studies contributing to the meta-analytic regions comprise diverse tasks, protocols, and samples, which could limit generalizability to the verification sample.
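The power calculations for a single-predictor regression can be cross-checked with the noncentral F distribution. This sketch is a cross-check, not a replacement for G*Power. It assumes the fixed-model test that R² deviates from zero, with one predictor. Effect size is Cohen's f² = R²/(1 − R²), and the noncentrality parameter is λ = f² × N, which we take to be the convention G*Power uses for this test. The function names are hypothetical, and small discrepancies from G*Power's reported numbers are possible.

```python
from scipy import stats


def regression_power(r2, n, n_predictors=1, alpha=0.05):
    """Power of the F test that R^2 > 0 in a fixed-model linear regression."""
    f2 = r2 / (1.0 - r2)                    # Cohen's f^2
    df1, df2 = n_predictors, n - n_predictors - 1
    lam = f2 * n                            # noncentrality parameter
    f_crit = stats.f.ppf(1.0 - alpha, df1, df2)
    return stats.ncf.sf(f_crit, df1, df2, lam)


def required_n(r2, power=0.80, n_predictors=1, alpha=0.05):
    """Smallest total N whose power reaches the target."""
    n = n_predictors + 3                    # start where the error df is at least 2
    while regression_power(r2, n, n_predictors, alpha) < power:
        n += 1
    return n


n_strong = required_n(0.65)          # strongest prior effect
n_weak = required_n(0.29)            # weakest prior effect
ssri_power = regression_power(0.25, 19)
```

Because power rises steeply with R² at small N, the SSRI sample is adequate only if the true effect is close to the largest effects reported in prior work.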

Results

Aim 1. Meta-analytic identification of brain regions prognostic of treatment outcome in depression

Consistent with study hypotheses and prior work in the area, an ALE meta-analysis of 11 studies of CBT and related treatments yielded two significant clusters (Fig. 1a). MNI centroid coordinates are used throughout this manuscript. One cluster was in the right sgACC (1,916 mm³; x = 7, y = 23, z = -12) and reflected activation shared by three studies. After gray matter masking, this became a 1,132 mm³ cluster with the same centroid coordinates (Fig. 1b). The sgACC cluster derived from the meta-analysis of CBT neuroimaging studies is referred to throughout the rest of this article as “meta sgACC.” The other cluster spanned the right amygdala and right parahippocampal gyrus and reflected activation shared by five studies. It measured 2,833 mm³ (x = 23, y = -2, z = -20) before gray matter masking and 2,105 mm³ (x = 24, y = -2, z = -20) after.

Fig. 1. ALE meta-analysis for CBT studies in depression. Note. (a) Clusters from the activation likelihood estimation (ALE) meta-analysis of cognitive behavioral therapy (CBT) studies in depression. (b) Subgenual cingulate region resulting from gray matter masking of the ALE meta-analysis clusters.

The meta-analysis of SSRI treatment studies included 10 articles and yielded two significant clusters (Fig. 2a). One was in the right anterior cingulate (4,792 mm³; x = 18, y = 32, z = 0) and reflected six studies. The other was in the right caudate (1,652 mm³; x = 20, y = 12, z = 18) and reflected two studies. After applying a gray matter mask to these clusters, we applied a minimum cluster size of 20 voxels to avoid reporting small fragments created by the masking. This yielded four clusters: one spanning the right pgACC (350 mm³; x = 15, y = 41, z = 0; Fig. 2b) and others in the right caudate and right middle frontal regions (Supplement 4). The pgACC cluster derived from the meta-analysis of SSRI neuroimaging studies is referred to throughout the rest of the article as “meta pgACC.”

Fig. 2. ALE meta-analysis for SSRI studies in depression. Note. (a) Clusters from the activation likelihood estimation (ALE) meta-analysis of selective serotonin reuptake inhibitor (SSRI) studies in depression. (b) Perigenual cingulate region resulting from gray matter masking of the ALE meta-analysis clusters.

Aim 2. Application of meta-analytically derived regions to a verification sample

Demographics of the clinical sample

Treatment (CBT and SSRI) groups did not differ on gender, age, ethnicity, number of depressive episodes, or depressive symptoms (BDI). The CBT sample reported more years of education than the SSRI sample, t(75) = 2.96, p = 0.004, Hedges’ g = 0.77 (Supplement 5). Similar demographics were observed in the entire sample (not limited to those who completed treatment, Supplement 6).

Meta sgACC

For participants who received CBT, meta sgACC reactivity to negative words was not prognostic of BDI change scores (R² = 0.06, F(1,58) = 3.76, p = 0.057) or residuals (R² = 0.05, F(1,58) = 3.13, p = 0.082). In the SSRI sample, meta sgACC reactivity was not prognostic of either BDI change scores or residuals (R² < 0.01, p > 0.9; Fig. 3a).

Fig. 3. Relationship between neural reactivity of meta-analytic anterior cingulate cortex subregions and treatment outcome. Note. (a) Residual Beck Depression Inventory (BDI; Beck et al., 1996) scores regressed on meta subgenual anterior cingulate cortex (sgACC) neural reactivity in the cognitive behavioral therapy (CBT) and selective serotonin reuptake inhibitor (SSRI) samples. (b) Residual BDI scores regressed on meta perigenual anterior cingulate cortex (pgACC) neural reactivity in the CBT and SSRI samples.

Meta pgACC

In the SSRI sample, meta pgACC reactivity to negative words was prognostic of BDI residuals (R² = 0.25, F(1,18) = 5.60, p = 0.030) but not of BDI change scores (R² = 0.13, F(1,18) = 2.46, p = 0.136). Contrary to effects observed in the literature, higher reactivity to negative words was associated with higher residual symptom severity (Fig. 3b). Meta pgACC reactivity was not associated with treatment response among CBT participants (R² < 0.01).

Discussion

The study had two aims. The first was to examine via meta-analysis whether the literature reports ACC subregions as predictors of response to treatment (CBT or SSRI) in MDD. The second was to examine whether neural reactivity of the meta-analytic regions predicted response to CBT or SSRI in a verification sample. The meta-analysis revealed that, across the literature, ACC subregions were prognostic indicators of treatment response in depression, as were the right caudate and amygdala. Most studies found that better SSRI response was associated with greater baseline pgACC reactivity and that better CBT response was associated with lower baseline sgACC reactivity. In the verification sample, meta pgACC reactivity predicted SSRI response, but in the direction opposite to that anticipated from prior literature: less reactivity predicted better response. Meta sgACC reactivity did not predict CBT response, and the meta sgACC only partially overlapped with the sgACC region of prior studies (Siegle et al., 2006, 2012).

Meta pgACC reactivity to negative words predicted SSRI response in the verification sample, but in the direction opposite to that of most studies in the meta-analysis. This reversal may reflect methodological differences across the literature in task and intervention protocol. Because tasks differed widely, it is unclear which task variable(s) drive the differences in effects, but one salient difference is stimulus type. Five SSRI studies reported ACC coordinates contributing to the pgACC cluster (Chen et al., 2007; Godlewska et al., 2018; Miller et al., 2013; Roy et al., 2010; Victor et al., 2013). All except Miller et al. (2013) found that greater ACC reactivity was prognostic of better treatment outcome, and all except Miller et al. used picture stimuli. In contrast, Miller et al. (2013) used a word-based task similar to the current study's and found that less ACC reactivity to negative words was prognostic of better SSRI response. Supplementary analyses provide preliminary support for this stimulus-type explanation: in the same SSRI sample, greater reactivity of ACC subregions to emotional faces was associated with better SSRI response (Supplement 7). Visit frequency may also matter. SSRI participants in the verification sample met with treatment providers as often as CBT participants did, which is more often than in most studies included in the SSRI meta-analysis. This may have contributed to the similar effect direction in the CBT and SSRI samples. For example, twice-weekly visits with a treatment provider could have engaged therapeutic processes shared between treatment groups, such as maintaining a schedule of regular clinic visits or having a professional ask about symptoms and show interest by listening and recording responses. Further work examining how neuroprediction associations vary with stimulus modality and treatment session frequency would help clarify these relationships.

Contrary to our hypotheses, meta sgACC reactivity did not significantly predict response to CBT in our verification sample. This null effect may be due, in part, to task heterogeneity and small sample sizes in the literature. The meta sgACC cluster was derived from studies using diverse paradigms, e.g., a monetary incentive task (Straub et al., 2015) and a self-rating affective word task (Siegle et al., 2006, 2012). Greater task diversity may have added noise, producing an overlapping-activation region with a weaker prognostic signal. In addition, small study samples (e.g., n = 14 in Siegle et al., 2006, and n = 22 in Straub et al., 2015) may have produced overestimated effects, a phenomenon observed elsewhere in the neuroimaging literature (Button et al., 2013; Cremers et al., 2017; Poldrack et al., 2017). fMRI indices of ACC neural reactivity are robust for research and thus promising theoretical candidates for developing personalized treatment algorithms. The null effect in the CBT verification sample suggests, however, that the specific meta-analytically derived region is not a sufficiently robust predictor for immediate adoption in clinical treatment selection for individual patients. We began with the premise that clinical adoption should require regions that are both (1) robust across a wide literature and (2) predictive in novel patient samples. That said, our verification sample (Siegle et al., 2012) itself came from a successful replication that used an sgACC region derived in our previous study (Siegle et al., 2006) with the same task and design. That sample was explicitly subjected to multiple replication tests. Thus, the failure of the meta-analytic region to predict outcome may indicate that the bar is too high. Requiring a region to be reliable across tasks, designs, and populations, and to generalize to new individual participants, may be unrealistic.

The meta-analyses also revealed prognostic regions outside the ACC: the right caudate for SSRI and the right amygdala for CBT. The caudate is associated with reward and learning (Delgado et al., 2004, 2005) and the amygdala with salience processing. Both have been implicated in MDD (Pizzagalli et al., 2009; Smoski et al., 2009), so their predictive potential is plausible. Again, these data are suggestive; future work testing their robustness in verification samples would be useful before clinical adoption.

Beyond treatment selection, another avenue for future research is to target baseline neural functioning to optimize treatment response. For example, Hamilton et al. (2010) found that healthy individuals could downregulate reactivity of an individually tailored sgACC region that responded maximally to negative pictures. They did so using neurofeedback together with a strategy of increasing positive mood. The meta-analytic finding could therefore support mechanistic intervention studies, e.g., pre-treatment sgACC neurofeedback, even if it is not appropriate to use by itself for prediction. Indeed, there is growing empirical support for neurofeedback in depression (Young et al., 2017). For patients who are not predicted to respond to a specific intervention, neurofeedback may eventually offer a way to adjust neural functioning before treatment.

Results presented here should be interpreted in light of study limitations. To obtain articles, we drew on recent reviews (systematic and other) and meta-analyses instead of conducting a new systematic review of the literature. Systematic reviews maximize comprehensiveness and minimize risk of bias (Aromataris & Pearson, 2014), so the current approach risks missed articles or biased article selection. To assess this concern, we conducted supplemental literature searches. These yielded no additional articles, providing some support for the comprehensiveness of the reviews and for minimal bias in article selection. The meta-analysis also included a small number of studies with small sample sizes and heterogeneous emotion-related tasks. In addition, the CBT meta-analysis included two studies from one co-author's lab that used a subset of the participants in the verification sample. This made the verification a non-independent replication, though the overlap did not confer the expected advantage for prediction. The failure to find the predicted effect in the verification sample despite this advantage suggests that neural reactivity indices of these regions are not yet ready for use in individual treatment prediction. Treatment assignment in the verification sample was also non-random. It was a convenience sample, with some participants coming from a preference trial and others from a CBT-only trial. Because the SSRI and CBT groups were formed by participant preference, they could differ on variables unrelated to treatment but related to the tested prediction effects. That said, the groups did not differ on pre- or post-treatment depression symptoms, depressive episode history, or demographic variables, with the exception of years of education (~14 vs. 15; Supplement 5).
Our verification sample also included only CBT and SSRI. Thus, if a participant was predicted not to respond to these interventions, the results do not indicate whether the participant would respond to other interventions, such as another class of antidepressant medication, electroconvulsive therapy, or esketamine. Another limitation is the use of self-report data (BDI) to measure treatment response. Self-report measures are limited by patients' awareness of and insight into their symptoms and may also reflect biases in how patients report symptoms (Rush et al., 2006). The BDI in particular appears to emphasize cognitive distress symptoms of depression (Brown et al., 1995) and may be biased toward detecting CBT-specific treatment effects (Hagen, 2007). Despite this potential bias, we still failed to find prediction effects in the CBT sample, providing further evidence that fMRI neural reactivity indices are not yet ready for precision medicine.

Conclusion

Limitations notwithstanding, the current data suggest that neuroimaging provides potential neural markers (sgACC and pgACC) of response to two of the most common and well-validated interventions for MDD, CBT and SSRIs. However, study findings suggest that neuroimaging estimates of neural reactivity are not yet robust enough for use in clinical treatment selection for individual patients. Rather, neuroimaging may better serve to guide future research on neurally informed interventions. Attention to methodological considerations (e.g., stimulus type or treatment session frequency) may be particularly important for future clinical translation.