Introduction

Depression is a serious and frequently recurrent illness: a significant proportion of patients experience relapse following an initial improvement or even remission (Georgotas et al. 1989, Sim et al. 2015). To reduce relapse, current treatment guidelines suggest continuing the antidepressant drugs that were effective in the acute phase for at least 4–9 months after remission, and often longer for patients with recurrent depressive episodes (American Psychiatric Association 2010; Bauer et al. 2007). Nevertheless, approximately half of the patients with depression experience relapse over the longitudinal course of the illness despite such maintenance treatment (Forte et al. 2015).

Identifying patients who are likely to have a poor prognosis under ongoing treatment at the earliest possible occasion could allow for earlier implementation of the next treatment strategy (Nakajima et al. 2010). Prediction of relapse in depression has been an active area of research; a PubMed search with the keywords depression, predictor, and relapse returns as many as 332 articles (November 2016). Among the predictors, a summed score or the number of residual symptoms is one of the most frequently reported and replicated (Faravelli et al. 1986; Judd et al. 1998, 2000; Lin et al. 1998; Paykel et al. 1995; Pintor et al. 2003, 2004; Teasdale et al. 2001; Simons et al. 1986; Van Londen et al. 1998; Nierenberg et al. 2010; Taylor et al. 2010). However, those studies have not thoroughly examined the contribution of each individual symptom to subsequent relapse, but instead relied solely on a total score or a count of residual symptoms in the representative rating scales (Holsboer 2001). Since depression is a heterogeneous syndrome, individuals with similar total scores on the rating scales can have a wide variety of symptoms (Fried and Neese 2015a). Given that early improvements in certain depressive symptoms may predict subsequent remission in the acute phase of treatment (Sakurai et al. 2013; Funaki et al. 2016), it would likewise be of high clinical relevance to focus on individual symptoms to predict subsequent relapse.

Moreover, self-rated and clinician-rated illness severities do not necessarily go hand in hand in depression (Dunlop et al. 2011). Additionally, it was reported that patients who evaluated their symptomatology as more severe than clinician rating were less likely to achieve remission, suggesting that such a differential between self-report and clinician rating could serve to predict antidepressant treatment response (Tada et al. 2014). Focusing on this issue may contribute to predicting longer term outcomes more precisely (Sakurai et al. 2013; Tada et al. 2014).

Although it is well known that residual symptoms in major depressive disorder (MDD) are detrimental to prognosis, only a limited number of studies thus far have systematically examined which residual symptoms are related to subsequent relapse in the long term (Taylor et al. 2010; Dombrovski et al. 2007, 2008). In brief, these trials found that certain residual symptoms after the acute treatment, including loss of appetite, insomnia, psychological anxiety, and hypochondriasis, were significantly associated with subsequent relapse. However, these findings are based on either self-report or clinician interview alone, and their sample sizes ranged from only 84 to 131. Therefore, the aim of the present study was to identify which individual residual symptoms, as assessed with both self-report and clinician rating, could predict subsequent relapse in patients with MDD, using a generalizable, large-scale sample from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial.

Methods

Study design

The STAR*D trial was funded by the National Institute of Mental Health to compare the effectiveness of several medications or cognitive therapy for individuals with nonpsychotic MDD; the study has been detailed elsewhere (Fava 2003; Rush et al. 2004). Briefly, the STAR*D trial enrolled 4041 outpatients aged 18 to 75 years from 18 primary and 23 psychiatric practice settings across the USA (Rush et al. 2006a). Participants received citalopram as their first treatment step for 12 weeks (or 14 weeks if needed) unless treatment was discontinued for any reason (level 1). Those who achieved remission or experienced a meaningful improvement could enter a 12-month naturalistic follow-up phase; the data used in the present study were derived from the follow-up phase after level 1. Following a complete description of the study, participants provided written informed consent at the study enrollment in the original studies. Because of the completely anonymous nature of this analysis and an absence of direct human involvement, no ethical approval was sought for the present analysis.

Study population

Inclusion criteria were a primary diagnosis of nonpsychotic MDD based on the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) and a score of ≥14 on the 17-item Hamilton Rating Scale for Depression (HAMD17) at level 1 entry. Those who, per clinician judgment, showed a meaningful improvement in level 1, preferably remission (i.e., a score of ≤5 on the 16-item Quick Inventory of Depressive Symptomatology clinician rating (QIDS-C16)), could enter the follow-up phase. Patients were excluded if they were diagnosed with schizophrenia, schizoaffective disorder, bipolar disorder, anorexia nervosa, bulimia nervosa, or obsessive-compulsive disorder.

Treatment

All participants received treatment with citalopram for 12 weeks (or 14 weeks if needed) in level 1. Citalopram was started at 20 mg/day and could be increased to 60 mg/day, using a measurement-based care approach (Trivedi et al. 2006). In the follow-up phase after level 1, the protocol strongly recommended that the participants continue the citalopram treatment at the doses used in level 1, although any psychotherapy, medication, or medication dose change could be applied at the discretion of the treating physicians. Medication management was based on clinician judgment, which was recommended to occur every 2 months, typically without clinical research support.

Assessment measures

The clinician-rated 30-item Inventory of Depressive Symptomatology (IDS-C30), which includes all items in the QIDS-C16, was administered by the research outcome assessors at the follow-up entry. The 16-item Quick Inventory of Depressive Symptomatology self-report (QIDS-SR16) was completed by participants at the follow-up entry and at monthly intervals during the follow-up period, using a telephone-based interactive voice response system. Since the QIDS-SR16 and QIDS-C16 item scores range from 0 to 3, a threshold score of 1 identifies symptoms of some clinical concern after meaningful improvement, while a threshold of 2 identifies those that would meet the threshold for DSM-IV criteria. Relapse was defined as a QIDS-SR16 score of ≥11 at any time in the follow-up phase (Rush et al. 2006a). As the participants were evaluated only with the QIDS-SR16 by the interactive voice response system, relapse could not be defined in terms of the duration of the episode.
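The item thresholds and the relapse definition above can be illustrated with a minimal Python sketch. The function names, the item labels, and the use of `None` for a missed monthly call are assumptions made for illustration only; they are not part of the STAR*D data dictionary or of the original analysis.

```python
from typing import Dict, List, Optional

# Relapse cutoff taken from the text: QIDS-SR16 total score of >= 11.
RELAPSE_CUTOFF = 11


def residual_symptoms(item_scores: Dict[str, int], threshold: int = 1) -> List[str]:
    """Return the names of items scored at or above `threshold`.

    Item scores range from 0 to 3. A threshold of 1 flags symptoms of some
    clinical concern; a threshold of 2 flags symptoms at the DSM-IV level.
    """
    return sorted(name for name, score in item_scores.items() if score >= threshold)


def first_relapse_month(monthly_totals: List[Optional[int]]) -> Optional[int]:
    """Return the 1-based month of the first total score >= RELAPSE_CUTOFF.

    `None` entries stand for missed monthly assessments (an assumption of
    this sketch). Returns None if the cutoff is never reached.
    """
    for month, total in enumerate(monthly_totals, start=1):
        if total is not None and total >= RELAPSE_CUTOFF:
            return month
    return None
```

For example, a hypothetical participant with monthly totals of 4, a missed call, 11, and 15 would be classified as relapsing at month 3, because a single score at or above the cutoff is sufficient under this definition.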

Statistical analysis

Baseline sociodemographic and clinical characteristics were compared between participants who relapsed and those who did not. Scores for individual symptoms in the QIDS-SR16 at the follow-up entry after the level 1 acute treatment, and those for the corresponding symptoms in the IDS-C30 (i.e., the same as those in the QIDS-C16), were extracted. Because the higher score was extracted from each pair of composite questions regarding appetite and body weight, 14 individual symptoms were available from the 16 items. Proportions of the presence (i.e., a score of ≥1) of each individual symptom in the QIDS-SR16 and QIDS-C16 at the follow-up entry were compared using McNemar’s test. Spearman correlation coefficients were calculated to examine the relationship between the total as well as the individual residual scores in the QIDS-SR16 and QIDS-C16.
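As a rough illustration of the two preprocessing and comparison steps described above, the following Python sketch collapses the paired appetite and weight items into single symptoms and runs a McNemar test on paired presence/absence ratings. The item names are hypothetical, and the test shown is the uncorrected chi-square form with 1 degree of freedom; this is an assumption of the sketch, and the SPSS procedure used in the study may apply an exact or continuity-corrected variant.

```python
import math
from typing import Dict, List, Tuple


def collapse_composites(items: Dict[str, int]) -> Dict[str, int]:
    """Collapse paired appetite and weight items into one symptom each.

    The higher score of each pair is kept, so 16 items yield 14 symptoms.
    Item names here are hypothetical labels, not official variable names.
    """
    pairs = {
        "appetite_change": ("appetite_decrease", "appetite_increase"),
        "weight_change": ("weight_decrease", "weight_increase"),
    }
    paired_items = {name for pair in pairs.values() for name in pair}
    collapsed = {k: v for k, v in items.items() if k not in paired_items}
    for symptom, (low, high) in pairs.items():
        collapsed[symptom] = max(items[low], items[high])
    return collapsed


def mcnemar(self_present: List[bool], clin_present: List[bool]) -> Tuple[float, float]:
    """Uncorrected McNemar chi-square statistic and p value for paired ratings."""
    # Discordant pairs: present on one scale but absent on the other.
    b = sum(1 for s, c in zip(self_present, clin_present) if s and not c)
    c_ = sum(1 for s, c in zip(self_present, clin_present) if c and not s)
    if b + c_ == 0:
        return 0.0, 1.0
    stat = (b - c_) ** 2 / (b + c_)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

Only the discordant pairs contribute to the statistic, which is why the test suits the question of whether self-report and clinician rating identify a symptom at different rates in the same participants.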

To evaluate the association between the score of each individual residual symptom in the QIDS-SR16 and QIDS-C16 at the follow-up entry and subsequent relapse, a Cox proportional hazards model adjusted for age, gender, length of the current episode, and number of past episodes was employed for each scale. A Kaplan-Meier survival curve was used to estimate the cumulative proportion of relapse according to the most predictive symptom in the QIDS-SR16. A log-rank test was used to test for the difference in the proportion among the groups.
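The Kaplan-Meier estimate mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration of the product-limit estimator under assumed inputs (follow-up time in months and a relapse indicator per participant); it is not the SPSS procedure used in the study, and it does not reproduce the adjusted Cox model or the log-rank test.

```python
from typing import List, Tuple


def kaplan_meier(times: List[int], relapsed: List[bool]) -> List[Tuple[int, float]]:
    """Product-limit estimate of the relapse-free proportion.

    `times` holds each participant's follow-up time; `relapsed` is True if
    follow-up ended in relapse and False if the participant was censored
    (for example, withdrew or completed follow-up without relapse).
    Returns (time, relapse-free probability) at each distinct relapse time.
    """
    curve = []
    surviving = 1.0
    event_times = sorted({t for t, e in zip(times, relapsed) if e})
    for t in event_times:
        # Participants still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        events = sum(1 for u, e in zip(times, relapsed) if u == t and e)
        surviving *= 1 - events / at_risk
        curve.append((t, surviving))
    return curve


def cumulative_relapse(curve: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Convert relapse-free probabilities into cumulative relapse proportions."""
    return [(t, 1 - s) for t, s in curve]
```

Censored participants leave the risk set without counting as events, which is what allows the estimate to use the partial follow-up of participants who withdrew before relapsing.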

To compare predictive power between individual symptom scores and a total score, hazard ratios (HRs) of relapse were calculated with Cox proportional hazards models for a summed score of ≥2 in the three predictor symptoms derived from the above analysis (see “Results” section) and for a total score of ≥6 in the QIDS-SR16. These cutoff points were chosen because remission has been defined elsewhere as a score of <6 on the QIDS-SR16 (highest possible score, 27) (Rush et al. 2006b); the proportionally corresponding cutoff for the summed score of three symptoms (highest possible score, 9) is 2 (i.e., one third of 6).
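The proportional scaling of the cutoff, and the two dichotomized predictors it yields, can be written out as a short Python sketch. The function names are hypothetical and chosen only for illustration; the numbers are those given in the text.

```python
from typing import Sequence, Tuple

QIDS_TOTAL_MAX = 27    # highest possible QIDS-SR16 total score
TOTAL_CUTOFF = 6       # remission is a total of < 6, so >= 6 is the risk group
THREE_ITEM_MAX = 9     # three items, each scored 0 to 3


def scaled_cutoff(total_cutoff: int, total_max: int, subset_max: int) -> float:
    """Scale a total-score cutoff proportionally to a shorter item subset."""
    return total_cutoff * subset_max / total_max


def risk_flags(three_item_scores: Sequence[int], total_score: int) -> Tuple[bool, bool]:
    """Return (three-item flag, total-score flag) for one participant."""
    subset_cutoff = scaled_cutoff(TOTAL_CUTOFF, QIDS_TOTAL_MAX, THREE_ITEM_MAX)
    return sum(three_item_scores) >= subset_cutoff, total_score >= TOTAL_CUTOFF
```

Here `scaled_cutoff(6, 27, 9)` evaluates to 2.0, matching the cutoff of ≥2 used for the three-symptom summed score. A hypothetical participant scoring 1, 1, and 0 on the three items with a total of 4 would be flagged by the three-item rule but not by the total-score rule.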

The same analysis was repeated, using the QIDS-C16. A p value of <0.05 was considered statistically significant (two-tailed). All of the data were analyzed using the Statistical Package for Social Science (SPSS) version 22.0 for Windows (IBM Corporation, Armonk, NY).

Results

Subject characteristics

Of 1475 participants who entered the follow-up period after level 1 of the STAR*D trial, 1133 participants received at least one post-baseline contact. Of these participants, 40.1% (n = 454) relapsed and 59.9% (n = 679) did not. Table 1 summarizes sociodemographic and clinical characteristics of the study sample.

Table 1 Sociodemographic and clinical characteristics of the sample

Individual residual symptoms

The proportions of participants who presented with residual symptoms in the QIDS-SR16 and QIDS-C16 are shown in Table 2. For some residual symptoms, the prevalence differed significantly between the QIDS-SR16 and QIDS-C16. Spearman correlation coefficients between total and individual residual scores in the QIDS-SR16 and QIDS-C16 are shown in Table 3. The coefficients between pairs of individual symptoms were less than 0.5 except for that between appetite change and weight change in the QIDS-SR16. On the other hand, the coefficients between the total score and some individual scores were more than 0.5.

Table 2 Proportion of subjects with residual symptoms at entry of the follow-up period
Table 3 Spearman correlation coefficients between total and individual scores in the QIDS-SR16 and QIDS-C16

Prediction of relapse by individual residual symptoms

All of the individual symptoms were considered candidate predictors of relapse because the p values of their crude hazard ratios were all less than 0.1 (Tables 4 and 5), a typical threshold for selecting variables to enter into a statistical model. Although including many variables can produce unstable results when the sample is small, the large number of patients in the STAR*D trial would be protective in this respect. In the Cox proportional hazards model (χ²(18) = 157.308), the following three QIDS-SR16 symptoms at the follow-up entry were significantly associated with subsequent relapse in the follow-up phase: restlessness (HR = 1.197, 95% CI = 1.031–1.390, p = 0.018), hypersomnia (HR = 1.190, 95% CI = 1.044–1.356, p = 0.009), and weight change (HR = 1.127, 95% CI = 1.005–1.264, p = 0.041) (Table 4). As an example, the Kaplan-Meier survival curve by residual restlessness in the QIDS-SR16 is shown in Fig. 1. The following three symptoms in the QIDS-C16 were significantly associated with relapse in the follow-up phase (χ²(18) = 104.986): restlessness (HR = 1.328, 95% CI = 1.119–1.577, p = 0.001), sleep onset insomnia (HR = 1.129, 95% CI = 1.001–1.272, p = 0.047), and weight change (HR = 1.125, 95% CI = 1.003–1.263, p = 0.045) (Table 5).

Table 4 Association between residual symptoms in the QIDS-SR16 and subsequent relapse
Table 5 Association between residual symptoms in the QIDS-C16 and subsequent relapse
Fig. 1 Time to relapse by residual restlessness in the QIDS-SR16. Statistically significant differences were found among the groups (log-rank statistic = 59.9, p < 0.001)

Comparison between individual symptom scores and a total score

The HR for a summed score of ≥2 in the restlessness, hypersomnia, and weight change items in the QIDS-SR16 was 2.021 (95% CI = 1.656–2.465, p < 0.001). The HR for a summed score of ≥2 in the restlessness, sleep onset insomnia, and weight change items in the QIDS-C16 was 1.652 (95% CI = 1.354–2.016, p < 0.001). On the other hand, the HRs for a total score of ≥6 in the QIDS-SR16 and QIDS-C16 were 2.579 (95% CI = 2.113–3.148, p < 0.001) and 2.411 (95% CI = 1.972–2.948, p < 0.001), respectively. Thus, prediction performance was numerically lower for the summed scores of the predictor symptoms than for the total scores, but it was achieved with a much smaller number of symptoms.

Discussion

To our knowledge, this is the first study to examine the impact of individual residual symptoms on subsequent relapse in depression. Our analysis indicated that some self-rated residual symptoms, namely restlessness, hypersomnia, and weight change, predicted subsequent relapse. The predictors of subsequent relapse were similar when symptoms were clinician rated: restlessness, sleep onset insomnia, and weight change. Overall, these findings replicated the detrimental effects of residual symptoms in the longitudinal course of the illness. While they also suggested similarity between patient- and clinician-rated assessments, possible discrepancies should be carefully monitored.

The QIDS-SR16 symptoms remaining at the entry of the follow-up that predicted subsequent relapse were residual restlessness, hypersomnia, and weight change, some of which are regarded as atypical symptoms (Novick et al. 2005). The findings from some of the past reports are consistent with our results. For instance, Dombrovski et al. reported that residual anxiety and sleep disturbance following recovery from depressive episodes were associated with an increased risk of recurrence in late-life patients (N = 116) (Dombrovski et al. 2007). Taylor et al. found in their 2-year follow-up study after acute phase cognitive therapy (N = 84) that increased psychological anxiety was a risk factor for relapse and recurrence (Taylor et al. 2010). Taken together, these findings suggest that the presence of some individual residual symptoms can serve as a predictor of negative outcomes in patients with MDD. Although a summed score may provide an estimate of overall psychopathological load, individual symptoms can also be informative in assessing global functioning (Faravelli et al. 1996) and predicting clinical outcomes (Fried and Neese 2015b). The total scores or the number of residual symptoms endorsed in the representative rating scales have classically been the focus of efforts to predict outcomes in the long run (Taylor et al. 2010), and the prediction performance in the present study was numerically lower for the summed scores of the predictor symptoms than for the total scores. Nevertheless, abbreviated scales that focus on specific risk symptoms are of high clinical relevance for preventing relapse in busy real-world clinics.

It is of interest that restlessness and weight change were associated with subsequent relapse in both the QIDS-SR16 and QIDS-C16. Furthermore, as shown in Table 2, the proportions of patients who experienced these two symptoms were comparable between the QIDS-SR16 and QIDS-C16, whereas the proportions for several other symptoms did not converge despite being assessed at the same time. Although patients' subjective discomfort and clinicians' objective appraisal of their suffering can sometimes be discordant (Carter et al. 2010; Dunlop et al. 2010), restlessness and weight change may be useful in predicting long-term prognosis in MDD. On the other hand, the sleep-related symptom associated with subsequent relapse differed between the scales (hypersomnia in the QIDS-SR16 versus sleep onset insomnia in the QIDS-C16), which underscores a need to pay attention to both objective and subjective perspectives.

There are several limitations to be noted in the present study. First, this is a reanalysis of the STAR*D data; the original trial was not designed to evaluate the issue addressed herein, and symptomatology was only assessed monthly over a 12-month period. Therefore, relapse could be defined only in terms of symptom severity. Additionally, treatments were less stringently controlled in this phase, although the protocol strongly recommended that the participants continue citalopram at the doses used in level 1. Second, the generalizability of our findings may be limited considering the characteristics of the participants in the original STAR*D trial; they were limited to US outpatients with nonpsychotic MDD with relatively severe and recurrent symptoms. Furthermore, all participants received citalopram at the follow-up phase after level 1 of this trial, which hampers any extrapolation of our results to other antidepressant drugs that are not homogeneous in their clinical characteristics. Third, while we employed individual symptoms as independent variables since they are independently assessed and scored, a machine learning approach using independent data, as utilized in previous studies (Chekroud et al. 2016; Koutsouleris et al. 2016), would have been ideal for validation of our model; this was not possible because of a lack of appropriate comparable independent data. Fourth, it is important to be aware that symptomatic scores and functional perspectives may not necessarily go hand in hand (McKnight and Kashdan 2009). There may be a time lag between them, but functional outcomes were not the main focus of our study. Our focus in this study was a dichotomous outcome of relapse, which nevertheless is intuitive and has been frequently utilized in depression research to inform long-term management. Finally, there were many dropouts at the follow-up phase in the STAR*D trial: a total of 47.5% of the participants who entered this phase were prematurely withdrawn during the 12 months for a reason other than relapse. However, as far as we are aware, this is the largest study in the literature to address this clinically important topic, using a well-organized sample.

In conclusion, some individual residual symptoms, including restlessness, insomnia, and weight change, may help better identify patients who are prone to subsequent relapse. Additionally, the contribution of individual residual symptoms to subsequent relapse was similar between clinician rating and self-report. Although these findings need to be replicated in other populations as well as in patients receiving other antidepressants, they underline the importance of evaluating individual symptoms in addition to the total scores or the numbers of symptoms endorsed in the representative rating scales, the latter of which have previously been identified in the literature.