Introduction

Across major US epidemiologic studies, Blacks are reliably found to have a lower prevalence of psychiatric disorders than Whites. In this paper, we use “Black” and “White” as the most inclusive and generic terms for the racial categories that are most commonly ascertained through self-identification in US population-based surveys (e.g., “Black”, “African American”, “Caribbean”, “White”, and “Caucasian”). The epidemiologic findings are unexpected from the vantage point of the dominant framework for interpreting relationships between social position and mental health, the social stress paradigm [1,2,3,4,5]. The paradigm predicts that disadvantaged groups will have worse health (mental and physical) than more advantaged groups by virtue of greater stressor exposure and access to fewer coping resources. Blacks’ uniquely marginalized political, economic, and social status in the US [6,7,8,9] makes Black–White comparisons a particularly strong test [10] of the social stress paradigm. That Black–White comparisons of psychiatric disorders fail to conform to the social stress model’s empirical predictions is especially remarkable in light of the consistency with which Blacks’ disadvantaged social status is associated with worse physical health outcomes relative to Whites. Thus, epidemiologic studies both substantiate Blacks’ chronic exposure to social stressors and show it to be injurious, but the deleterious effects are consistently evident only for somatic disorders, not for psychiatric disorders, despite the considerable comorbidity between the two [11, 12]. The consistent finding of an equal or lower prevalence of psychiatric disorders in Blacks compared with Whites, despite Blacks’ worse social status and physical health, constitutes a “paradox” that is widely acknowledged in the literature [7, 13,14,15].
Researchers have tested substantive [16,17,18] and artifactual [19, 20] explanations for this paradox, but to date, none has succeeded.

At the same time, the literature shows a pattern of equal or higher levels of non-specific psychological distress (which has also been called “demoralization” [21,22,23]) in Blacks compared with Whites in large epidemiologic studies in the US [23,24,25,26]. Though this set of findings is consistent with predictions of the social stress paradigm, the discordance between disorder and distress findings in Black–White US comparisons is unexpected, since psychiatric disorder and psychological distress ought to be positively correlated. On the one hand, a distinction is often made that disorder represents dysfunction in the individual, whereas distress does not assume such internal dysfunction but instead often points to stressors in the individual’s life (medical illness, crime, and poverty, for example) to which the expectable response is psychological distress [27, 28]. That is, physiological dysfunction is thought to give rise to, and to define, psychiatric disorder, whereas living in stressful circumstances is thought to give rise to distress. Accordingly, psychiatric disorder and distress do not have identical etiologies. On the other hand, their etiologies may often overlap. Psychiatric disorders are generally thought to have complex etiologies involving an interaction between genetic profiles and stressful situational factors, including medical illness and chronic poverty [29,30,31,32]. Therefore, challenging circumstances in an individual’s life are thought to play a role in both disorder and distress. In addition, having a psychiatric disorder is generally distressing, and being chronically distressed may lead to psychiatric disorder in those with a biologic vulnerability to disorder, suggesting that disorder and distress can become phenomenologically intertwined [21, 27, 33, 34]. Finally, there is substantial overlap between the symptoms used to diagnose mood and anxiety disorders and those used to measure distress [33, 35].
For all of these reasons, it is reasonable to expect a positive association between psychiatric disorder and distress.

In fact, empirical evidence documents strong positive associations between depression and distress in the general population [36,37,38]. In a large, nationally representative Australian sample [36], higher scores on the K10, a frequently used measure of psychological distress, were strongly associated with higher probabilities of a current affective disorder diagnosis. Two US studies [37, 38] testing associations between diagnoses of current major depression and the CES-D and PHQ-9, also frequently used measures of distress, demonstrated strong criterion validity for both measures. Given the symptom and experiential overlap between distress and depression, these associations are not surprising. By extension, then, groups with a higher prevalence of disorder should also have higher levels of distress. The apparent lack of such concordance in Black–White comparisons in the US thus constitutes an additional paradox.

Researchers have noted this counter-intuitive pattern of findings in distress and depression among Blacks compared with Whites across major US epidemiologic studies, but more in passing than as a deliberate focus of investigation [7, 13, 14, 18, 39,40,41,42]. An exception to this glancing attention is a study by Vega and Rumbaut [43] positing that major depression diagnoses might be artifactually suppressed among Blacks in the diagnostic interviews due both to the diagnostic logic and to differential symptom recall between Blacks and Whites. However, they do not identify a specific problem with the diagnostic logic, and the evidence for differential symptom recall between Blacks and Whites is mixed [44,45,46]. To date, then, this second paradox of inconsistent findings when comparing Blacks and Whites in the US on depression and distress has not been resolved.

More systematic documentation and exploration of the second paradox may shed light on the first paradox and ultimately contribute to its resolution. To address this gap, we report findings from a systematic review of the literature estimating the prevalence of major depression and levels of psychological distress in Blacks and Whites in the US. Among disorders, we focus on major depression for three reasons: (1) Blacks’ lower prevalence than Whites across multiple psychiatric disorders diagnosed in the large, nationally representative epidemiologic studies is particularly marked for major depression; (2) major depression is especially sensitive to stressor exposure [47,48,49,50], and hence the Black–White depression finding is a particularly strong challenge to the dominant interpretive model; and (3) distress measures typically borrow heavily from the diagnostic criteria for major depression, and therefore the discordance between depression and distress findings in Black–White comparisons is particularly surprising.

Our review draws only on studies using nationally representative samples of adults in the US, the primary population in which the paradoxes have been noted. Although the nature and degree of America’s racialized climate may vary geographically, Blacks’ exposure to racial bias remains ubiquitous, which obviates the need to study more geographically specific settings. Our review also does not consider subgroups defined by immigrant status, ancestry, or any other variable. In a highly racialized setting such as the United States, physiognomy often trumps these important subgroup differences in shaping life experiences and exposures; therefore, crude comparisons based on self-identified racial group membership remain telling. Finally, we exclude results that adjust for socio-economic variables such as income, wealth, education, employment, and marital status, because these are, along with inter-personal discrimination, core explanatory levers of the social stress paradigm. Including results that adjust for these mediators would remove key factors that link social location to mental health. Accordingly, we include only results that adjust at most for sex and age.

Methods

We searched the PubMed and PsycINFO databases through January 2016, using a search term algorithm to identify articles reporting in English on Black–White differences in depression or distress in representative samples of the US population. To capture our populations of interest, we employed both specific race/ethnicity search terms (e.g., “Black” and “White”) and more general terms (e.g., “ethnicity” and “nationally representative”) across all article fields. To obtain articles on our outcomes of interest, we used both general terms (e.g., “mental disorders”, “depression”, and “distress”) and specific terms [e.g., “major depressive disorder”, “Diagnostic and Statistical Manual of Mental Disorders (MeSH)”, and “depressive symptomatology”]. We conducted full-text searches in both databases, rather than limiting our searches to abstracts and key terms. The first author culled articles by title, abstract, and full-text review, applying the following inclusion criteria: nationally representative samples of US adults, data reported comparing Blacks and Whites on either major depression or psychological distress, and adjustment at most for age and sex. Using these same inclusion criteria, we also reviewed the selected articles’ reference lists. When two or more articles reported results from the same study, articles providing prevalence estimates were selected over those reporting odds ratios. From multi-year studies yielding more than one article, we chose the article reporting estimates over the longest range of years. Finally, for multi-year studies, some articles report an aggregate estimate covering multiple years of the study, whereas other articles report multiple single-year estimates. Because the reported estimates are our focal data points, we present them here as they are reported in the literature.
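The two de-duplication rules for articles drawn from the same study can be expressed as a small selection function. This is a minimal sketch, not the authors' actual procedure: the `Article` record, its field names, and the study labels are hypothetical, and because the text does not say which rule takes precedence when they conflict, the sketch assumes that reporting prevalence is checked first and that the span of survey years breaks ties.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record for one candidate article.
    study: str                # dataset the article draws on
    reports_prevalence: bool  # False if it reports only odds ratios
    first_year: int           # first survey year covered by its estimates
    last_year: int            # last survey year covered by its estimates


def rank(article: Article) -> tuple:
    # Assumed precedence: prevalence estimates first, then longest span of years.
    return (article.reports_prevalence, article.last_year - article.first_year)


def select_one_per_study(articles: list[Article]) -> dict[str, Article]:
    """Keep the highest-ranked article for each study (first one wins ties)."""
    chosen: dict[str, Article] = {}
    for article in articles:
        current = chosen.get(article.study)
        if current is None or rank(article) > rank(current):
            chosen[article.study] = article
    return chosen


# Hypothetical candidates, not articles from the review.
candidates = [
    Article("Study A", False, 2001, 2010),
    Article("Study A", True, 2001, 2002),
    Article("Study B", True, 2000, 2004),
    Article("Study B", True, 2000, 2012),
]
selected = select_one_per_study(candidates)
```

Under the assumed ordering, Study A's prevalence article is kept despite its shorter span, and Study B's longer-span article is kept because both report prevalence.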

Results are categorized by whether they estimated the occurrence of major depression or distress. “Major depression” is used here to encompass both major depressive episode and major depressive disorder according to Diagnostic and Statistical Manual of Mental Disorders (DSM) criteria. Both are included in the results because most individuals meeting criteria for a major depressive episode receive a major depression diagnosis rather than a psychotic or manic-related disorder diagnosis [28]. We included only studies reporting results based on structured diagnostic interviews employing the full DSM diagnostic criteria for major depression. These criteria require at least five symptoms co-occurring over at least a 2-week period, one of which must be either sad mood or anhedonia. Moreover, the episode must not meet any of several exclusion criteria, such as symptoms occurring within 2 months of the loss of a loved one or resulting from the direct physiological effects of a medical condition, medication, illicit substances, or alcohol. Diagnostic interviews for major depression used in epidemiologic studies generally ascertain lifetime, past-year, and past-30-day depression, though any given article will typically report estimates for only one or two of these timeframes. We subdivide the results by these diagnostic reference periods in the relevant table and figure below.
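The core DSM decision rule described above (a gate symptom, a five-symptom minimum, a 2-week duration, and exclusions) can be sketched as a function. This is a simplified illustration under stated assumptions, not the scoring logic of any particular interview: the symptom labels are hypothetical shorthand, and the sketch omits other DSM requirements such as clinically significant distress or impairment and the hierarchy rules for psychotic and manic-related disorders.

```python
from dataclasses import dataclass

# Nine DSM symptom domains for a major depressive episode (shorthand labels).
SYMPTOM_DOMAINS = frozenset({
    "sad_mood", "anhedonia", "appetite_or_weight", "sleep", "psychomotor",
    "fatigue", "worthlessness_or_guilt", "concentration", "death_or_suicide",
})
GATE_SYMPTOMS = frozenset({"sad_mood", "anhedonia"})


@dataclass
class Episode:
    symptoms: frozenset          # domains endorsed as co-occurring
    duration_days: int           # length of the co-occurrence period
    within_2_months_of_loss: bool = False
    due_to_medical_or_substance: bool = False


def meets_major_depression(ep: Episode) -> bool:
    present = ep.symptoms & SYMPTOM_DOMAINS
    if not present & GATE_SYMPTOMS:      # sad mood or anhedonia required
        return False
    if len(present) < 5:                 # at least five symptoms
        return False
    if ep.duration_days < 14:            # at least a 2-week period
        return False
    # Exclusion criteria: recent bereavement or a physiological cause.
    return not (ep.within_2_months_of_loss or ep.due_to_medical_or_substance)
```

Each check returns early, so an episode fails as soon as one criterion is unmet; the order of checks does not change the result.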

Distress results were included if the authors provided clear descriptions of measures of aversive psychological states that are not specific to psychosis, specific phobias, or substance abuse. For the most part, these measures share content with the diagnostic symptom criteria for major depressive disorder and generalized anxiety disorder [36]. Distress measures typically ask participants about the past year, the past 30 days, the past 2 weeks, or the past 7 days. We present the results according to these timeframes in the tables and figures below. Although the Patient Health Questionnaire-9 and -8 (PHQ-9 and PHQ-8) are often presented in the literature as measures of major depression [51], here we categorize them as distress measures because they fall short of fully implementing DSM major depression criteria. For example, in the PHQ, a symptom counts toward the five-symptom minimum required by the DSM major depressive episode algorithm if it is endorsed as occurring on more than half the days over the 2-week reference period, whereas the DSM stipulates that it occur nearly every day. Furthermore, the PHQ requires only that a symptom occur at all during the day, whereas the DSM requires that, where relevant, the symptom occur most of the day. As Horwitz and Wakefield [28] contend, non-disordered distress can often satisfy the DSM criteria for major depression, leading to its misclassification as the latter; relaxing the DSM criteria, as the PHQ does, creates additional opportunities for false-positive diagnoses.
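The frequency difference between the PHQ and the DSM can be illustrated by running the same counting rule at two thresholds. The sketch assumes the published PHQ-9 coding (0 = not at all, 1 = several days, 2 = more than half the days, 3 = nearly every day) and its scoring rule, in which item 9 counts if present at all. The stricter variant is a hypothetical approximation of the DSM "nearly every day" requirement; it does not model "most of the day" or any other DSM criterion.

```python
# PHQ-9 item order: 1 anhedonia, 2 depressed mood, 3 sleep, 4 fatigue,
# 5 appetite, 6 worthlessness, 7 concentration, 8 psychomotor,
# 9 thoughts of death or self-harm. Codes run from 0 to 3.


def syndrome(items: list[int], min_code: int) -> bool:
    """Five or more counted items, at least one of them item 1 or item 2.

    Items 1-8 count when their code is at least `min_code`; item 9 counts
    whenever it is present at all (code >= 1).
    """
    if len(items) != 9 or any(code not in (0, 1, 2, 3) for code in items):
        raise ValueError("expected nine PHQ-9 codes in 0..3")
    counted = [code >= min_code for code in items[:8]] + [items[8] >= 1]
    return sum(counted) >= 5 and (counted[0] or counted[1])


def phq_threshold(items: list[int]) -> bool:
    return syndrome(items, min_code=2)   # "more than half the days"


def nearly_every_day_threshold(items: list[int]) -> bool:
    return syndrome(items, min_code=3)   # hypothetical DSM-like frequency
```

A respondent who reports five symptoms, including depressed mood, on "more than half the days" (for example `[2, 2, 2, 2, 2, 0, 0, 0, 0]`) meets the PHQ rule but not the stricter one, which is the extra room for false positives described above.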

We subdivide distress results into those comparing Blacks and Whites on the proportions in each group with high distress scores (with thresholds varying across studies) and those comparing the two groups on mean distress scores. From these results, we calculated prevalence ratios and means ratios, respectively. We used OpenEpi (http://www.openepi.com) to calculate 95% confidence intervals around the prevalence ratios when papers provided Black and White sample sizes, and to calculate 95% confidence intervals around group means when papers provided standard deviations or standard errors but not the confidence intervals themselves. We also used OpenEpi to conduct t tests of differences in mean distress levels when sample sizes and standard deviations or standard errors were provided.
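The calculations described above can be sketched with the standard library alone. This is an illustration, not a reproduction of OpenEpi, whose output may differ in detail. The sketch uses the usual log-scale (Taylor series) interval for a ratio of two proportions and a large-sample, unequal-variance comparison of means that uses a normal rather than a t distribution, which is a reasonable approximation at sample sizes in the hundreds or more. It treats each sample as a simple random sample, ignoring survey weights and design effects. If a paper reports a standard error rather than a standard deviation, the standard deviation is the standard error times the square root of n.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def prevalence_ratio_ci(prev_black, n_black, prev_white, n_white):
    """Black/White prevalence ratio with a 95% CI computed on the log scale."""
    cases_black = prev_black * n_black   # implied case counts
    cases_white = prev_white * n_white
    ratio = prev_black / prev_white
    se_log = math.sqrt(1 / cases_black - 1 / n_black
                       + 1 / cases_white - 1 / n_white)
    return ratio, ratio * math.exp(-Z95 * se_log), ratio * math.exp(Z95 * se_log)


def compare_means(mean_black, sd_black, n_black, mean_white, sd_white, n_white):
    """Means ratio, a 95% CI for each group mean, and a two-sided p value."""
    se_black = sd_black / math.sqrt(n_black)
    se_white = sd_white / math.sqrt(n_white)
    ci_black = (mean_black - Z95 * se_black, mean_black + Z95 * se_black)
    ci_white = (mean_white - Z95 * se_white, mean_white + Z95 * se_white)
    # Unequal-variance statistic; a normal p value stands in for the t test.
    stat = (mean_black - mean_white) / math.sqrt(se_black**2 + se_white**2)
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return mean_black / mean_white, ci_black, ci_white, p_value


# Hypothetical inputs, not values from the reviewed studies.
pr, pr_low, pr_high = prevalence_ratio_ci(0.10, 1000, 0.05, 1000)
ratio, ci_b, ci_w, p = compare_means(10.0, 5.0, 500, 9.0, 5.0, 500)
```

A prevalence ratio whose interval excludes 1, or a mean comparison with p < 0.05, would count as statistically significant under the criterion used in the Results. Without the sample sizes or dispersion measures, neither quantity can be computed, which is what makes a comparison indeterminate.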

Results

The literature review (schematically summarized in Fig. 1) yielded 34 articles reporting 54 relevant outcomes. Seven articles [52,53,54,55,56,57,58] report 9 comparative Black–White findings from 5 unique datasets on the prevalence of major depression. The remaining 27 articles [18, 24,25,26, 59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81] report 45 distress comparisons between Blacks and Whites from 33 samples (some overlapping), using 15 different distress measures.

Fig. 1 Literature search flow-chart

In studies documenting major depression, Black sample sizes ranged from 666 to 8245 and White sample sizes from 4180 to 31,938. In studies reporting distress outcomes, the corresponding ranges were 198 to 41,056 for Blacks and 1102 to 396,273 for Whites.

Major depression

Figure 2 and Table 1 summarize the nine major depression comparisons. Blacks have a lower prevalence than Whites in eight comparisons; six of these differences are statistically significant, as indicated by the 95% confidence intervals for the prevalence ratios. In the one instance in which Blacks have a higher prevalence of major depression than Whites, the difference is slight (2.5 versus 2.2%) and not statistically significant. Regarding diagnostic reference periods, Blacks have a statistically significantly lower prevalence than Whites in all four lifetime comparisons and in two of the three past-year comparisons. Of the two past-30-day comparisons, Blacks have a lower prevalence than Whites in one, but the difference is not statistically significant.

Fig. 2 Black–White major depressive disorder prevalence ratios with 95% confidence intervals (where reported). ECA Epidemiologic Catchment Area study, NCS National Comorbidity Survey, NCS-R National Comorbidity Survey Replication, NESARC National Epidemiologic Survey on Alcohol and Related Conditions, NLAES National Longitudinal Alcohol Epidemiologic Survey

Table 1 Summary of findings comparing Blacks and Whites on prevalence of major depression

Distress

Figure 3 and Table 2 summarize Black–White prevalence ratios for “high distress” based on the study-specific cut points shown (Fig. 3, top; Table 2, panel a) and mean distress level ratios (Fig. 3, bottom; Table 2, panel b). A higher proportion of Blacks than Whites score above the cut points in 31 of 34 comparisons. Of these 31 comparisons, 15 show statistically significant differences [based on 95% confidence intervals or significance test results (p < 0.05)], 3 show differences that are not statistically significant, and 13 are indeterminate because not enough data are provided for significance testing. In these latter 13 instances, the Black–White prevalence ratios range from 1.14 to 2.28. Of the three comparisons in which Whites score higher on distress, two differences are statistically significant and the third is indeterminate. Notably, the two statistically significant results showing higher distress in Whites used a version of the K6 psychological distress measure that references the past 12 months. Only one other distress estimate in our results references the past 12 months; all other estimates reference the past month, past 2 weeks, or past week. Furthermore, the 12-month version of the K6 asks participants to recall the month in the past 12 months when they were at their worst “emotionally”, whereas the K6 versions referencing recent weeks ask instead how respondents are “feeling”, without the emotional modifier.

Fig. 3 Black–White distress prevalence ratios with 95% confidence intervals (where estimable). CES-D Center for Epidemiologic Studies Depression Scale, PHQ Patient Health Questionnaire, K6 Kessler 6

Table 2 Summary of findings comparing Blacks and Whites on distress

Blacks have higher mean distress scores than Whites in all 11 comparisons. One of these differences is statistically significant, three are not, and seven are indeterminate, because neither the standard deviations nor standard errors of the mean estimates are provided.

Across all distress findings (high distress and mean distress), Blacks have higher levels than Whites in 42 of 45 comparisons; of these 42, 16 are statistically significant, 6 are not, and 20 are indeterminate.
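The combined figures are the sums of the category counts reported separately for the high-distress and mean-distress comparisons in which Blacks score higher. The short tally below simply reproduces that arithmetic from the counts given above.

```python
# Comparisons in which Blacks score higher, split as
# (statistically significant, not significant, indeterminate).
high_distress = (15, 3, 13)  # 31 of 34 cut-point comparisons
mean_distress = (1, 3, 7)    # all 11 mean-score comparisons

combined = tuple(a + b for a, b in zip(high_distress, mean_distress))
print(combined, sum(combined))  # (16, 6, 20) 42
```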

In sum, Blacks have a lower prevalence of major depression than Whites in eight of nine comparisons based on diverse datasets and across different reference periods; six of these differences are statistically significant. In none of the nine major depression comparisons do Blacks have a statistically significantly higher prevalence than Whites. In contrast, Blacks show higher distress levels than Whites in 42 of 45 comparisons examined. Of the 24 comparisons in which the statistical significance of the differences can be tested, Blacks are statistically significantly higher than Whites in 16, Whites are statistically significantly higher in two, and there is no statistically significant difference in the remaining six.

Discussion

Blacks have a lower prevalence of major depression than Whites in eight of nine comparisons but higher distress levels in 42 of 45 comparisons. Our results from a systematic review of the literature, therefore, align with observations based on cursory reviews [14, 18, 39, 40]. In short, psychiatric epidemiology research consistently documents that Blacks have less major depressive disorder but higher distress levels than Whites in the US.

The attenuation of the Black–White depression paradox at more recent reference periods relative to lifetime prevalence that we observe in our results could be attributed to the reliable finding in the US that, on average, cases of major depression persist longer in Blacks than in Whites [82, 83]. Greater persistence may in turn be attributed to worse access to high-quality screening, diagnosis, and treatment among Blacks compared with Whites in the US [58, 84, 85]. Disorder prevalence is influenced by both the incidence and the duration of the disorder; therefore, when prevalence is estimated over recent or current reference periods rather than over the lifetime, groups with worse access to treatment will have more persistent cases, which are more likely to be captured than cases that are treated and resolve relatively quickly. We note that this explanation further instantiates the Black–White depression paradox by interpreting the attenuation of the paradox at more recent time frames as an artifact of differential treatment likelihood.
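The incidence-duration argument can be made concrete with a small numerical sketch. It uses the textbook steady-state approximation (prevalence odds ≈ incidence × mean duration) for current prevalence and cumulative risk under a constant incidence rate for lifetime prevalence. The two groups and all rates are hypothetical. The model treats one rate as both first-onset and episode incidence and ignores recurrence, mortality, and recall error.

```python
import math


def lifetime_prevalence(incidence_per_year: float, years_at_risk: float) -> float:
    # Cumulative risk of ever having an episode under a constant incidence rate.
    return 1 - math.exp(-incidence_per_year * years_at_risk)


def point_prevalence(incidence_per_year: float, mean_duration_years: float) -> float:
    # Steady-state relation: prevalence odds = incidence x mean duration.
    odds = incidence_per_year * mean_duration_years
    return odds / (1 + odds)


# Hypothetical groups: group 1 has lower incidence but longer-lasting episodes.
YEARS = 40
g1 = {"incidence": 0.015, "duration": 0.6}
g2 = {"incidence": 0.020, "duration": 0.5}

lifetime_ratio = (lifetime_prevalence(g1["incidence"], YEARS)
                  / lifetime_prevalence(g2["incidence"], YEARS))
current_ratio = (point_prevalence(g1["incidence"], g1["duration"])
                 / point_prevalence(g2["incidence"], g2["duration"]))
print(f"{lifetime_ratio:.2f} {current_ratio:.2f}")  # 0.82 0.90
```

With these invented inputs, the group with lower incidence shows a lifetime prevalence ratio of about 0.82 but a current prevalence ratio of about 0.90. Longer episode duration pulls the current-period ratio toward 1 even though incidence is unchanged.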

As noted at the outset, the patterns of a lower prevalence of depression in Blacks compared with Whites, coupled with higher distress, signify a double paradox. The first paradox, from the perspective of the social stress paradigm, is that despite having a disadvantaged social status in the US, Blacks have a lower prevalence of major depression than Whites. The second paradox is that Black–White comparisons of distress are discordant with Black–White comparisons of major depression despite evidence in the broader literature of a strong positive association between major depression and distress.

Both artifactual and substantive explanations have been proposed to resolve the first paradox. Artifactual explanations presume that the findings are invalid due to methodological error. For example, one artifactual explanation for the Black–White depression paradox posits that selection bias in the household sample-based studies that document the paradox disproportionately undercounts depression in Blacks [4, 86,87,88]. Specifically, it contends that Blacks are disproportionately represented in the groups excluded from household samples (e.g., the incarcerated, the homeless, and those living on military bases), which also have a relatively high prevalence of disorder. For this explanation to be persuasive, household sample-based studies should show stronger evidence of the paradox in demographic subgroups in which these exclusion factors are more operant (e.g., young males with lower educational achievement) and weaker evidence where they are less operant (e.g., older females with higher educational achievement). Thus, a strong Black–White depression paradox among young males with low educational achievement, coupled with an attenuated paradox, or none at all, among older adults with Bachelor’s degrees, would be consistent with the selection bias explanation. However, a recent large household-based study found uniform evidence of the paradox across 24 subgroups cross-tabulated by age, sex, and education [89], which is inconsistent with this selection bias explanation.
We note that a definitive test of this selection bias hypothesis would entail extensive data on Black–White differences in depression among institutionalized populations, as well as valid estimates of the sizes of these populations.
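The definitive test described above needs, for each group, two quantities beyond the household estimate: the prevalence of depression in the excluded populations and the share of the group living in them. The hypothetical sketch below shows how these combine into a full-population prevalence. The function name and all numbers are invented for illustration, chosen so that the correction narrows, but does not erase, a household-sample ratio of 0.8.

```python
def population_prevalence(p_household: float, p_excluded: float,
                          excluded_share: float) -> float:
    """Prevalence in the full population as a mix of the household
    population (which surveys sample) and excluded groups (which they miss)."""
    return (1 - excluded_share) * p_household + excluded_share * p_excluded


# Hypothetical inputs: group 1 has a larger share living outside households.
g1 = dict(p_household=0.08, p_excluded=0.20, excluded_share=0.05)
g2 = dict(p_household=0.10, p_excluded=0.20, excluded_share=0.01)

household_ratio = g1["p_household"] / g2["p_household"]
true_ratio = population_prevalence(**g1) / population_prevalence(**g2)
```

With these inputs, the household-sample ratio is 0.80 and the full-population ratio is about 0.85. Whether a real correction would be large enough to erase the paradox depends on the actual inputs, which is why such data are needed.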

Another artifactual explanation for the first paradox suggests that the diagnostic interview for depression used in epidemiologic studies captures depression more effectively in Whites than in Blacks [19, 20, 39, 90]. For example, Breslau et al. [19] and Uebelacker et al. [20] test whether differential item functioning between Blacks and Whites in the diagnostic interview for depression explains any of the paradox. Although both detect small amounts of differential item functioning between Blacks and Whites on several symptoms, in neither case is it sufficient to explain a meaningful portion of the depression paradox.

Substantive explanations, on the other hand, presume that the lower prevalence of major depression in Blacks relative to Whites is valid, and they posit protective factors thought to be more prevalent in Blacks than in Whites, such as religiosity, ethnic identity, high self-esteem, and strong social support [16, 18, 83, 91], to account for the pattern. To date, empirical tests of substantive explanations have not supported these hypotheses. For example, studies have examined whether better social support in Blacks than in Whites explains Blacks’ lower prevalence of major depression [16, 17]; despite operationalizing social networks in numerous ways, these researchers found no support for this explanation. To our knowledge, results from similar tests of self-esteem, ethnic identity, and religiosity as explanatory factors have not been published. In addition, the social stress paradigm predicts worse mental health outcomes in disadvantaged groups in part by virtue of poorer coping resources. Explaining the lower prevalence of depression in Blacks than in Whites by better coping resources, as these substantive explanations do, raises the question of why disadvantaged groups would have better coping resources than more advantaged groups, when a hallmark of advantaged status is greater purchase, literal and otherwise, on coping resources. To date, there is no theoretical basis for postulating the opposite, or for postulating that the distribution of coping resources is independent of socio-economic status.

A more recent substantive hypothesis [18] proposes an interaction between race, stress, and poor health behaviors (e.g., alcohol consumption), such that at higher stressor levels, unhealthy behaviors are more protective against depression in Blacks than in Whites, while simultaneously leading to worse somatic health in Blacks. Tests of this hypothesis have yielded mixed results [18, 92, 93]. Moreover, this hypothesis lacks plausible explanations for why coping behaviors are more protective for Blacks than for Whites against depression and, with respect to the second paradox, why this protection does not extend to psychological distress [41].

Further development of substantive theory regarding the interplay between social status, health behaviors, and health outcomes may yet yield a persuasive interpretive framework in which the Black–White depression paradox is not so paradoxical after all. Other instances in which a socially advantaged group (e.g., men or the highly educated) does not enjoy uniformly superior health status have been documented and explained as the pursuit of other goals (e.g., hegemonic masculinity and delayed childbearing) that are achieved at the expense of health outcomes [94, 95]. However, the application of this hypothesis to explain lower-than-expected levels of depression in Blacks relative to Whites may be limited in view of three countervailing findings. First, this explanation could apply only to Black–White patterns with respect to depression, but not distress; as we document here, Blacks tend to have higher levels of psychological distress than Whites in the US. If the opportunity cost of pursuing other goals is worse mental health symptomatology among Whites, this should be apparent in both depression and psychological distress. Second, evidence suggests that Whites are more likely than Blacks to obtain services for mental health problems [84, 85], suggesting that they are more, not less, likely to attend to psychiatric problems. Third, the Black–White depression paradox appears to be robust across levels of education [89]; if the pursuit of non-health goals by those with greater social advantage could explain suboptimal mental health, one might expect the depression discrepancy between Blacks and Whites to attenuate within levels of education, but it does not appear to do so.

Results of our systematic review demonstrate that proposed explanations for the lower prevalence of major depression in Blacks compared with Whites must contend with Blacks’ higher distress levels to succeed. To our knowledge, no explanations for this second paradox of discordant results in Black–White depression and distress comparisons have been advanced. We propose that the consistency of the second paradox that we document in our review tallies better with artifactual than substantive explanations for the first paradox. As we have noted above, substantive explanations for the first paradox must explain why ostensibly protective factors (e.g., religiosity) more prevalent in Blacks than Whites in the US would protect against major depression but not distress. This is a particular challenge given the overlapping symptom content in diagnostic interviews for depression and distress scales.

Promising artifactual explanations for the first paradox that also confront the second paradox might consider the extent to which the exclusions specified in the diagnostic algorithm for major depression lead to differential misclassification between Blacks and Whites. For example, the depression diagnosis requires endorsing either sad mood or anhedonia, both of which are considered psychological symptoms. However, the nine diagnostic symptoms of major depression are generally thought to capture both psychological and somatic domains [33, 96,97,98,99,100]. Because both screening symptoms are psychological, the algorithm privileges psychological expressions of depression. If Blacks in the US express depression more somatically than Whites, as some have suggested [101,102,103,104,105], the diagnostic algorithm would disproportionately underestimate depression in Blacks relative to Whites. We note that the two differential item functioning studies discussed above [19, 20] used samples in which all participants had screened into the full diagnostic interview for major depression by endorsing at least one of the two psychological screening symptoms. Accordingly, these samples are presumably biased against those who express depression more somatically, and to the extent that Blacks express depression more somatically than Whites do, these studies would not have been able to discern that difference.
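The contrast between a psychological screening gate and a summed distress score can be shown with two hypothetical respondents who have the same total symptom burden but different symptom expression. The profiles, labels, 0–3 frequency scale, and gate cut-off are all assumptions made for illustration. The gate mirrors the screening requirement described above, and the sum mirrors how distress scales combine items.

```python
# Hypothetical symptom profiles on a 0-3 frequency scale (shorthand labels).
psychological = {"sad_mood": 3, "anhedonia": 3, "worthlessness": 3,
                 "concentration": 2, "sleep": 2, "fatigue": 2}
somatic = {"sad_mood": 1, "anhedonia": 1, "sleep": 3, "fatigue": 3,
           "appetite": 3, "psychomotor": 2, "concentration": 2}

GATE = ("sad_mood", "anhedonia")
GATE_MIN = 2  # assumed level at which a gate symptom counts as endorsed


def passes_screen(profile: dict[str, int]) -> bool:
    # The diagnostic interview proceeds only if a psychological gate symptom is endorsed.
    return any(profile.get(s, 0) >= GATE_MIN for s in GATE)


def distress_score(profile: dict[str, int]) -> int:
    # Distress scales simply sum across all endorsed symptoms.
    return sum(profile.values())


for name, profile in (("psychological", psychological), ("somatic", somatic)):
    print(name, passes_screen(profile), distress_score(profile))
```

Both profiles have a summed score of 15, but only the psychological profile passes the screen. A group in which the somatic pattern is more common would therefore look similar on distress but lower on diagnosed depression.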

A disproportionate undercount of depression in Blacks compared with Whites due to screening symptoms that privilege psychological expressions of depression would not occur in distress measures, which simply sum across endorsed symptoms. However, the privileging of psychological over somatic symptoms in diagnostic criteria may also explain our divergent findings for the K6 distress measure that references the last 12 months. As we noted above, the 12-month version of the K6 anchors symptoms to the period when respondents were at their worst “emotionally”, whereas other versions of the K6 ask how participants are “feeling”. If Blacks are less likely than Whites to endorse symptoms in an emotionally anchored distress measure, the group comparison is more likely to mimic that for major depression, as we observed. The diagnostic algorithm for depression also stipulates physical illness and bereavement exclusions for, respectively, depressive episodes thought to be physiologic in origin (e.g., hypothyroidism) and those soon following the loss of a close other. To the extent that these exclusion criteria are over-applied in epidemiologic studies, Blacks’ worse morbidity and mortality rates compared with Whites in the US [106] could also lead to a disproportionate undercount of depression in Blacks. In short, the diagnostic algorithm for major depression provides multiple opportunities for differential misclassification bias that could conceivably account in whole or part for the Black–White depression paradox. It is also possible that differential symptom recall between Blacks and Whites leads to a disproportionate undercount of prior depression episodes in Blacks. This explanation is consistent with our finding that two of the three instances in which Blacks had lower distress than Whites occurred when the reference period was the past 12 months, versus more recent reference periods. 
To date, this hypothesis has not been rigorously tested as an explanation for racial patterns in major depression [44,45,46], and further examination is warranted. Ultimately, we suggest that good, or good enough, theory is sufficiently rare that all plausible artifactual explanations for contravening evidence ought to be tested before a theory is abandoned.
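The differential-misclassification argument about over-applied exclusions can be made concrete with a hypothetical sketch. Two groups share the same true prevalence, but in one group a larger share of true cases involves a recent bereavement or medical illness, so over-applied exclusions remove more of its cases. The function name and all parameter values are hypothetical. The sketch assumes that the over-exclusion rate is the same in both groups and that only true cases are affected.

```python
def diagnosed_prevalence(true_prevalence: float, exposure_share: float,
                         overexclusion_rate: float) -> float:
    """Prevalence recorded after exclusions are over-applied.

    exposure_share: share of true cases with a recent bereavement or medical
    illness that could trigger an exclusion.
    overexclusion_rate: probability that such a case is wrongly excluded.
    """
    return true_prevalence * (1 - exposure_share * overexclusion_rate)


# Hypothetical inputs: equal true prevalence, unequal exposure to the triggers.
group_1 = diagnosed_prevalence(0.10, exposure_share=0.30, overexclusion_rate=0.5)
group_2 = diagnosed_prevalence(0.10, exposure_share=0.20, overexclusion_rate=0.5)
observed_ratio = group_1 / group_2
```

Even with identical true prevalence, the recorded ratio falls to about 0.94. Differential recall of past episodes would act the same multiplicative way, with a recall probability in place of the exclusion term.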