In an era of vast human migration around the world, psychiatric clinicians in industrialized countries, such as the United States, are called on to diagnose and treat patients from a heterogeneous range of ethnic, racial, cultural, social, and linguistic backgrounds. These demographic differences bring pronounced challenges to diagnostic accuracy and agreement in clinical practice. The growth of the US Hispanic population (projected to rise from 35 million in 2000 to over 47 million in 2010, and up to 87 million in 2040; US Census Bureau 2004) represents one such challenge. Recent Hispanic immigrants are joining large established communities of Hispanics in the US, and together they face linguistic and cultural barriers in communicating with providers about their psychiatric needs. Some speak only Spanish, others are bilingual, and still others are monolingual English speakers. Their conceptions of mental illness, its assessment, and its treatment also vary, reflecting a diverse set of cultural beliefs and traditions (Cabassa et al. 2007). Communication problems will curtail access to care and prolong the under-utilization and premature termination of services (Alegria and McGuire 2003; Cabassa et al. 2006; Vega and Alegria 2001; Vega and Lopez 2001). To assure accuracy and consistency and to improve service delivery, our diagnostic strategies need examination.

Because of the makeup of our psychiatric labor force, Hispanic immigrants are very likely to be assigned non-Hispanic, English-speaking clinicians who, while wanting to provide high-quality services, may have difficulty communicating with and understanding patients’ linguistic and cultural nuances. Ethnic similarity with patients may enhance clinicians’ ability to identify cultural modes of expressing symptoms; comprehend meanings of experiences; and understand variations in thought and expression, but it does not guarantee that clinicians will agree on their diagnostic decisions (Malgady et al. 1987; Strakowski et al. 2003). Clinicians render significantly different diagnoses of patients who are ethnically similar or different from them, raising questions about diagnostic accuracy and reliability (i.e., agreement by different clinicians using a common diagnostic system; Langenbucher et al. 1996). Structured diagnostic instruments may not resolve this problem since they typically do not capture the more nuanced information that patients provide and clinicians elicit with good skills and observation of cultural features.

This pilot study was designed to explore agreement between clinicians of Hispanic and non-Hispanic background and between clinicians and a structured diagnostic measure, in interviews with Hispanic patients in an urban clinic. Our operating assumption is that in community mental health, patients’ diagnoses should not differ dramatically from one clinician to another due to their or their clinicians’ ethnicities or cultures. Furthermore, diagnoses by objective measures and clinicians should be relatively concordant. Patients should expect to get the same, accurate diagnosis regardless of the clinician they visit. Diagnosis should not be unduly influenced by race, culture or ethnicity, although it should be sensitive to their influences.

While this assumption may have some face validity, contemporary research on diagnostic practice does not seem to support it. Minsky et al. (2003), for example, found that African Americans are more likely to be diagnosed with schizophrenia, while Hispanics are more likely to be diagnosed with major depressive disorders rather than schizophrenia-spectrum disorders. Hispanics tend to self-report more psychotic symptoms than African Americans, and yet do not receive schizophrenia-spectrum diagnoses at the rates African Americans do. Strakowski et al. (1996) report that African American patients are significantly more likely than White patients to show severe psychotic symptoms and be diagnosed with schizophrenia, and less likely to be diagnosed with psychotic depression. The cultural gap between clinicians and patients may result “in assigning pathology where it [does] not belong” (Strakowski et al. 1996, p. 122) such as misinterpreting culturally based jargon as thought disorder. Diagnostic agreement between emergency room clinicians and a structured research interview is less common for non-White minority patients than White patients (Strakowski et al. 1997). In fact, overall agreement between emergency room clinicians and structured-interview clinicians occurred only in 42% of patients (kappa = 0.25). In sum, racial differences in diagnoses are not entirely eliminated by using structured clinical interviews (Neighbors et al. 2003).

Signs of mental illness may be assessed differently by racial and ethnic groups, and culture adds variations in how persons communicate symptoms (Arnold et al. 2004). Trierweiler et al. (2000) conclude that African Americans as a group may elicit different responses from clinicians than do non-African Americans. Diagnostic differences may be the result of clinicians’ attributions of certain behaviors to particular groups because they weigh their observations of behaviors differently based on patient ethnicity, race, or culture, and even gender and age (Alegria and McGuire 2003). Clinicians’ assessments of clients’ orientation, symptoms, judgment, and other mental functions will be affected by their interpretations of clients’ verbal and nonverbal communication. Sociocultural theory frames the psychiatric diagnostic process as a series of cultural interpretations made by patients and clinicians. The series begins with patients’ interpretations of their symptoms and problems, which they then report in a manner consistent with the cultural categories, words, images, and feelings they have for expressing distress. Clinicians then interpret patients’ interpretations from their own ethnic cultures and the culture of diagnosing in psychiatry (Kleinman 1996). Diagnostic differences among clinicians of ethnic and racial minority patients may be a result of attributions, ambiguous patient behaviors, and clinicians’ interpretive bias (Minsky et al. 2003; Strakowski et al. 1996; Trierweiler et al. 2000).

We generally presume that the greater the cultural and ethnic distance, the higher the potential for misunderstandings, misattributions, and under-, over-, or misdiagnosis of patients. However, past research with Hispanics shows that cultural, linguistic, and ethnic proximity does not reduce potential misunderstanding. Some studies report that Spanish-speaking patients appear more psychotic during Spanish interviews than English ones (Del Castillo 1970). Others report that Spanish-dominant persons with schizophrenia are rated as more symptomatic when interviewed in English than in Spanish (Marcos et al. 1973). Even when clinician ethnicity is held constant by using bilingual Hispanic clinicians to evaluate Hispanic patients in both English and Spanish, symptoms are rated as more severe in Spanish interviews (Price and Cuellar 1981; Malgady and Costantino 1998).

Much of the diagnostic research on Hispanics predates the advent of multi-axial, criteria-driven diagnostic systems and DSM-based structured clinical interviews (American Psychiatric Association [APA] 2000). The research to date has not included objective diagnostic measures and has covered a limited set of diagnostic categories, mostly schizophrenia and major depression (Marcos et al. 1973; Price and Cuellar 1981; Malgady and Costantino 1998). Most studies have also differed in patients’ illness severity (e.g., acute, recent admissions; chronically ill community-dwelling outpatients). Clinical studies with Hispanic service-seekers in typical community settings that include a range of diagnostic categories and symptom severity are lacking, despite the need for research reflecting real-world, everyday community practice (Hohman and Shear 2002). Everyday realities in community diagnostic practice include heterogeneity in patients’ social, cultural, ethnic, racial, economic, and clinical profiles; diagnosing without structured instruments or objective measures; and a relatively homogeneous clinician group.

Within this context, we compared levels of diagnostic agreement between Hispanic and non-Hispanic clinicians and between clinicians and a structured clinical research interview when assessing adult Hispanic outpatients. We also examined clinicians’ agreement on ratings of patients’ symptom severity and assessment of general functioning.

Method

Participants

We screened US or foreign-born Hispanic patients, 18 years or older and without a history of psychiatric treatment during the preceding 12 months, who were requesting psychiatric services for the first time in the adult outpatient clinics of a large urban general hospital. New patients were selected to avoid the influence of past psychiatric records and of patients’ prior experience with the diagnostic process. Eligible patients were required to (a) complete a brief capacity-to-consent screener (Zayas et al. 2005) to determine whether they could participate; (b) be videotaped during their intake interview with the assigned clinician; (c) complete a structured clinical interview administered by a research clinician; and (d) permit a second clinician to view the videotaped interview. Upon showing capacity and giving informed consent, participants were interviewed by a Hispanic or non-Hispanic clinician in the adult psychiatric services. The clinicians had volunteered and had also given informed consent to conduct live interviews or to render diagnoses from videotaped interviews. All procedures in this project were approved by the human subjects committees of Washington University in St. Louis and the data collection sites.

Instruments

As the objective measure against which to compare clinician diagnoses, we used the Structured Clinical Interview for DSM-IV-TR Research Version (SCID; First et al. 2002). The SCID is a widely used clinical instrument with good validity and reliability for Axis I. We used English and Spanish versions. Clinicians rated symptom severity with the Brief Psychiatric Rating Scale (BPRS; Overall and Gorham 1988), which consists of 18 items rated on a seven-point scale. Each item is anchored by definitions and descriptions of expected symptoms and problems for each of the seven intensity rating options, and the scale has good reliability and validity (Lachar et al. 2001). In the analyses, we used patients’ total BPRS scores.

Procedures

We used a “quasi-random” approach to assigning Hispanic (monolingual Spanish or English, or bilingual) walk-in patients or those with appointments to diagnostic interviews regardless of clinicians’ ethnicity or Spanish abilities. Assignment was based on clinic appointment schedule, on-call, or walk-in schedule. When non-Hispanic clinicians encountered a Spanish-speaking patient for the live interview or watched a Spanish-language videotaped interview, interpretation was provided by the SCID administrator (a bilingual, bicultural clinically experienced master’s degree-level psychologist). The SCID was administered after the live interview in every case except one. After conducting live interviews or watching videotaped interviews, clinicians completed questionnaires requiring diagnoses and responses to both quantitative and qualitative questions. Clinicians were instructed to rank-order up to three diagnoses in Axes I and II. In all instances, one clinician was Hispanic by self-identification and one was non-Hispanic by self-identification, regardless of whether they conducted live diagnostic interviews or watched the videotapes. In this report, we present findings based on Axis I (clinical disorders and other conditions that may be the focus of clinical attention) and V (global assessment of functioning), and symptom severity data.

Data Analyses

Potential differences between Hispanic and non-Hispanic clinicians on basic demographic and professional characteristics were assessed via independent samples t-tests. To examine diagnostic reliability between SCID, Hispanic and non-Hispanic clinicians, we computed both percentage agreements (the most basic form of diagnostic reliability) and kappa statistics for broad levels of disorders (e.g., mood disorders, substance-related disorders) and narrow diagnostic levels (e.g., dysthymic disorder, alcohol use disorder). For clinical samples with higher base rates of disorders than the general population, kappa statistics are an appropriate measure of diagnostic reliability (Langenbucher et al. 1996; Spitznagel and Helzer 1985). To determine whether the number of diagnoses rendered by SCID, Hispanic and non-Hispanic clinicians differed significantly, we computed difference chi-square and McNemar chi-square statistics. The difference chi-square assesses whether the distribution of obtained responses differs from that expected by chance, while the McNemar chi-square evaluates whether one “diagnostician” (i.e., SCID, Hispanic, or non-Hispanic clinician) diagnosed significantly more or less than another. Finally, we used general linear modeling to compare mean differences between Hispanic and non-Hispanic clinicians’ ratings of patients’ symptom severity and global assessment of functioning while controlling for interview language and live versus video condition.
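The agreement statistics described above can be sketched for a single diagnosis and a single pair of diagnosticians once their decisions are reduced to a 2×2 table. This is a minimal illustration, not the analysis code used in the study: the function name, the cell labels a to d, and the example counts are hypothetical and are not study data. The McNemar statistic is computed here without a continuity correction, which is an assumption, since the text does not state which form was used. The difference chi-square is not shown.

```python
from math import erfc, sqrt


def agreement_stats(a, b, c, d):
    """Agreement statistics for two raters on one diagnosis.

    Hypothetical cell labels:
      a = both raters assign the diagnosis
      b = only rater 1 assigns it
      c = only rater 2 assigns it
      d = neither rater assigns it
    Returns (percent agreement, Cohen's kappa, McNemar chi-square, p-value).
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from the two raters' marginal rates.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    if expected == 1:
        kappa = float("nan")  # kappa is undefined when chance agreement is total
    else:
        kappa = (observed - expected) / (1 - expected)
    # McNemar chi-square (1 df, no continuity correction) uses only the
    # discordant cells: does one rater diagnose more often than the other?
    chi2 = (b - c) ** 2 / (b + c) if (b + c) else 0.0
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = erfc(sqrt(chi2 / 2))
    return observed, kappa, chi2, p_value


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(agreement_stats(20, 5, 15, 48))
```

Percentage agreement and kappa use all four cells, whereas the McNemar statistic ignores the concordant cells, which is why two raters can agree poorly on individual patients and still not differ in how often they assign a diagnosis.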

Results

We identified 150 eligible patients, 96 of whom agreed to participate (64%). The most frequent reason given by the 54 who declined to participate was concern about the videotaping of interviews. Independent samples t-tests revealed that refusers did not differ significantly on any demographic characteristic (e.g., age and gender) from participants. Eight of the 96 who agreed to participate (seven men, one woman) failed to demonstrate capacity to consent on a brief screener developed for this research protocol, mostly from apparent cognitive deficits.

Eighty-eight patients (57% male) were enrolled in the study after providing informed consent. Their average age was 41 years (SD = 13, range 18–83 years) and they were mostly of Dominican (36%) or Puerto Rican (22%) descent, the two largest Hispanic groups in the local community. Mexicans and Ecuadorians each comprised 7% of the sample and those of other Latin American ancestry were 26%. Most patients (81%) had a high school education or more and 91% were US citizens or legal aliens.

Forty-seven clinicians volunteered to participate in the study and gave informed consent. Most were psychiatrists (40%) or psychiatric social workers (40%). Two thirds of clinicians were non-Hispanic of any race, and about twice as many participating clinicians were female (68%) as male. Clinicians averaged 10 years of experience (SD = 8.7) in adult psychiatric practice. The average years in practice ranged from 4 years among psychiatric residents to 12 years among social workers to 19 years among psychologists. The only significant difference between Hispanic and non-Hispanic clinicians was the average years of adult psychiatric practice, with Hispanic psychiatrists having more than double the experience of non-Hispanic psychiatrists (P = 0.013).

Our analyses used all Axis I diagnoses assigned by clinicians regardless of rank order. As is customarily found in community samples, mood disorders were most frequently diagnosed (72 of 88 or 82%; Table 1). Substance-related disorders were the next most frequent diagnoses, followed by anxiety and adjustment disorders. As expected in a community sample seeking services for the first time, schizophrenia and other psychotic disorders were least often diagnosed. In all instances except alcohol use and adjustment disorders, non-Hispanic clinicians diagnosed more patients with Axis I disorders than did Hispanic clinicians.

Table 1 Percentage of agreement between clinicians, and between clinicians and SCID

Table 1 also shows the agreement between Hispanic and non-Hispanic clinicians on patients’ Axis I disorders. Discrepancies in agreement are evident in patients who received a particular diagnosis from one clinician but not the other. The total agreement between Hispanic and non-Hispanic clinicians ranged from a low of 8% for schizophrenia and other psychotic disorders to a high of 69% for mood disorders. Disagreement was pronounced in substance-related disorders, adjustment disorder, and schizophrenia. For patients given a diagnosis of alcohol use disorders (n = 20) and generalized anxiety (n = 16), clinicians failed to agree on any patients with these disorders. The highest level of agreement between clinicians occurred in the diagnosis of substance use disorders (69%).

We compared diagnostic agreement between clinicians and SCID diagnoses on Axis I. Table 1 shows the rates at which both Hispanic and non-Hispanic clinicians agreed with the SCID on the same patients, and the rates at which the clinicians agreed with the SCID but disagreed with each other. At the high end of agreement, the SCID diagnosed 36 patients with mood disorders and both Hispanic and non-Hispanic clinicians agreed with the SCID on 21 of these patients (58%). Forty-eight patients with substance-related disorders were identified by the SCID, yet both clinicians agreed with the SCID on fewer than half of these patients (n = 20; 42%). Clinicians agreed with the SCID on only 28% of patients with anxiety disorders and 8% with schizophrenia and other psychotic disorders.

We found that neither Hispanic nor non-Hispanic clinicians agreed with the SCID on alcohol use and psychotic disorders not otherwise specified, and only once on panic disorders. At the high end of agreement, both Hispanic and non-Hispanic clinicians agreed with the SCID on major depressive disorders half the time (50%) and they were nearly evenly split on the remaining 50% of patients diagnosed by the SCID (four by the Hispanic clinicians and SCID and five by the non-Hispanic clinicians and SCID). In diagnosing substance use disorders, Hispanic clinicians failed to agree with the SCID on a single case and non-Hispanic clinicians agreed with the SCID on six patients. Total agreement (all clinicians) with the SCID was less than half (46% of patients).

Kappa statistics assess reliability of diagnoses, with kappas < 0.40 being poor to fair, between 0.40 and 0.80 moderate to substantial, and > 0.80 exceptional (Landis and Koch 1977). As shown in Table 2, kappa statistics were overall low to moderate, ranging from −0.13 to 0.74. For SCID versus Hispanic clinicians, kappa statistics ranged from −0.07 to 0.35. For SCID versus non-Hispanic clinicians, kappa statistics ranged from −0.06 to 0.52. Negative kappas are rare and indicate that diagnostic reliability was even lower than expected by chance.
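The interpretive bands quoted above can be written as a small lookup. This is an illustrative sketch only: the function name is hypothetical, the labels follow the wording used in this paper, and placing exactly 0.40 and 0.80 in the middle band is an assumption, since the text does not assign the boundary values.

```python
def kappa_band(kappa):
    """Map a kappa value to the descriptive bands used in the text."""
    if kappa < 0:
        return "below chance"  # agreement lower than expected by chance
    if kappa < 0.40:
        return "poor to fair"
    if kappa <= 0.80:  # boundary handling is an assumption
        return "moderate to substantial"
    return "exceptional"


if __name__ == "__main__":
    # Endpoints of the kappa ranges reported in the text.
    for value in (-0.13, 0.35, 0.52, 0.74):
        print(value, kappa_band(value))
```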

Table 2 Comparisons between clinicians and between clinicians and SCID

Next, we examined whether the distribution of assigned diagnoses significantly differed from chance using difference chi-squares, and whether one clinician diagnosed at a significantly different rate than the other, using McNemar chi-squares (Table 2). When comparing mood disorder diagnoses rendered by Hispanic versus non-Hispanic clinicians, the difference chi-square was significant (P = 0.0015) and the McNemar was not. For major depressive disorder the difference chi-square was significant (P < 0.0001) and the McNemar was again non-significant. For other mood disorders (including dysthymia and depressive disorder NOS) neither chi-square was significant.

In the substance-related disorder and substance use disorder categories, the difference (P < 0.0001 for both) and McNemar (P = 0.0253, P = 0.0114, respectively) chi-squares were all significant, indicating that non-Hispanic clinicians assigned the diagnosis at higher rates than Hispanic clinicians. For alcohol use neither chi-square was significant.

For anxiety disorders, the difference chi-square was significant (P = 0.0147) and the McNemar chi-square was not. Finally, in the adjustment disorder category, once again the difference chi-square was significant (P = 0.0080) and the McNemar chi-square was not. For all remaining diagnoses, no significant differences were found between Hispanic and non-Hispanic clinicians. It is worth noting that for substance-related diagnoses, the video versus live condition appears to be a confound, because all five such diagnoses by Hispanic clinicians were rendered from video.

For SCID versus Hispanic clinicians, kappas ranged from −0.07 for other mood disorders, to 0.41 for substance use disorders. Difference chi-squares were significant in the major depressive disorders, anxiety disorders, schizophrenia and other psychotic disorders, and all substance-related categories. McNemar chi-squares were significant for the mood disorders, adjustment disorders, schizophrenia and other psychotic disorders, panic disorder, and all substance-related categories, indicating a significant difference in the rate of diagnosing. In every case, the SCID diagnosed more disorders than the Hispanic clinicians.

For SCID versus non-Hispanic clinicians, kappas ranged from −0.06 for mood disorders, to 0.52 for substance use disorders. Difference chi-squares were significant in the anxiety disorders and substance-related categories. McNemar chi-squares were significant for mood disorders, major depressive disorders, panic disorder, and all substance-related categories. In every case, the SCID diagnosed more disorders than non-Hispanic clinicians.

Finally, we compared Hispanic and non-Hispanic clinicians’ ratings of patients’ symptom severity and their ratings of global assessment of functioning while controlling for language of interview and live versus video condition (Table 3). On the BPRS, non-Hispanic clinicians rated patients as showing more severe symptoms (P < 0.0003). On global assessment of functioning, non-Hispanic clinicians assessed patients as significantly lower in current functioning (P < 0.0001) and functioning in the past year (P = 0.0014). Neither interview language nor condition (live versus video) made a difference in clinician ratings.

Table 3 Clinician ratings of patients’ symptom severity and global assessment of functioning

Discussion

Hispanic and non-Hispanic clinicians differed significantly in the diagnoses they assigned Hispanic outpatients and demonstrated low levels of agreement; sometimes they assigned similar diagnoses but not to the same persons. Clinician agreement with the SCID also varied in pronounced ways. If three-way agreement (i.e., both clinicians and the SCID) most closely approximates an accurate diagnosis, then accuracy was not supported for diagnostic categories that are very common in community mental health practice (e.g., mood disorders, substance-related disorders, alcohol use disorders, substance use disorders, and adjustment disorders). It was alarming, for instance, that the SCID diagnosed alcohol abuse in 34 patients but there was not a single case in which both clinicians agreed with the SCID. And when individual clinicians did agree with the SCID, their diagnoses were completely divergent: Hispanic clinicians agreed with the SCID on eight patients and non-Hispanic clinicians agreed with the SCID on five entirely different patients. Hispanic and non-Hispanic clinicians agreed with the SCID on 46% of patients with substance abuse disorders; non-Hispanic clinicians identified six additional persons with these disorders. Significant disagreement was also evident in clinicians’ ratings of patients’ symptoms and assessment of patients’ functional capacity for both the current period and the past year, with non-Hispanic clinicians giving the more severe ratings. This finding is at odds with past research on Hispanic patients, which shows that Hispanic clinicians rate symptoms of Hispanic patients more severely than do non-Hispanic clinicians (Malgady and Costantino 1998). Sampling differences between the two studies, however, may account for the discrepancy.

These findings point to our concern about diagnostic agreement across clinicians and instruments. They also run contrary to the assumption that consumers should expect accuracy and consistency in their diagnosis regardless of clinician. Such findings raise questions about what information clinicians rely on to reach their diagnostic conclusions (Neighbors et al. 2003; Strakowski et al. 1996; Trierweiler et al. 2000). Our results seem to concur with past findings that clinicians may be influenced by different factors and make attributions of pathology differently, based possibly on clinicians’ cultural and social biases, and assessing its magnitude in very distinct ways (Malgady and Costantino 1998; Neighbors et al. 2003; Strakowski et al. 1996; Trierweiler et al. 2000). As Minsky et al. (2003) point out, bias may be present when clinicians apply DSM-IV criteria to patients differently. As evident in previous research and in our study, diagnostic bias (if considered a factor in our results) is very complex and does not easily follow ethnic lines. Bias may intrude because clinicians may not use diagnostic criteria effectively due to cultural variances in characteristic symptom clusters that clinicians use as “a template for assigning a diagnosis” (Minsky et al. 2003, p. 643). These symptom clusters may be confounded by ambiguous signs, body language, and verbal idiosyncrasies that interfere with appropriate applications of DSM-IV criteria. Some researchers (Minsky et al. 2003; Strakowski et al. 1997) have suggested that structured interviews may attenuate race-related diagnostic differences and explain clinician disagreement or error as the result of ambiguities in the patient-clinician interaction. As Strakowski et al. (1997) show, diagnostic agreement between clinicians and structured research interview occurs less often when the patients are non-White minority persons than White persons.

Limitations restrict the conclusiveness and generalizability of our findings. Both our clinician and patient samples were small, and the patient refusal rate (36%) was high. Ours was a self-selected, non-representative sample of Hispanics. Without comparison groups it is impossible to know whether agreement between clinicians of different ethnicities and between clinicians and the SCID would have been any better for patients of other ethnicities. Our clinician sample may also be biased by the greater experience of the Hispanic clinicians, and the small number of clinicians prevented comparisons of agreement levels among same-ethnicity clinicians or by language, discipline, and gender. Imbalances also existed in live and video interview conditions, with Hispanic clinicians conducting most of the live interviews (70%). Diagnosing from video, of course, is not customary practice, and clinicians viewing tapes were constrained by the questions that were asked on tape. The use of videos also introduced confounds that affect both the findings and their interpretation. Finally, because of the pilot nature of the study and the concomitant budget restrictions, we used one interpreter for non-Hispanic, English-only clinicians, which introduced potential uncontrolled and unexamined bias. The interpreter was also our SCID administrator, potentially introducing systematic bias that could be reduced by using more interpreters and raters and by separating the roles.

This pilot study is, nonetheless, one of relatively few recent examinations of clinical diagnostic practice with Hispanics. It also introduced two objective measures to examine clinician diagnostic judgment. The use of a community sample based within the real-world, real-time context of a community mental health clinic enhanced its ecological validity, albeit with the limitations noted above. Except for having second clinicians assess patients from videotaped interviews, the study followed a relatively naturalistic design that reflected the everyday practice of a busy urban clinic: patients were assigned to clinicians based on schedules and staffing patterns and were not segregated a priori by diagnosis. Most community settings have neither the time nor the trained personnel to administer structured diagnostic instruments as part of their intake assessments. In this regard, our assessment of clinicians’ agreement on diagnoses reflects community psychiatric practice and points to the need for more training in and use of structured diagnostic interviews in community mental health practice.

Adding to a body of diagnostic research on Hispanics, our study points to the complexity of the problem and to challenges to future research in this area that might help make sense of the conundrum. This study makes the case for looking at how objective diagnostic measures impact agreement and accuracy. Therefore, we need research that uses larger, representative multiethnic samples of both patients and clinicians in sufficient numbers to provide enough statistical power to explore where the highest and lowest levels of agreement exist, and how we can understand the differences. Larger samples can allow for various comparisons by patient and clinician gender and ethnicities, and between clinicians by discipline, gender, years of experience, and so on. Using in vivo diagnostic interviews for all clinicians’ ratings overcomes the methodological confound introduced by videotaped interviews. Multiple SCID raters will attenuate the bias that can come from having a single rater. Comparison groups of ethnic minority and non-minority patients will answer a critical question of whether lack of agreement among our clinicians and the SCID was unique to diagnosing Hispanic patients or would have existed at the same level of disagreement regardless of ethnic differences.

Clearly, we can deduce that usual-care diagnoses in a typical community mental health center are very unreliable. The immediate concern becomes what happens to patients when we link diagnosis to treatment decisions. How diagnoses influence the treatment, including pharmacological interventions, instituted by Hispanic and non-Hispanic clinicians assessing Hispanic patients, and the outcomes of these treatment decisions, are of the utmost importance. Evidence-based therapies are, after all, predicated on accurate diagnoses.