BACKGROUND

Currently, more than 47 million people in the US speak a language other than English at home, and 19 million have limited English proficiency (LEP)1. The rising number of US residents whose English language skills are limited or lacking challenges the existing ways in which medical facilities approach and deliver services. In patient care, verbal communication is critical to establishing relationships and enabling understanding between providers and patients,2–5 in particular when patients need to express health concerns. Barriers in communication can jeopardize the safety and quality of care, potentially leading to significant misdiagnosis and inappropriate treatment2 , 5–10.

In one study, communication problems contributed to adverse events at a much higher rate among LEP patients (52.4%) than English speakers (35.9%), with 49.1% of adverse events rising to the level of clinical significance in LEP populations, compared to 29.5% in English-speaking populations.3 Many studies have shown that LEP patients tend to have lower rates of preventive screening and higher rates of hospitalization and drug complications2 , 3 , 9 , 11–15. Lack of discussion and poor understanding of treatment plans, including medication side effects, are likely reasons for these concerning rates, as such factors can lead to patient dissatisfaction and reduce adherence to physician recommendations16–18. The first systematic review of the impact of medical interpretation on health care concluded that quality of care and related health outcomes are seriously compromised for LEP patients who need but do not receive qualified interpretation services13.

Interpreters are clearly vital for health communication2 , 5 , 8 , 10 , 13 , 19. Title VI of the Civil Rights Act of 1964 requires health care providers who receive federal financial assistance, including Medicaid and Medicare, to provide language assistance at no cost to patients. In practice, this assistance often involves ad hoc interpreters, such as family members, volunteers, or medical staff without training in interpretation. Nevertheless, the US Department of Health and Human Services (DHHS) and the Institute of Medicine favor even more stringent requirements, stating that the standard of care should include the availability of trained interpreters20 , 21. In recommending national standards for competency in language assistance, DHHS asks health care organizations to ensure that interpreters achieve proficiency in English and the target language, complete formal training, and comply with ongoing quality assurance22. Several organizations, notably the Joint Commission on Accreditation of Healthcare Organizations, the National Council on Interpreting in Health Care, and the International Medical Interpreters Association, are currently developing national standards,23–25 and a national certification program has been suggested26.

Despite this enthusiasm for standards, the literature contains few studies of the quality and accuracy of medical interpretation, or of the potential impact on medical care (positive or negative) of alterations in meaning made by interpreters in translating patients’ or providers’ remarks5 , 13 , 15. Most research that directly analyzed interpreted medical encounters has involved ad hoc interpreters13 , 19 , 27 , 28. In one of the rare studies to include professional interpreters, Flores et al. found that the average rate of alteration was 31 per encounter, with more than half of the alterations having the potential to affect clinical outcomes2. In another study analyzing family conferences for gravely ill patients, Pham et al. found that alterations by professional interpreters occurred in every interpreted conference that they observed. Fully 55% of “interpreted exchanges” contained interpreter alterations, of which 78% potentially had clinically significant consequences29.

Finding and transmitting equivalent meanings across language and culture requires a high level of skill, since small changes in words can result in large changes in meaning. Awareness of the most common types of alterations, as well as the alterations most likely to hinder communication, is critical to designing training programs for interpreters and clinicians that can help reduce health disparities for LEP patients30.

OBJECTIVE

We set out to understand how alterations in medical interpretation affect health care delivery to LEP patients by documenting the rates and types of alterations and determining their clinical significance. One major goal was to identify a baseline standard of interpretive accuracy, which has not yet been defined. Thus, we intentionally studied encounters under the most favorable circumstances, in which trained interpreters interacted with established patients receiving routine care.

DESIGN

This study was conducted from November 2007 to June 2008 at an outpatient clinic providing primary care to poor, non-English speaking immigrants and refugees in a large urban medical center in the Pacific Northwest. Many clinic patients had emigrated from war-torn regions and had histories of physical and emotional trauma. All study methods were approved by the center’s institutional review board, and all study participants provided written informed consent. Consent forms were prepared in English, translated into our target languages, approved by institutional review, and read aloud to each patient by the interpreter before the clinical encounter. Printed forms were also available to participants in their native languages.

The primary method of data collection was audio-recording of clinical encounters that included trained medical interpretation. Our target languages were Cantonese, Mandarin, Somali, Spanish, and Vietnamese, which are the most commonly spoken non-English languages in our patient sample. We included patients receiving care for diabetes, hypertension, hyperlipidemia, and other routine chronic conditions. We excluded patients with a diagnosis of cancer, sexually transmitted disease, or tuberculosis, as well as patients with psychiatric illness and those dealing with domestic violence or end-of-life issues. We wished to capture a best case scenario: a professionally trained and certified interpreter, discussing a familiar topic with an established physician/patient dyad. Therefore, we focused on encounters in which no new diagnostic information was imparted and no emotionally laden topics were discussed, because these might be more challenging to interpret.

All recorded encounters were translated into English by trained interpreters who did not participate in the clinical sessions. Transcripts were reviewed by study investigators, and each meaningful linguistic alteration with medical significance was classified into one of four codes based on the work of Flores and colleagues13. The codes were: (1) addition, (2) deletion, (3) change of meaning, and (4) editorialization. Addition means adding words not uttered by the patient or provider. Deletion means omitting words in the original utterance from the interpretation. Change of meaning refers to replacing words or phrases uttered by the patient or provider with words or phrases that carry different meanings. Editorialization refers to the offering by an interpreter of an opinion not expressed by patient or provider.

Coding was directed by the lead investigator (DN), with clinical adjudication by two co-investigators who are general internists (JCJ, GST). Ultimately each transcript was coded separately by at least one physician and one research associate. We used Atlas.ti, a software program for qualitative analysis, to identify and assign codes to emergent themes in the transcripts. We defined clinically significant alterations as non-trivial changes in the information exchanged by patient or provider that had a potential impact on clinical outcomes—for example, by affecting medical history, diagnosis, treatment plan, or patient education. Clinical significance was determined by a review of study transcripts by our two internists. For quality control, two encounters were coded separately by each internist to assess concordance, and in these cases we observed complete agreement.

Clinically significant alterations were further classified as either positive or negative. Positive alterations contributed to a better understanding of the medical condition or situation; negative alterations contributed to a misunderstanding between patient and physician, and could lead to a missed or incorrect diagnosis or treatment.

PARTICIPANTS

Providers

We introduced the project in meetings with clinic physicians and subsequently distributed letters of introduction and consent forms. We included only attending-level physicians in adult primary care who had established clinical relationships with their patients. Five physicians met our inclusion criteria and provided consent. Only one was fluent in a second language.

Interpreters

We sent letters of introduction and consent forms seeking the participation of in-person interpreters for each language. We recruited 3 to 5 interpreters per language except Mandarin, for which we were able to recruit only 1, resulting in a total of 16 participating interpreters. Each interpreter was recorded one to three times. All interpreters were employees of the medical center with at least 5 years of service. As a condition of employment, all had passed spoken and written evaluations in their target language, conducted by the State of Washington’s Department of Social and Health Services, and completed a 40-hour course in professional interpretation. All interpreters at the medical center receive routine weekly supervision and monthly continuing medical education courses.

Patients

Each week we asked participating providers to identify qualified patients and explain the project to them. In this way we recruited 42 patients, audio-recording each patient only once.

Encounters

We limited our study to routine clinical interactions, excluding any encounters where new symptoms were presented or difficult conversations were planned. No telephone interpretations were used.

MEASURES

We departed from previously published methods that report alterations per interpreted encounter. Instead, our analyses focused on each unit of spoken content presented to the interpreter for translation, whether it was spoken by the provider or by the patient. We called this unit of analysis an “utterance,” as it varied in length from a single word or phrase to a few sentences. Each utterance can be clearly identified in the transcripts of the interpreted encounters as the unit of work presented to the interpreter. The length of an utterance was not controlled by the interpreter, except on rare occasions to clarify meaning. We argue that this approach leads to a more granular investigation by making the unit of analysis equal to the unit of work; it increases analytic precision and enables a comparison of alteration rates across languages. Table 1 contains examples of utterances and their coding.

Table 1 Examples of Utterances Containing Clinically Significant Alterations

KEY RESULTS

Thirty-eight of the 42 recorded clinic visits, each lasting 12–17 minutes, were included in the final analyses. Four visits were excluded because of technical problems with audiotapes. None of the visits strayed into emotionally charged or otherwise non-routine content. Visits that included Vietnamese and Cantonese speakers were over-represented, while Mandarin and Somali speakers were under-represented. The refusal rate was much higher among Somali patients (50%) than among other language groups (less than 10%). The average number of utterances per visit is recorded in Table 2, ranging from 93.8 for Cantonese speakers to 174.6 for Somali speakers. All alteration rates were calculated by dividing the number of alterations by the number of utterances for that encounter. Although the gross alteration rate per utterance ranged widely, overall we found that 31% of all utterances during a routine clinical encounter contained an alteration.
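The rate calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the `Encounter` record, the function names, and all counts are hypothetical, and averaging the per-encounter rates within each language is an assumption about how encounter-level rates might be summarized.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Encounter:
    """One interpreted visit (hypothetical record layout)."""
    language: str
    utterances: int   # units of speech presented to the interpreter
    alterations: int  # coded alterations found in that visit


def alteration_rate(enc: Encounter) -> float:
    """Alterations divided by utterances for a single encounter."""
    if enc.utterances <= 0:
        raise ValueError("an encounter must contain at least one utterance")
    return enc.alterations / enc.utterances


def mean_rate_by_language(encounters: list[Encounter]) -> dict[str, float]:
    """Average the per-encounter rates within each language (assumed summary)."""
    grouped: dict[str, list[float]] = {}
    for enc in encounters:
        grouped.setdefault(enc.language, []).append(alteration_rate(enc))
    return {lang: mean(rates) for lang, rates in grouped.items()}


# Hypothetical counts for illustration only; these are not study data.
encounters = [
    Encounter("A", utterances=100, alterations=30),
    Encounter("A", utterances=200, alterations=50),
    Encounter("B", utterances=150, alterations=60),
]
rates = mean_rate_by_language(encounters)
```

Because each encounter contributes its own rate, a long visit does not outweigh a short one in the language-level summary.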

Table 2 Rates of Alteration per Utterance in Routine Primary Care Visits, by Language

We classified alterations according to the four categories defined above (see Table 1). Table 3 summarizes rates per utterance for each one. Deletions were the most common alteration, ranging from 10% to 20% of all utterances, for an overall rate of 16%. Editorializing was the least common, occurring in about 2% of utterances. Table 4 summarizes the overall rates of clinically significant alterations.

Table 3 Rates of Each Category of Alteration per Utterance in Routine Primary Care Visits, by Language

Table 4 Rates of Clinically Significant Changes per Utterance in Routine Primary Care Visits, by Language

This excerpt from a typical encounter illustrates an utterance whose real-time interpretation contained multiple clinically significant changes:

Patient [translated]: I’m okay but there is something in my eye, like eye gooey and it itches.

Interpreter to physician [English]: I’m alright but my eyes are little bit itchy… and the eyes are little blurry and a little distressed on the eye.

In this case, the interpreter deleted the patient’s report of eye discharge while adding a report of reduction in visual acuity and changing the report of itching to “distress.” Despite their apparent subtlety, such alterations might have a negative effect on the differential diagnosis.

Because we acknowledge that interpreters can change a message in helpful directions, we also evaluated clinically significant alterations to see whether they aided communication. Most of the beneficial alterations that we found involved adding information to enhance physician instructions, such as the location of a laboratory or ways to expedite medication refills.

In a more complex encounter involving a patient with previous cataract surgery, the provider recommended a second surgery for the contralateral eye. The interpreter translated the provider’s recommendation verbatim and then added an editorializing statement in English: “The problem is funding. I was with her when she went to the follow-up for surgery and … they suggest to try talk to the social worker and see if she can apply for a basic health or something like that.” This statement was coded as an interpreter alteration with positive clinical significance. Overall, however, only 5% of utterances contained a clinically significant alteration, with 1% having positive effects and 4% having negative effects.

We used analysis of variance to examine alteration rates per utterance across languages. Then, to confirm suspected differences, we used pairwise comparisons with the Bonferroni correction to compare every possible pair of languages on rates of each alteration type. Table 5 summarizes these comparisons. For example, Somali interpreters changed messages more often and editorialized more often, in a positive clinically significant direction, than did Cantonese and Vietnamese interpreters. Behind such differences may lie cultural, linguistic, and historical factors that our study was not designed to address; alternatively, variation in interpreter skill may be the likeliest explanation.
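The multiple-comparison step can be illustrated with a short Python sketch of the Bonferroni correction. The raw p-values below are hypothetical placeholders, not study results; the 0.05 threshold is an assumption; and the sketch covers four languages because Mandarin was removed from the comparative analyses. It shows only the adjustment itself, not the underlying tests that would produce the p-values.

```python
from itertools import combinations


def bonferroni_adjust(p_values: dict) -> dict:
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


languages = ["Cantonese", "Somali", "Spanish", "Vietnamese"]
pairs = list(combinations(languages, 2))  # every possible pair of languages

# Hypothetical raw p-values for one alteration type, one per language pair.
raw_p = dict(zip(pairs, [0.004, 0.20, 0.60, 0.03, 0.001, 0.45]))

adjusted = bonferroni_adjust(raw_p)
alpha = 0.05  # assumed significance threshold
significant = [pair for pair, p in adjusted.items() if p < alpha]
```

With six comparisons, a raw p-value of 0.03 no longer meets the assumed threshold after adjustment, which is the conservative behavior the correction is meant to provide.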

Table 5 Significant Alterations Between Language Pairs Using Pairwise Comparisons and the Bonferroni Correction

CONCLUSIONS

Flores et al. found that interpreters made clinically significant alterations at alarmingly high rates2 , 13. Although their study also concluded that trained medical interpreters were less prone to such alterations than were ad hoc interpreters, their analyses nevertheless yielded rates of 53% and 77%, respectively, for trained vs. untrained interpreters. Flores et al. calculated alteration rates by summing the number of phrases interpreted over all encounters by a single interpreter (either professional or ad hoc),2 whereas we calculated alterations per utterance per encounter. Using this method, we found a rate of clinically significant alterations (5%) that was smaller by an order of magnitude than the rate found by Flores and colleagues, even for trained interpreters. The likely reasons behind these strikingly different findings are rooted in our intentional focus on a best case scenario with a professionally trained interpreter addressing a familiar topic with an established physician/patient dyad. Under these favorable circumstances, we established that about one-fifth of clinically significant changes actually enhanced rather than impeded communication between patient and physician.

Notably, an earlier study conducted at the same medical center reported that clinically significant changes comprised 78% of all interpreter alterations during family conferences in the intensive care unit29. These encounters arguably represented a worst case scenario, as they involved complex medical cases, emotionally charged discussions, morbidly ill patients who were previously unknown to the provider, and interpreters provided by an agency rather than employed by the medical center.

Like Flores et al., we found that deletion was the most common alteration, even though the rates we observed (16%) were much lower than the ones they reported (51%)2. Because we included five languages other than English, whereas their study addressed only Spanish, we were able to observe that overall rates of alteration varied by language, from 22% in Spanish-language encounters to 35% in Vietnamese-language encounters. Further study is required for an adequate understanding of this discrepancy.

While it is unlikely that the interaction among interpreter, provider, patient, and language can be completely teased apart, this complex relationship undoubtedly accounts for some of the significant differences between groups that we saw in pairwise comparisons. We conclude that professional training programs for interpreters and providers should address the widespread tendency for interpreters to omit details from their interpretations. This tendency might be remedied by training providers to deliver brief, clearly phrased utterances, and by teaching interpreters methods for remembering the number of key points in an utterance and for requesting clarification from providers when in doubt. Consistent with other studies, we believe that training interpreters and clinicians to address common patterns of alteration will markedly raise the quality of communication between providers and LEP patients13 , 29.

Limitations

Our findings should be interpreted in light of several limitations. First, although we included five interpreted languages, we recorded relatively few encounters and interpreters in each language, reducing our statistical power. Second, our inability to randomly assign interpreters, physicians, and patients to encounters may have a confounding effect on our calculation of alteration rates by language.

Third, our small sample size, combined with the lack of randomization, limits our ability to use a mixed-effects model to isolate interpreter effects from effects of topic, language, and provider/patient interaction. Nevertheless, we analyzed 38 encounters, with a minimum of 5 encounters in each language except Mandarin. We therefore included Mandarin in our descriptive statistics, including the overall alteration rate, but removed it from comparative analyses, because a single interpreter handled all three Mandarin encounters.

Finally, all recorded encounters addressed a very restricted range of topics in an established continuity relationship. Our best case approach substantially underestimates the likely rate of alterations in more dynamic encounters—for example, those involving a new patient, a new diagnosis, or a new workup.

Significance

This study is significant for several reasons. First, our sample of interpreted encounters (38) is larger than in any previous investigation of interpreter alterations2 , 27 , 29 , 31 , 32. Second, we included five different languages representing four distinct linguistic families. Thus, our finding that the rate of clinically significant alterations remains essentially stable across languages assumes a special importance.

Third, we introduced a methodological refinement to this area of inquiry by calculating rates of alteration per utterance per encounter, instead of per total number of encounters. While the latter approach pools diverse phrases, topics, interpreters, physicians, and patients, we argue that our approach restricts measured alterations to the basic triad of interpreter, physician, and patient, enabling us to better isolate the interpreter effect from other potential variables.
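The arithmetic difference between pooling counts across encounters and averaging per-encounter rates can be seen in a small Python sketch. The two encounters and their counts are hypothetical, and the function names are ours; the sketch illustrates the calculation only and is not a re-analysis of this study or of earlier ones.

```python
def pooled_rate(encounters: list[tuple[int, int]]) -> float:
    """Sum alterations over all encounters, then divide by total utterances."""
    total_utterances = sum(u for u, _ in encounters)
    total_alterations = sum(a for _, a in encounters)
    return total_alterations / total_utterances


def mean_per_encounter_rate(encounters: list[tuple[int, int]]) -> float:
    """Compute each encounter's rate first, then average those rates."""
    return sum(a / u for u, a in encounters) / len(encounters)


# Hypothetical (utterances, alterations) pairs; these are not study data.
encounters = [(50, 5), (200, 80)]

pooled = pooled_rate(encounters)                # 85 / 250
averaged = mean_per_encounter_rate(encounters)  # (0.10 + 0.40) / 2
```

In this example the pooled figure is dominated by the longer encounter, whereas the per-encounter average weights each interpreter, physician, and patient triad equally.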

Finally, our study design enabled us to calculate a baseline alteration rate that avoids potential bias introduced by variations in interpreters’ training and experience or by emotionally charged provider/patient exchanges.

Like Pham and colleagues, we remain open to the possibility that interpreter alterations may clarify potentially faulty communication29. Before designing interventions to improve interpreted encounters, it is advisable to establish baseline rates of alterations, both positive and negative, in key clinical settings: for example, end of life, new life-threatening diagnoses, and informed consent for major surgery. Such baseline variations in speech patterns, flagged by language and culture, can then be compared to rates of miscommunication in English-only encounters. This study represents a first step toward establishing such a baseline.