Introduction

The spoken word in human communication is typically elaborated, commented upon, contradicted, and embellished with nonverbal messages including facial expressions, body movements, and the tone and quality of voice (DiMatteo et al. 1986; Harrigan et al. 1985; Koss and Rosenthal 1997; Ray and Ray 1990; Riggio and Feldman 2005). Tone of voice is significant for the “meta-messages” it conveys (Robinson 1998; Street and Buller 1987). The emotional state of speakers can often be understood even when we cannot comprehend their words. Certain emotional expressions (i.e., happiness, sadness, fear, disgust, surprise, and anger) are universal, and even infants respond to voice tone (DePaulo and Friedman 1998; Ekman 1989; Grossman et al. 2005). It is through vocal expressiveness (encoding) and sensitivity to voice tone (decoding) that the subtle elements of communication affecting emotional experience are conveyed (Choi et al. 2005).

The present research examines human voice tone in dyadic interaction, specifically that which takes place between health professionals and their patients in the primary care medical visit. This research: (1) develops a digitally-based method for filtering content from the spoken voice, and both reliable and valid global affect rating scales for the assessment of content-filtered voice tone, (2) examines the relationship between a speaker’s vocal tone and the interpersonal satisfaction of both speaker and receiver of vocal communication, and (3) assesses the relationship between a speaker’s voice tone and characteristics (e.g., physical and mental health status) of the receiver of the message, as well as the degree of concordance between the voice tones of two interactants in conversation. Two studies are presented, based on two data sets involving nurses and physicians interacting with medical patients in primary care medical visits. This research is the first to examine the relationships of nurses’ voice tone to satisfaction outcomes, and of physician voice tone to previously unstudied health outcomes.

Content-filtered Speech

Research on voice tone in human speech requires its isolation, typically achieved through a process called “content-filtering” which is “a research procedure that isolates the paralinguistic channel of communication by eliminating or controlling for semantic content in the verbal or linguistic channel” (Rogers et al. 1971, p. 16). Content-filtered (CF) voice tone (also referred to as “content-filtered speech”, “content-free speech”, “content-masked speech”, and “low-pass filtered speech”) sounds muffled, as though heard through a wall. The affective quality of the speech remains, but the semantic meaning is removed and words are indistinct (Starkweather 1956a, b). Content-filtering renders the verbal content of speech unintelligible because it removes the highest and lowest frequencies of the voice, which tend to communicate the consonants and vowels respectively (Soskin and Kauffman 1961). Throughout the history of content-filtering research, the precise mechanisms and methodologies have varied but their common goal has been the retention of vocal quality while rendering the verbal content of speech unintelligible.

Although the focus of the present study involves health professionals and their patients, it is important to note that affective ratings of CF voice tone have been examined in many interpersonal and relational contexts (Ambady and Rosenthal 1992; Robinson 2006). In these studies, content-filtered speech is rated by judges on a collection of adjective descriptors, using rating scales that are either bipolar (e.g., submissive-dominant, pleasant-unpleasant) or unipolar (e.g., friendliness (low to high)) (Rosenthal et al. 1984).

Voice Tone as a Predictor of Patient Outcomes

Voice is a central component of the study of nonverbal communication in the clinician–patient relationship (Hall et al. 1995). In the earliest outcome study in the medical setting, Milmoe et al. (1967) were able to predict physicians’ success at referring alcoholics for treatment from the rated level of anger in their CF voice tones. In psychotherapy, clinicians’ voice tones of warmth, hostility, empathy, and anxiety when speaking about their patients predicted their voice tones when speaking to their patients (Rosenthal et al. 1984). Physicians’ accuracy at encoding messages of specific emotion in their voice tone correlated positively with patient satisfaction with their care (DiMatteo et al. 1980). Physicians’ skill at decoding content-filtered voice tone predicted both patient satisfaction and patient adherence to scheduled appointments (DiMatteo et al. 1986). Hall et al. (1981) found that when physicians sounded anxious in their voice tone, thus communicating concern, their patients were more satisfied, and physicians who were task-focused had more anxious and interested content-filtered vocal affect than did those who were socio-emotionally focused (Hall et al. 1987). The relationship of malpractice litigation to voice tone has been studied using short clips of physicians’ content-filtered speech; surgeons who had been sued conveyed more dominance in their voice tone than did those who had not (Ambady et al. 2002). Thus, there is some empirical evidence that CF voice tone (which on the surface seems to provide little information on which to make judgments) has predictive power for medical care outcomes including malpractice claims, patient satisfaction, and patient adherence to recommended regimens. This research has been somewhat limited, however, in the patient outcomes studied.

Encoding of Emotional Messages and Clinician–Patient Rapport

Through vocal expressiveness and sensitivity to voice tone (skill in encoding and decoding, respectively), the subtle elements of emotional communication in the physician–patient relationship are conveyed (Rosenthal et al. 1979). Little research has examined nonverbal skill in the context of voice tone, although there is evidence that emotions (particularly fear and anger) can be communicated accurately, especially through vocal pitch (Davitz 1964). This study explores, for example, physicians’ expression of positive emotion and how it may be reflected in their satisfaction.

Nonverbal reciprocity, closely tied to the concept of rapport, has been studied in the therapeutic context (Hall et al. 1995; Harrigan et al. 1985). In the early stages of development of a therapeutic relationship, individuals display nonverbal cues of cooperation, and respond emotionally and attentively to one another (Tickle-Degnen and Gavett 2003). In health care settings, providers and patients can influence one another with their nonverbal behaviors (Roter et al. 2006). Using nonverbal affiliative cues, health care providers express their preferences for their patients’ involvement and participation in care (Street and Buller 1987); nonverbal reciprocity and rapport are related to positivity in physician–patient encounters (Koss and Rosenthal 1997). The role of voice tone in understanding the process of rapport and how patients and their providers may respond to each other has not been extensively studied, and is examined here.

Research Questions and Overview of Studies

In Study 1 we ask the following research question: (1a) Does physician content-filtered voice tone predict patient outcomes, specifically patient satisfaction and self-reported adherence to treatment? We hypothesize that positive aspects of physicians’ voice tone will indeed be related to patient outcomes. The question of whether patient characteristics can predict physician voice tone has received scant attention in the literature, and so in Study 1 we also ask two further research questions: (1b) Do physicians’ voice tones vary depending upon the (self-reported) mental and physical health status of their patients? and (1c) Does physician content-filtered voice tone betray the physician’s own feelings toward the patient as evidenced in the physician’s reported experience of the visit?

Past research in the medical realm has not examined the voice tone of health care providers other than physicians and psychotherapists, and so in Study 2, we address the following: (2a) What is the relationship between individuals’ (nurses’, physicians’, patients’) content-filtered voice tone and the satisfaction of both patients and nurses with the medical interaction? and (2b) Do patients and their health professionals exhibit similar or reciprocal patterns of content-filtered voice tone in their interactions with each other?

The present research involves the use and analysis of digital audio-recordings of physician–patient and nurse–patient interactions in primary care in two separate studies. All analyses were conducted at the level of the provider, employing robust random-effects analyses and allowing generalization of the findings to other providers as well as to patients. Study 1 involved 51 outpatient primary care physicians with 199 of their patients in three types of settings on the West Coast: an HMO, a Veterans’ Hospital Clinic, and a University Medical Center. Study 2 involved 34 different primary care medical practices in a West Coast HMO, where 61 physicians and 81 nurses participated in 272 independent physician–patient interactions preceded by 269 nurse–patient interactions. The two study samples provided voice recordings of their interaction as well as measures of patient satisfaction and adherence, patients’ mental and physical health status, and provider satisfaction.

Study 1

Content-Filtering Procedure

Although the principles of vocal-tone content-filtering have remained the same since their beginnings in the 1950s, implementation of these principles has become easier and potentially more widespread with improvements in audio technology. While early work relied on analog tape-recordings and analog settings of bandpass filters, more recent digital recording has allowed the use of desktop computer-based audio-editing software for increasing ease and precision of the content-filtering process.

The Adobe Audition software program (Adobe Audition 1.5, 2003–2004; based on its earlier version, Cool Edit Pro) allows precise content filtering of MPEG-3, WAV, and other digital-audio files (http://www.store.adobe.com/store/). (Analog audio cassette recordings can be easily digitized using cord input from an audiotape player to the personal computer and identifying this input to the Adobe Audition program.) Adobe Audition uses the same content-filtering process as former analog machines, allowing more precise settings of vocal frequency (in cycles per second, i.e., hertz) and of level (in decibels). The result is more effective removal of the frequencies of the voice necessary to render the words unintelligible while retaining the extra-linguistic components of speech including intonation, rhythm, tempo, volume, timbre, and voice quality. Digital audio files of original and content-filtered speech can be saved on compact discs or external hard-drives, making their storage compact and their retrieval efficient and available for research decades in the future. The content-filtering process involves viewing the sound waves of the interaction on screen, editing the interaction to distinguish between the voices of the physician and the patient, and when one voice remains, filtering the interaction so that low and high frequencies (Hz) of semantic content are removed and words cannot be distinguished. A more detailed description of the process of audio content-filtering done with Adobe Audition 1.5 is available from the first author.

Global Affect Ratings and Criterion Measures

The assessment of CF voice tone requires judges’ ratings of the speaker’s global affective behavior. Rating, which involves an observer’s subjective opinion, contrasts with coding, which involves attempts at more “objective” assessment of the occurrence or nonoccurrence of defined behavioral events such as pauses or stutters in the voice (Cairns and Green 1979). Rating is often preferred to coding, and tends to demonstrate higher validity partly because it does not require inferences about the meaning of coded behaviors (Rosenthal 1966). Past research on experimenter effects has found that global/molar judgments and ratings (e.g., of experimenter affect) show lower reliability but higher validity than do coded measurements (e.g., number of experimenter gazes at a participant) (Rosenthal 1966, 2005). “Naïve” raters (who base ratings on their own judgments and are not trained to rate in specific ways) were employed in the present research; this approach has been found to most closely correspond to the assessments that are given by actual patients after a visit with their physician (Hall et al. 1981). For the current research, both unipolar and bipolar rating scales, based on adjective descriptors, were used to measure nonverbal affective behavior in voice tone. Past researchers have examined vocal affective cues and concluded that they tend to group into a limited number of categories (generally representing dimensions of positivity, dominance, and activity; Osgood et al. 1957) including warmth/pleasantness, anger/hostility, anxiety/nervousness, activity/energy, and potency/dominance among others (Ambady et al. 2002; Hall et al. 1981; Milmoe et al. 1967; Scherer 1986). Rating scales included a list of adjectives describing the physician or nurse based upon the raters’ perception of the content-filtered voice tone. Several criterion outcome variables were measured, including the interactants’ views of and satisfaction with their dyadic interaction.

Participants

Study 1 involved 51 physicians and 199 of their patients. The physicians (all in primary care specialties) were, on average, 34.33 years old (SD = 10.07, range: 24–70 years) and had their M.D. degree for an average of 8.08 years (SD = 9.62); 19 (37%) of the physicians were female and 30 (58.8%) were residents in training. Of the patients, 107 (54%) were female, and their average age was 50.5 years (SD = 16.16, range: 18–86 years).

Procedure and Preparation of CF Clips

As part of The Collaborative Research Outcomes Study by the Bayer Institute for Health Care Communication, physicians and patients were recruited from three medical care sites in California, according to procedures approved by the governing Institutional Review Boards. Analog audiotape recordings of each physician–patient interaction in the medical visit were made, with the informed consent of both interactants who filled out a detailed questionnaire after the medical visit. These recordings were converted to digital WAV audio format files and each was edited to isolate the physician’s voice prior to content-filtering. In order to determine the sound frequency ranges necessary to make the verbal content of the physicians’ speech unintelligible, two research assistants listened to 20 randomly chosen interactions to determine the minimum cut-off level at which such characteristics as voice tone, pitch, tempo, and volume could be distinguished but physicians’ words could not. In the process of content-filtering, one research assistant edited and filtered the interactions and another listened to make certain that words could not be distinguished. The cutoffs chosen were between 250 Hz and 445 Hz. Because of normal variations in volume that occurred in different interactions based on the voice and speech characteristics of the interactants and the location of the tape-recorder in the examining room, and because bandpass filters decrease volume, the sound waves of the entire interaction were amplified when necessary. After a content-filtered clip had been created, research assistants listened to be sure no semantic content was available. Raters were also instructed to avoid trying to discern words, and to alert the researchers to any interactions that had not been filtered sufficiently so that these could be re-filtered.
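The filtering step described above can be illustrated with a minimal sketch. It is not the Adobe Audition procedure used in the study: it assumes that the reported cutoffs define a pass band of roughly 250–445 Hz, and it uses simple first-order filters, which are far gentler than the filters in audio-editing software. The function names (`high_pass`, `low_pass`, `content_filter`) and the sample rate are hypothetical and chosen only for illustration.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate):
    """First-order RC high-pass: attenuates content below cutoff_hz."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def low_pass(samples, cutoff_hz, sample_rate):
    """First-order RC low-pass: attenuates content above cutoff_hz."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out = [alpha * samples[0]]
    for i in range(1, len(samples)):
        out.append(out[-1] + alpha * (samples[i] - out[-1]))
    return out


def content_filter(samples, sample_rate, low_hz=250.0, high_hz=445.0):
    """Keep roughly the low_hz..high_hz band (an assumed reading of the cutoffs)."""
    return low_pass(high_pass(samples, low_hz, sample_rate), high_hz, sample_rate)


def rms(samples):
    """Root-mean-square level, used here to compare loudness before and after."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, sample_rate, seconds=0.5):
    """Pure sine tone, standing in for one frequency component of a voice."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

In this sketch a 350 Hz tone passes through `content_filter` with less attenuation than a 100 Hz or a 3,000 Hz tone, which is the qualitative behavior the procedure relies on. Because filtering lowers the overall level, a gain step would normally follow, as noted in the text.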

Following previous research showing that brief segments of the interaction can convey considerable information (Ambady and Rosenthal 1992; Rosenthal et al. 1979), 30-s segments of the physician’s voice were sampled from the beginning, from the middle, and from the end of each interaction. Each 30-s segment was a sequential stream of the communication from the physician with both silences and the communication of the patient removed. Overlapping conversation was dealt with by deleting simultaneous speech but retaining simultaneous laughter and back-channel communications of the physician (Duncan 1972). Thus, for 199 physician–patient interactions, 90 s of physician CF voice tone was rated.
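The sampling scheme can be sketched as follows. The text does not state exactly where the beginning, middle, and end segments were placed, so the sketch assumes the first 30 s, a window centered on the midpoint, and the last 30 s of the physician-only speech stream (after silences and patient speech have been removed). The function name `sample_windows` is hypothetical.

```python
def sample_windows(total_seconds, window_seconds=30.0):
    """Return (start, end) times for beginning, middle, and end windows.

    total_seconds is the length of the physician-only speech stream.
    Window placement is an assumption; the text specifies only that
    segments came from the beginning, middle, and end.
    """
    if total_seconds < 3 * window_seconds:
        raise ValueError("stream too short for three non-overlapping windows")
    middle_start = (total_seconds - window_seconds) / 2.0
    return [
        (0.0, window_seconds),
        (middle_start, middle_start + window_seconds),
        (total_seconds - window_seconds, total_seconds),
    ]
```

Under these assumptions, the three windows always total 90 s per interaction, matching the amount of CF voice tone rated.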

Content-filtered Voice Ratings

Each of the 199 digitized and content-filtered segments of the physicians’ voice tone was rated by 4 female raters on 41 unipolar scales each accompanying an adjective; these were generated for the purposes of this study based upon items originally developed by one of the authors (see e.g., Ambady et al. 2002). Each adjective was followed by a nine-point rating scale (1 = untrue to 9 = very true) as used in past research on the decoding of mixed messages (DePaulo and Rosenthal 1979). Female raters were chosen because they have generally been found to be more accurate judges of nonverbal behavior than males (Ambady et al. 1995; Hall 1978). Each CF voice segment was given an identification number, according to site, doctor, and patient. Segments were then copied onto compact discs in a randomized and counterbalanced presentation order developed uniquely for each rater using a table of random numbers to reduce potential biases and fatigue effects.

Raters were instructed to listen to the entire clip, and upon completion to rate the degree to which each of the 41 descriptors characterized the physician’s voice. Mean ratings for each physician were calculated across all of his/her physician–patient interactions. Interrater reliabilities were calculated for each of the rating scales, the mean across the raters was calculated, and a principal components analysis (with varimax rotation) of the 41 ratings was conducted. Scale scores (composite variables) were computed by averaging the rating scales within each component (see below), and these scores were correlated with measures of physician and patient satisfaction, patient health status, and patient adherence to treatment.
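The scoring steps in this paragraph can be sketched as below. The sketch assumes that reverse-keyed items (marked "rev." in the component listing) are flipped on the nine-point scale before averaging, and that composites are then averaged across each physician's interactions; because all steps are unweighted means, the order of averaging does not change the result. The function names and the data layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 9  # nine-point scale described in the text


def reverse_score(value):
    """Flip a rating on the 1..9 scale (assumed convention: 1 <-> 9)."""
    return SCALE_MIN + SCALE_MAX - value


def composite_score(item_means, items, reversed_items=()):
    """Average the rater-mean item scores that belong to one component."""
    values = []
    for item in items:
        value = item_means[item]
        if item in reversed_items:
            value = reverse_score(value)
        values.append(value)
    return mean(values)


def physician_means(clip_scores):
    """Average composite scores over each physician's interactions.

    clip_scores is an iterable of (physician_id, composite_score) pairs.
    """
    by_physician = defaultdict(list)
    for physician_id, score in clip_scores:
        by_physician[physician_id].append(score)
    return {pid: mean(scores) for pid, scores in by_physician.items()}
```

The resulting per-physician composites are the kind of provider-level scores that would then be correlated with the outcome measures.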

Outcome Measures and Patient Health

A Patient Post-Visit Questionnaire was administered to all patients after their medical visit, asking them to rate (on 5-point Likert-type scales) their satisfaction with the physician using items from the RAND Patient Satisfaction Questionnaire-18 (PSQ-18) and its modifications. These items focused on patients’ satisfaction overall and with the physician’s communication skill, as well as with time spent, information given, use of humor, and trust in the physician (Levinson 1999, unpublished manuscript; Marshall and Hays 1994). Patients were also asked about choices and control offered to them in their care (FACCT 05/00 Diabetes Quality Improvement Project (DQIP) Survey Items, NCQA 2000) and their adherence to medication recommendations (DiMatteo et al. 1993). Composites were calculated, yielding the following scale reliabilities: overall patient satisfaction (with the physician’s personal manner, communication skills, technical skills, and overall care: 4 items, Cronbach’s alpha = .98); patient choice/control (physician asks patient to take responsibility, physician asks patient to help make decisions, physician gives control over treatment decisions, physician offering patient choices in care, discussing the pros and cons of each choice, asking the patient’s opinion, and taking the patient’s preferences into account in treatment decisions: 7 items, alpha = .95); and physician informativeness (physician providing all possible information, giving test results, explaining side effects, explaining treatment alternatives, and telling what to expect: 5 items, alpha = .95). Individual items assessed patients’ ratings of their trust in the physician, their ratings of the physician’s communication skill, quality of care, use of humor, and patients’ adherence to recommended medication treatments.

Physician Satisfaction with the medical visit was assessed with a 20-item scale (Cronbach’s alpha = .89) (Suchman et al. 1993). The following composite measures were also computed as recommended by Suchman and colleagues: doctor–patient relationship (4 items, alpha = .74); data collection (3 items, alpha = .58); time spent well (3 items, alpha = .77); and patient not demanding excessive time (3 items; alpha = .79).

The RAND SF-36 (Stewart and Ware 1992) provided data on patients’ self-reported mental and physical health status as assessed by their symptoms in the past 4 weeks. Composite variables were computed as follows: poor mental health status (reversed RAND MHI-5; Ware 1993) (5 items, alpha = .78), depressed (2 depression items and the reverse of an item assessing happiness from MHI-5, alpha = .80), nervous (1 item assessing nervousness and the reverse of 1 item assessing calmness from MHI-5, alpha = .31), poor physical health status (2 items, alpha = .80), and pain severity and limitation (2 items, alpha = .81). One item measured limitations due to physical and emotional health problems.

Interrater Reliabilities, Principal Components Analyses, and Scale Reliabilities of Global Affect Ratings

We computed Cronbach’s alpha to assess the interrater reliability across 4 raters (as if items) for each of the 41 adjective rating items. Thus, the alpha coefficient for each item represents its interrater reliability (agreement among 4 raters); these 41 alphas ranged from .08 to .66, with a mean of .41 (SD = .16). The mean ratings (averaged across 4 raters) for each of 41 items were then subjected to principal components analysis with varimax orthogonal rotation, yielding four CF voice tone components which accounted for 70.1% of the variance. The four composite scales (not individual rating items) were used in all analyses; they were created by calculating the mean of the items falling into each component and were as follows (with number of items and Cronbach’s alpha scale reliability): (1) Warm/Supportive (14 items: accepting, active, aloof (rev.), attentive, condescending (rev.), friendly, indifferent (rev.), infantilizing (rev.), patient, relaxed, respectful, satisfied, supportive, warm; scale alpha = .85), (2) Competent/Interested (8 items: angry (rev.), competent, empathic, engaged, genuine, interested, nervous, professional; scale alpha = .80), (3) Hostile/Disrespectful (11 items: abrupt, anxious, assertive, businesslike, disrespectful, dominant, hostile, passive, pessimistic, tense, withdrawn; scale alpha = .82), and (4) Enthusiastic (8 items: caring, egalitarian, enthusiastic, expressive, honest, likeable, optimistic, sympathetic; scale alpha = .91).
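For readers unfamiliar with using Cronbach's alpha as an index of interrater reliability, the computation can be sketched as follows, treating each rater as an "item" and each clip as a case. This is the standard alpha formula rather than the authors' own code; the function name and the layout of `ratings` are assumptions made for illustration.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha with raters treated as items.

    ratings[r][c] is rater r's score for clip c on one adjective item.
    alpha = k / (k - 1) * (1 - sum of rater variances / variance of clip totals)
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two raters")
    n_clips = len(ratings[0])
    if any(len(row) != n_clips for row in ratings):
        raise ValueError("every rater must rate every clip")
    rater_variances = sum(variance(row) for row in ratings)
    totals = [sum(row[c] for row in ratings) for c in range(n_clips)]
    return (k / (k - 1)) * (1.0 - rater_variances / variance(totals))
```

When all raters give identical scores the function returns 1; disagreement among raters lowers the value. The same function applies to scale reliability if the rows are the items of a composite rather than raters.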

Results

The research questions of Study 1 addressed: (1a) whether physicians’ CF voice predicted their patients’ satisfaction and self-reported adherence to treatment, (1b) whether physicians’ CF voice tone varied with the mental and physical health status of their patients, and (1c) whether physicians’ CF voice tone “leaked” their own feelings toward the patient as measured by physician’s ratings of the patient and the visit.

Correlational Analyses: Research Question 1a

The composite scale ratings of physician CF voice tone predicted patient-rated satisfaction and adherence. Table 1 shows that when the physician’s tone of voice was Warm/Supportive and Competent/Interested, patients reported that they were given more choices/control, were more satisfied with their physician’s communication, felt they were given more information, and had greater trust in the physician. When the physician’s tone of voice was more Enthusiastic, patients similarly had greater perceptions of choice/control and trust. The more Hostile/Disrespectful the physician’s CF voice tone was rated, the less patients reported that they received information, felt their concerns were addressed, and assessed the physician’s use of humor to be appropriate. Patients’ self-reported medication adherence was positively correlated with physicians’ Enthusiastic, Warm/Supportive, and Competent/Interested tone of voice.
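The correlational analyses relate provider-level voice tone composites to provider-level outcome scores. The text does not name the coefficient used, so the following minimal sketch assumes the Pearson product-moment correlation; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)
```

Here `xs` would hold one composite (e.g., Warm/Supportive) per physician and `ys` the matching mean outcome score for that physician's patients.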

Table 1 Correlations between physicians’ content-filtered speech ratings and patient and physician outcomes (study 1)

Research Question 1b

Physicians’ CF voice tone was related to their patients’ mental and physical health status. Table 1 also presents the relationship between the composites of physician CF voice tone with assessments of patients’ mental and physical health status. Physicians whose patients reported being more nervous and having poorer overall mental health encoded in their CF voice tone greater Warmth/Supportiveness, Competence/Interest, and Enthusiasm. Physicians whose patients reported being in more pain, however, had less Warm/Supportive, Competent/Interested, and Enthusiastic voice tone. Physicians whose patients had poorer health, more limitations due to physical and emotional health problems, and more severe and limiting pain had voices rated as more Hostile/Disrespectful.

Research Question 1c

Physicians’ perceptions and experiences of the medical visit tended to be expressed through their voice tone. Specifically, physicians’ overall satisfaction and their satisfaction with the physician–patient relationship, use of time, and lack of patient demands were positively correlated with the physicians’ Warm/Supportive voice tone (see Table 1).

Study 1 Discussion

Study 1 examined various outcomes in relationship to ratings of physicians’ content-filtered speech. In general, more positive aspects of voice tone were associated with positive patient outcomes in accordance with study hypotheses, and patients’ mental and physical health predicted physicians’ voice tone. Specifically, greater negativity in physicians’ voice tone was associated with greater pain and poorer physical health in their patients. More positive physician voice tone was associated with higher levels of nervousness in patients. While the direction of causality in these findings remains unclear because of their correlational nature, it is possible that physicians’ frustration and/or lack of confidence in treating patients who are ill and in pain is expressed through their vocal tone. As expected, physicians’ voice tone reflected their satisfaction with various aspects of their experience with patients.

Study 2

Participants

Study 2 involved 61 primary care outpatient physicians and 81 nurses. (In the present study, the term “nurse” is used to refer to a registered nurse, licensed vocational nurse, or medical assistant whose interaction with the patient in the outpatient medical office preceded that of the physician, and who took a brief medical history and assessed the patient’s weight and vital signs.) There were 272 independent dyadic videotaped interactions of patients with a physician; 269 of these were preceded by a videotaped interaction with a nurse in the physician’s primary care office practice. The sample of physicians was 34% female and the sample of nurses was 95% female. The sample of patients was 63% female and their average age was 45.27 years (SD = 16.8, Range: 15–86).

Procedure and Preparation of CF clips

With the informed consent of all parties, and following procedures approved by the governing Institutional Review Boards, patients were recruited from primary care medical practices. Digital videotape recordings were made of the patients’ visits with a nurse and then with a physician, with the audio track recorded using an AudioTechnica Lavalier microphone. After the visit, patients filled out a detailed questionnaire of their satisfaction with the medical care they received from the nurse and from the physician, and each nurse filled out a two-item questionnaire regarding his or her satisfaction and comfort with the visit. The entire nurse–patient interaction (which was, on average, 4.6 min) was content-filtered, and the first 5 min of the doctor–patient interaction (which was, on average, 13.8 min) was content-filtered. Past research has demonstrated the importance of the first 5 min of an interview in providing interpersonal information (Pittenger et al. 1960). Each provider–patient interaction was edited to create two sets of audio clips, separating out the voices of the provider (physician or nurse) and the patient prior to being content-filtered using Adobe Audition 1.5. The general procedures for content-filtering and the methodology for rating global affect in Study 2 were similar to those in Study 1. The same cutoffs of 250–445 Hz used in Study 1 were used here. There were four types of CF clips: nurse talking to the patient (CF nurse to patient), patient talking to the nurse (CF patient to nurse), physician talking to the patient (CF physician to patient), and patient talking to the physician (CF patient to physician).

Content-filtered Voice Ratings

The audio recordings were content-filtered, and rated on a variety of primarily bipolar scales with adjectives of opposite meaning anchoring the ends of a 6-point Likert-type scale. The items chosen had been validated in previous studies of provider–patient interactions (Charon et al. 1994; DiMatteo et al. 2003; Roter et al. 1997) and represented dimensions of positivity, dominance, and activity (Osgood et al. 1957). Four female raters completed ratings of all CF clips (269 nurse to patient, 269 patient to nurse, 272 physician to patient, and 272 patient to physician). These were identified with a unique number representing site, provider, and patient, and copied onto separate compact discs with a randomized and counterbalanced presentation order (using a table of random numbers) for the CF clips of physician, nurse, and patient voice tone. Each rater was given a different starting place in the random order to reduce potential biases and practice effects (e.g., rater 2 started ¼ of the way through the order and when reaching the end continued from the beginning until she reached her starting place). Raters were instructed to listen to the entire CF voice clip and then rate the speaker on the rating scales according to their global perceptions of the voice tone.
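The staggered starting places can be sketched as a rotation of one shared random order. The study used a table of random numbers; the seeded pseudo-random shuffle below is only a stand-in. The text states that rater 2 began one quarter of the way through; the starting places of the other raters (0, ½, and ¾) are assumed here. Function names are hypothetical.

```python
import random


def base_order(clip_ids, seed):
    """Shuffle clip identifiers once to obtain the shared presentation order."""
    order = list(clip_ids)
    random.Random(seed).shuffle(order)
    return order


def rater_order(order, rater_index, n_raters=4):
    """Rotate the shared order for one rater.

    rater_index is zero-based, so index 1 (rater 2) starts one quarter
    of the way through and wraps around to the beginning.
    """
    start = (rater_index * len(order)) // n_raters
    return order[start:] + order[:start]
```

Each rater still hears every clip exactly once; only the position of each clip within the session differs, which spreads practice and fatigue effects across clips.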

Outcome Measures

A patient post-visit questionnaire was administered to all patients after their medical visit, asking about their satisfaction with the nurse and with the physician using a modified version of the PSQ 18 developed in the Medical Outcomes Study (Marshall and Hays 1994). These items assessed satisfaction with care from the nurse and the physician in terms of technical quality, interpersonal style, communication, participation, and time spent, as well as patients’ general satisfaction with the medical care received. Five composite patient satisfaction subscales were created (based on principal components analysis with varimax orthogonal rotation which accounted for 63% of the variance in satisfaction ratings). These were: overall satisfaction with care (1 item), patient satisfaction with physician competence (6 items, Cronbach’s alpha = .65), physician personal manner (5 items, Cronbach’s alpha = .76); nurse competence (6 items, Cronbach’s alpha = .83) and nurse personal manner (6 items, Cronbach’s alpha = .85). The computed subscales are based at the provider level (nurses: N = 81, physicians N = 61).

The interviewing skills of the physicians in this sample were assessed by four separate raters from non-content filtered, full audio recordings of these physician–patient interactions, allowing further validation of these CF voice ratings of the physicians. These four independent judges rated three dimensions of physician verbal interviewing skills (doctor–patient relationship, medical topic discussion and questioning, and discussion of psychosocial and personal habits) using a 6-point scale (1 = poor; 6 = excellent). Their interrater reliabilities ranged from .37 to .73. These ratings provided a third party assessment of the physician’s interviewing skill in interaction with the patient, and correlations were computed between these ratings and the physician’s CF voice tone.

Interrater Reliabilities, Principal Components Analyses, and Scale Reliabilities of Global Affect Ratings

We computed Cronbach’s alpha to assess the interrater reliability across 4 raters (treating the raters as if they were items) for each of the 24 adjective rating items; the alpha coefficient for each item thus represents its interrater reliability (agreement among the 4 raters). The interrater reliabilities for the 24 individual rating items averaged .55 (SD = .18, Range: .15–.80) for nurse CF voice, and .33 (SD = .16, Range: .03–.70) for physician CF voice. The mean ratings (averaged across 4 raters) for each of the 24 items were then subjected to principal components analysis with varimax orthogonal rotation (separately for the 61 physicians and for the 81 nurses). Each analysis yielded three components, which accounted for 78.8% (nurse) and 85.8% (physician) of the variance. Three composite scales (not individual rating items) were used in all analyses; they were created by calculating the mean of the items falling into each component and are as follows (three composite variables for nurse voice tone, and three for physician voice tone with their number of items and Cronbach’s alpha scale reliability): Caring Interest (11 items: caring, comfortable, engaged, enthusiastic, friendly, likeable, likes the patient, personal, sensitive, sympathetic, warm; nurse alpha = .96, physician alpha = .94), Professional Manner (8 items: active, assertive, competent, dominant, efficient, interested, professional, respectful; nurse alpha = .75, physician alpha = .82), and Negative Temperament (5 items: angry, condescending, hurried, nervous, uncooperative; nurse alpha = .80, physician alpha = .64).
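As an illustration of the interrater reliability computation, the following sketch applies the standard Cronbach's alpha formula with raters treated as items. The ratings shown are hypothetical, and the function name `cronbach_alpha` is introduced here for illustration only; it is not taken from the study's analysis.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a table of ratings.

    ratings: one row per rated target (e.g., a provider), one column per rater.
    Raters play the role that items play in a conventional scale.
    """
    k = len(ratings[0])                      # number of raters
    columns = list(zip(*ratings))            # ratings grouped by rater
    rater_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - rater_var / total_var)

# Hypothetical ratings of one adjective ("warm") for five targets by four raters.
warm = [
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 6, 4],
    [3, 2, 3, 3],
    [4, 4, 5, 5],
]
alpha_warm = cronbach_alpha(warm)
```

Because alpha here is computed on the set of four raters, it describes the reliability of their pooled judgment rather than the agreement between any single pair of raters.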

The 24 rating items for the CF voice of patient-to-nurse and patient-to-physician were analyzed similarly, using principal components analysis with varimax orthogonal rotation. Four components accounted for 79.3% and 84.4% of the variance, respectively; four composite scores for patient talking to the nurse (“patient to nurse”) and four for patient talking to the physician (“patient to physician”) were created based on these components. The number of items in each, and the Cronbach’s alpha scale reliabilities (four for patient to nurse, and four for patient to physician) are as follows: Positive Toward Health Professional (5 items: friendly, likeable, likes the health professional, personal, warm; patient to nurse alpha = .93, patient to physician alpha = .96), Involved in Care (4 items: active, enthusiastic, engaged, interested; patient to nurse alpha = .90, patient to physician alpha = .86), Respectful (9 items: caring, cooperative, not angry, not condescending, respectful, sensitive, submissive, sympathetic, unhurried; patient to nurse alpha = .80, patient to physician alpha = .92), and Confident (6 items: assertive, competent, comfortable, efficient, not nervous, professional; patient to nurse alpha = .90, patient to physician alpha = .92).
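Composite scoring of this kind can be sketched as the mean of the items loading on a component. The sketch assumes that negatively keyed items (those listed as "not angry" or "not nervous") are reverse-scored on the 6-point scale as 7 minus the rating before averaging; the text does not state the exact reversal rule, and the item values and the function name `composite` are hypothetical.

```python
SCALE_MAX = 6  # 6-point rating scale, as described in the text

def composite(item_means, items, reversed_items=()):
    """Mean of the named items, reverse-scoring any item in reversed_items.

    item_means: dict mapping item name to its rating averaged across raters.
    Reverse-scoring as (SCALE_MAX + 1) - score is an assumption of this sketch.
    """
    values = []
    for name in items:
        score = item_means[name]
        if name in reversed_items:
            score = (SCALE_MAX + 1) - score
        values.append(score)
    return sum(values) / len(values)

# Hypothetical rater-averaged item scores for one patient's voice clip.
clip = {"friendly": 4.5, "likeable": 4.0, "personal": 3.5, "warm": 4.0,
        "angry": 1.5, "respectful": 5.0}

positive = composite(clip, ["friendly", "likeable", "personal", "warm"])
respect_part = composite(clip, ["angry", "respectful"], reversed_items={"angry"})
```

Only a subset of each composite's items is shown to keep the example short; a full implementation would list every item assigned to the component.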

Results

Study 2 expanded upon Study 1 by examining ratings of the CF voice of physicians, nurses, and patients in their dyadic interaction using 24 global rating scales. This study set out to address research questions regarding the following: (2a) the relationship between voice tone (of nurse, physician, patient) and satisfaction (of patient and nurse), seeking to predict the speaker’s own satisfaction and that of the “other” in the dyadic interchange; (2b) the degree of similarity (reciprocity) between the voice tones of patients and their health professionals in their interactions with each other. All statistics were computed with provider as the unit of analysis in random effects analyses, allowing generalization to other physicians and nurses as well as to other patients. Carrying out these analyses at the level of means (N = 81 nurses, N = 61 physicians) employs the stringent, robust, and highly generalizable random effects model rather than a fixed effects model (with N = 272 physician–patient interactions and N = 269 nurse–patient interactions), which would allow generalization of findings only to the patients of the providers included in this study.
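The provider-as-unit approach can be illustrated by first averaging visit-level scores within each provider and then correlating the provider means, so that N equals the number of providers rather than the number of visits. The records below are hypothetical, and the field and function names (`caring_interest`, `provider_means`, `pearson_r`) are introduced only for this sketch; it shows the aggregation step, not the authors' full analysis.

```python
from collections import defaultdict
from math import sqrt

def provider_means(visits, field):
    """Average one visit-level field within each provider."""
    grouped = defaultdict(list)
    for visit in visits:
        grouped[visit["provider"]].append(visit[field])
    return {provider: sum(vals) / len(vals) for provider, vals in grouped.items()}

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical visit-level records (not data from the study).
visits = [
    {"provider": "A", "caring_interest": 4.0, "satisfaction": 3.0},
    {"provider": "A", "caring_interest": 5.0, "satisfaction": 4.0},
    {"provider": "B", "caring_interest": 2.0, "satisfaction": 2.0},
    {"provider": "C", "caring_interest": 3.0, "satisfaction": 3.0},
    {"provider": "C", "caring_interest": 3.0, "satisfaction": 2.0},
]

voice = provider_means(visits, "caring_interest")
satisfaction = provider_means(visits, "satisfaction")
providers = sorted(voice)  # fixed order keeps the two lists aligned
r = pearson_r([voice[p] for p in providers], [satisfaction[p] for p in providers])
n_units = len(providers)   # the correlation rests on providers, not visits
```

With five visits but only three providers, the correlation is based on three observations, which is why tests at this level are more conservative than tests on individual visits.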

Correlational Analyses. Research Question 2a

Table 2 presents the relationship between composite ratings of provider tone of voice and patient satisfaction with both physician and nurse. The composite Nurse Caring Interest was positively and significantly correlated with patient satisfaction with the Nurse’s Personal Manner and Competence. The composite Physician Caring Interest was negatively correlated with patient satisfaction with Physician Competence, suggesting that less caring physician voices were perceived as more competent. The CF voice composite Physician Professional Manner was correlated with patients’ overall satisfaction. The CF voice composite Nurse Negative Temperament was negatively correlated with patient satisfaction with the nurse’s Personal Manner.

Table 2 Correlations between providers’ content-filtered speech ratings and patient satisfaction (study 2)

Table 3 demonstrates that composite ratings of patient voice tone predict nurses’ satisfaction and comfort with the visit. In particular, nurses expressed greater satisfaction and comfort with visits in which patients’ CF voice tone was rated as more Involved in Care.

Table 3 Correlations between patients’ content-filtered speech ratings and nurses’ satisfaction (study 2)

Table 4 addresses the research question of the extent to which composite ratings of patients’ CF voice tone are correlated with their documented feelings toward their providers. Specifically, this table demonstrates that patients’ voice tone in interaction with their nurses and physicians was associated with their post-visit questionnaire rated satisfaction with their providers. This table shows that patients’ satisfaction with nurses’ Personal Manner correlated positively with two dimensions of their CF voice tone: Patient Involved in Care and Confident. Patients’ satisfaction with physicians’ Personal Manner and Competence correlated positively with patients’ Positive tone of voice toward the physician, suggesting that patients’ satisfaction with their physician was expressed through positivity in their tone of voice.

Table 4 Correlations between patients’ content-filtered speech ratings and patient satisfaction (study 2)

Research Question 2b

Table 5 shows the correlations of composite ratings of patient voice tone with composite ratings of nurse and physician voice tone, addressing the research question of whether individuals express similarity in their CF voice tone while communicating with each other in the dyadic medical interaction. Indeed, nurses’ Caring Interest and Professional Manner were positively correlated with the degree to which patients were rated as Positive toward the Nurse, Involved in Care, Respectful, and Confident. These correlations reflect similarity in patients’ and nurses’ CF voice tones. Similarly, Negative Temperament in Nurses’ CF voice correlated negatively with ratings of patients as Positive, Respectful, and Confident toward the nurse. Somewhat fewer correlations were significant in the physician–patient interaction. The Professional Manner of physicians’ CF voice correlated positively with Patients’ Confident CF voice tone. The Negative Temperament in physicians’ CF voice was negatively correlated with Patient CF voice ratings of Positivity, Involvement in Care, and Respectfulness.

Table 5 Correlations between patients’ and providers’ content-filtered speech ratings (study 2)

There was a significant relationship between physicians’ CF voice tone ratings of Caring Interest, Professional Manner, and Negative Temperament and physician interviewing skills involving the doctor–patient relationship (see Table 6). In addition, ratings of the physician’s medical topics discussion and questioning correlated positively with independent ratings of the physician’s CF voice Professional Manner. When patients talked with their physicians, they used CF voice tones that were rated as more Positive and Respectful toward physicians who were independently rated as having better interviewing skills (on all three measured dimensions). In addition, patients’ CF voices were rated as more Involved in Care and Confident when they were talking to physicians who received better independent ratings of their interviewing skills involving the doctor–patient relationship.

Table 6 Correlations between physicians’ and patients’ content-filtered speech ratings and ratings of physician interviewing skills (study 2)

Study 2 Discussion

Study 2 examined nurses’ and patients’ content-filtered voice tone in relation to satisfaction with the medical interaction (in addition to physicians’ voice tone). This study also focused on exploring the reciprocity in voice tone between health care providers and patients. The findings were generally as predicted, but one of the more surprising findings of this study was that physicians’ Caring Interest voice tone was negatively correlated with patient satisfaction with physician competence. While this finding might have occurred by chance, several interpretations are possible. It is possible that physicians’ age or gender contributed to this result; future research should compare these correlations for male and female physicians. This finding is similar to the work of Hall et al. (1981), who found that when physicians’ voice tone was more negative their patients were more contented. They found a discrepancy in channels, however, such that more positive words were associated with greater contentment. Although this paper does not examine verbal communication, it is possible that a similar discrepancy between words and tone is at play here as well. It is possible that patients are more satisfied when words demonstrate more warmth and voice tone is colder.

General Discussion

The present study extends research on content-filtered speech with a detailed examination of voice tone in physician–patient and nurse–patient interactions. This work involves further refinement of content-filtering methods using recent digital technologies, and it addresses the utility, reliability, and validity of a variety of affective rating procedures. The findings of two studies demonstrate that the vocal tone of physicians has real-world validity in its correlations with patient satisfaction and (in Study 1) patient adherence. These findings lend additional support to previous research on the role of physicians’ nonverbal communication, particularly voice tone, in the physician–patient relationship (Ambady et al. 2002; DiMatteo et al. 1986; Hall et al. 1981). The present research demonstrates that the nonverbal behavior of one of the interactants in a dyadic interaction correlates significantly with the verbal and nonverbal behavior of the other interactant, supporting past research (Griffith et al. 2003).

Study 2 presented here is the first to quantify the affective nonverbal behavior of nurses, demonstrating the important relationship between nurses’ tone of voice and patients’ satisfaction with their care. This research examines, for the first time, provider voice tone in relation to patients’ perceptions of time management (feeling hurried or rushed) in the medical visit, as well as in relation to patients’ perceptions of choice and control in medical decisions (Ballard-Reisch 1990; Kaplan et al. 1995; Martin et al. 2003). It is particularly interesting that patient-reported medication adherence was strongly related to enthusiastic physician tone of voice given recent research developments showing the central role played by physicians’ manner in promoting patient adherence to treatment (Frankel 1995; Friedman 1982; Martin et al. 2001; O’Malley et al. 2002; Safran et al. 1998). Overall, these findings point to the importance of health professionals’ vocal tone in their care of patients and contribute to growing evidence that affective communication may be essential to good health care (DiMatteo et al. 1994; Ong et al. 1995; Roter and Hall 1992).

The results of the present research indicated that although the global CF ratings had low to moderate reliability, their validity in terms of correlations with outcome measures was consistently high. In accordance with Guilford’s equation for validity (1954, p. 407), considerable empirical evidence suggests that the reliability of a measure is not the upper limit of its validity (Rosenthal 1966, 2005). Specifically, the component of Guilford’s validity equation that has the strongest effect on validity is the number of raters, not their agreement with one another. The validity coefficients of global affect ratings often exceed their reliability. In the case of the global affect ratings of CF voice here, low interrater reliability might have resulted from raters perceiving different aspects of the criterion voice tone variable (e.g., Caring Interest), all of which are relevant to the validity outcome measures and which taken together result in substantial validity. Further, while the individual item reliabilities may be somewhat low, the relevant scale reliabilities and validity coefficients of the composites they make up are quite robust and significant. Conceptually, it should be noted that limitations in agreement among raters may reflect variations in agreement among patients as well; future research should focus on understanding characteristics of the target voice, the rater, and their interaction that contribute to this variation. The present research utilizes both unipolar and bipolar measurement strategies to assess multiple dimensions of global affect that are perceived in voice tone. The findings indicate that content-filtered voice can be reliably rated and, with principal components analysis as a guide, grouped into reliable composites.
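The role of the number of raters can be illustrated with the Spearman-Brown formula, a standard psychometric result that is related to, but not the same as, the Guilford equation cited above. It gives the reliability of the mean of n raters from the average correlation between pairs of raters. The input values are hypothetical, and the function name `spearman_brown` is introduced only for this sketch.

```python
def spearman_brown(mean_r, n_raters):
    """Reliability of the average of n_raters judges.

    mean_r: average correlation between pairs of raters.
    """
    return (n_raters * mean_r) / (1 + (n_raters - 1) * mean_r)

# Hypothetical average inter-rater correlation of .20.
single = spearman_brown(0.20, 1)   # one rater: reliability equals mean_r
pooled = spearman_brown(0.20, 4)   # four raters pooled
larger = spearman_brown(0.20, 10)  # ten raters pooled
```

Under these assumed values, pooling raters raises the reliability of the averaged rating well above the agreement between any two raters, which is the sense in which adding raters can matter more than how closely they agree.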

These studies also suggest that patients and their health professionals reflect each other’s emotional experience of satisfaction in their tone of voice. In particular, positive relationships were found between warmer, more positive voice tone and satisfaction. Further, physicians’ voice tones were predicted by their patients’ health status. Study 1 physicians were rated as having more hostile voices with patients who had poorer physical health, more limitations, and more pain. Physicians had less warm and less interested voices when talking with patients who were limited by pain, but warmer and more interested voices with patients who were nervous. These findings support previous research demonstrating that physicians’ expressions of affect toward patients are predicted by their patients’ physical and mental health status (Hall et al. 1996).

The present research supports the hypothesis that health professionals and their patients reciprocate emotional messages in their voice tone. Roter et al. (2006) called for future research on nonverbal communication, particularly in medical visits, to focus on “the interactional dynamics of emotional perception, expression, and reciprocity” (p. S32). The present work toward that end has found that patients spoke with a warmer and more engaged tone when their providers spoke back in a similar way. Further, the more negative tone there was in the providers’ voice, the less positivity, involvement, respect, and confidence was evident in the patient’s. Interestingly, there was more reciprocity in interactions with nurses. One reason for this may be that there is less of a power differential (Hall et al. 1995) in the nurse–patient relationship than in the physician–patient relationship. Patients may feel more comfortable with nurses as they may have similar levels of education, for example. There also may potentially be a gender difference as the majority of nurses in this study are women, and it is possible that females display more reciprocal patterns of voice tone.

These data are, of course, correlational and the direction of the influence cannot be ascertained. It is possible that patient satisfaction influences provider affect (encoded in voice tone) or that third variables such as the demographic characteristics (e.g., age, gender, ethnicity) of physician and patient, and the socioeconomic status of the patient influence vocal affect, satisfaction, and health status. It is possible that the provider or patient may set the “affective tone” of the interaction with his or her voice tone in the opening of the visit, locking the other into a vocal pattern of exchange throughout the remaining time. There are other important limitations in this research, as well, most notably the absence of outcomes of care, such as improvements in health or symptom status. Other limitations include the possibility that providers and patients changed their behavior as a result of being audiotaped, although past research has found this is not a significant problem. Finally, the patients who participated in this study were volunteers, which could have increased the chance of bias in their behavior because certain types of patients may have been more likely to participate.

Future research questions to be explored with these same data sets include the following: (1) the effects of physician communication training on their content-filtered voice tone (before and after training) and on the relationship of voice tone change to outcomes (Fallowfield et al. 2003); (2) the relationship of content-filtered voice of the physician to the ethnicity and social class of the patient and how content-filtered voice may vary by ethnicity of the physician (Coats and Feldman 1996; Marsh et al. 2003); (3) the relationship between content-filtered voice and physicians’ stress, burnout, and attitudes about patient care (Deckard et al. 1994; Visser et al. 2003); (4) the relationship between verbal communication (e.g., transcripts of the interaction) and the affect in CF speech, particularly the presence and correlates of verbal-nonverbal cue discrepancies (Bugental et al. 1970; Hall and Levin 1980); (5) the effects of sender and receiver gender in the perception and effects of voice tone messages (Hall et al. 1994); and (6) the relationship between aspects of content-filtered voice and physician and patient age. The current study provides a bridge between past content-filtered speech research and examination of the provocative research questions described above.

This study may have potential implications for the training of medical students and physicians in decoding and encoding of nonverbal behavior, particularly voice tone. It is important for health care providers to be aware of the power of voice tone and consider how their emotions may be inadvertently leaked through their voice tone. Indeed, voice tone is but one type of nonverbal communication; ideally, measurement of physician–patient communication should include assessment of many types of both nonverbal and verbal communication in order to provide the most complete picture of what behaviors are most important in prediction of patient outcomes.