
Criminal justice proceedings are high-stakes settings in which native English speakers have difficulty negotiating the legal process, let alone persons with no or limited English proficiency. Increasingly, law enforcement interviewers are required to rely on interpreters (Mulayim & Lai, 2017; Shaffer & Evans, 2018). However, the mere presence of an interpreter does not guarantee accurate interpreting. If the interpretation is inaccurate, evidence can be misconstrued, affecting assessments of witness veracity and credibility. This can compromise the right of the parties to a fair trial and lead to wrongful convictions or acquittals, costly appeals, and retrials. The scale of the problem has been recognized by members of the judiciary, who for many years have complained that the poor quality of interpreters is detrimental to the court’s ability to perform its duties (Hale, 2011). In response to calls for improvement, the European Parliament, the Council of Europe (1950), and comparable bodies in other jurisdictions mandated that interpreters be fully competent for the task assigned (Hertog, 2015). The assigned tasks include interpreting in the investigative phases of the criminal justice process, including suspect and witness interrogations and intelligence interviews. In different jurisdictions, the terminology applied to law enforcement interviews varies. In Australia and the United Kingdom (UK), the term “interview” is preferred; in the United States of America (USA) and Canada, “interrogation” is the more common term.

In Australia, Police Standing Orders require police forces to hire professional interpreters for all interviews with non-English speakers (Ozolins, 2009). In the USA, regulation of interpreting practices in law enforcement settings varies by state. In Europe and the UK, Directive 2010/64/EU established the right to quality interpretation and translation in all stages of criminal proceedings, “…from the time that they are made aware by the competent authorities of a Member State…to the resolution of any appeal” (ImPLI Project, 2012, p. 5). Noncompliance “…can lead to the invalidation of investigations and pre-trial proceedings, while poor quality interpreting may lead to a violation of the principle of fairness (European Convention on Human Rights) or to challenges in court that may lead judges to declare the pre-trial proceedings inadmissible” (ImPLI Project, 2012, p. 7). In the UK, the interpreter must attend the police interview of a non-English-speaking suspect in person (Home Office, 2017). Despite rapid globalization of interpreting standards, often most rigorous for court interpreters (Hlavac, 2013), legal interpreting in non-court settings, such as police interviews, often falls outside of the scope of these regulations.

Community interpreting refers to interpreting conducted in domestic settings (Hale, 2007a), such as police stations, courts, hospitals, and other public services. Legal interpreting is a subfield that encompasses interpreting in diverse law enforcement settings, such as asylum and immigration proceedings, courtrooms, tribunals, police, prison, and military settings (Hertog, 2015). A review of the literature on legal interpreting (Monteoliva-Garcia, 2018) documented a total of 464 legal interpreting publications (books, conference proceedings, journal articles, book chapters, monographs, and doctoral theses) in the ten-year period spanning 2008–2017, of which more than 300 were journal articles. Legal interpreting research using a variety of qualitative and quantitative methods has largely focused on courtroom settings. By comparison, studies of interpreted police interviews lagged. The review identified “police interpreting” in law enforcement interviews as an area of emerging interest (Monteoliva-Garcia, 2018).

Evidence-based policing is rapidly becoming the global standard in contemporary policing practice (Knutsson & Tompson, 2017; Lum & Koper, 2017; Mitchell, 2019), exemplified by “the use of best research evidence on ‘what works’ as a guide to police decisions” (Sherman, 2013, p. 383). The development of a body of specialized knowledge on effective communication skills to gather intelligence and oral evidence from suspects, sources, and witnesses is a prime example (Meissner, Surmon-Bohr, Oleszkiewicz, & Alison, 2017). Despite a high and increasing proportion of interpreted investigative interviews, research on this topic is in its infancy.

One recent review of research on interpreted investigative interviews with suspects, victims, witnesses, and human intelligence sources concluded that “Emerging research findings appear to indicate that there is little agreement or understanding between (and within) groups of investigators and interpreters about what is effective in practice” (Evans, Shaffer, & Walsh, 2020, p. 141). As a result, practitioners and policy-makers might receive diametrically opposing advice from different legal psychologists, and there is no resource to consult to account for the discrepancies.

Examples of disparities in research outcomes included the effective placement of an interpreter in an interview, the effect of an interpreter on interviewer–interviewee rapport, and the extent of information loss in interpreted versus monolingual interviews (Evans et al., 2020). Additionally, some studies across multiple languages and countries including the UK, Russia, South Korea, and the USA (Ewens, Vrij, Leal, et al., 2016a, 2016b; Ewens, Vrij, Mann, & Leal, 2016; Vrij & Leal, 2020) concluded that ad hoc bilingual speakers perform just as well as interpreters, while other studies found significant differences in the performance of these two groups, with interpreters performing significantly better than ad hoc bilinguals (Berk-Seligson, 2009; Lai & Mulayim, 2014; Mellinger & Hanson, 2019; Mulayim & Lai, 2017; Pöchhacker, 2004), in particular in Australia, where there are highly trained and qualified interpreters (Hale, Goodman-Delahunty, & Martschuk, 2018; Liu & Hale, 2018). Such a stark difference might be due to differences in the definition of an “interpreter,” with some researchers referring simply to the fact that the interpreters are paid, others referring to professional interpreting training, and others referring to certified interpreting practitioners. Accordingly, this is a timely opportunity to explore factors that could account for disparities observed in research outcomes, to make recommendations about measurement approaches that are most viable, and to identify issues in interpreted investigative interviews warranting further research.

Overview of the Chapter

The central goal of this review of the literature is to advance growth in practice and policy by fostering development of a robust and coherent scientific evidence base on interpreted investigative interviews. The aims of this chapter are to synthesize and integrate informative findings on psycho-legal issues central to contemporary police interview practice and to identify gaps and issues unaddressed in prior studies.

This chapter is divided into four parts. First, by way of background, we identify an overarching model of multimodal communication that specifies three core components of the interpreting task in an investigative interview and interviewing strategies commonly applied by investigative interviewers. We also outline the training and qualifications of interpreters proficient in legal interpreting, the two main interpreting modes in which they are trained, and the interpreter’s role. Second, we describe the major types of field studies and laboratory experiments applied in research on interpreted police interviews, and strengths and weaknesses of these approaches. In part three, we critically evaluate research on six key topics that affect interpreted investigative interviews conducted with persons who are not proficient in English, and identify gaps in the research. In the Conclusion, we consider steps to develop a more robust evidence base to guide policy and practice in interpreted police interviews and discuss implications of the findings for other contexts, namely lawyer–client interviews, and training for interpreters and interviewing professionals.

Multimodal Communication in Interpreted Police Interviews

In this section, we introduce an interactive communication model and key components of the interpreting task in a police interview. Next, we describe common interviewing strategies used by police practitioners, which go beyond the propositional content of the questions. We conclude by discussing the two main modes of interpreting in which interpreters are trained (i.e., consecutive and simultaneous), the role of interpreters in police interviews, certification procedures for interpreters, and specialized training in legal interpreting. Together, these topics provide a backdrop to understand research conducted to date on interpreted police interviews.

The Interaction Process Model of Communication in Police Interviews

Communication in police interviews consists of an interaction between the interviewer and the suspect or witness, as posited in the cognitive-behavioral Interaction Process Model (Moston, Stephenson, & Williamson, 1992; Madon, More, & Ritchfield, 2019). When a police interview is attended by an interpreter, the interaction becomes tripartite (Nakane, 2014; Houston, Russano, & Ricks, 2017). “In a monolingual police interview the police officer and/or the other participants are able to engage in direct negotiation of participation and meaning themselves, but in interpreter-mediated police interviews the two primary interactants have to depend on the interpreter” (Gallai, 2013, p. 69). The dynamics of an investigative interview inevitably change, as the traditional “oppositional dyad” of interviewer-suspect is transformed by the presence of an interpreter “into a triadic mixture of opposition, cooperation and shifting alignments” (Russell, 2004, p. 116). The impact of the presence of an interpreter on the interaction dynamics and the power relationships is still being investigated (Nakane, 2014).

As in other types of oral interactions, communication in an investigative interview is multimodal (Conley, O’Barr, & Riner, 2019) and typically combines three information sources: (a) linguistic or verbal (i.e., words); (b) paraverbal or vocal (e.g., tone of voice, intonation); and (c) nonverbal or visual (e.g., gestures, facial expressions, body language). The way in which an utterance is expressed portrays meaning and elicits judgment from others. Put simply, oral communication entails more than propositional information alone; it also conveys attitudes and emotions. Some police practitioners report greater reliance on paraverbal communication and nonverbal gestures in order to build rapport with interviewees in interpreted interviews (Goodman-Delahunty & Howes, 2017).

Pioneering research by experimental social psychologist Mehrabian (1972, 1981) explored the effects of incongruity between the three sources of communication (Mehrabian & Wiener, 1967), especially when emotion was important (such as to determine liking by the speaker of the addressee), to assess which was the most influential (Mehrabian & Ferris, 1967). In these experiments, where one-word responses were compared with nonverbal and paraverbal communication, via style, expression, tone, pitch, facial expression, and physical gestures, the nonverbal and paraverbal communication accounted for as much as 93% of the meaning inferred by the participants. This seminal research illustrated (a) the importance of factors in addition to words to convey meaning or interpret meaning, such as the vocal elements conveyed by the pragmatic force of the speech, as well as visual elements conveyed by the facial expressions, movements, and gestures of the speaker; and (b) that in the absence of visual cues and signs, such as when communicating by telephone, the potential for confusion and error increased. Attention to one particular source (verbal, visual, or vocal) might be more informative when the others are less so, as these information sources are complementary. The extent to which interpreters in police interviews replicate all three sources of information has not been thoroughly researched.

Usually, taking all three sources (verbal, visual, and vocal) into account when communicating face-to-face increases communication effectiveness and accuracy. For example, nonverbal and paraverbal behaviors that indicate whose turn it is to speak include eye-contact, gaze withdrawal, interruptions, backchannel responding, linguistic hedges, pauses, and gestures (Mason, 2012). Among the nonverbal turn-taking behaviors, gaze is particularly important for signaling attention and regulating participation in conversation (Mason, 2012). Conversely, reliance on a single source, such as auditory communication only, as occurs in telephonic communications, decreases accuracy and effectiveness in monolingual interactions. For example, face-to-face monolingual requests secured 34 times as much compliance as the same request via e-mail (Roghanizad & Bohns, 2017), a difference attributable to the presence of nonverbal cues in the face-to-face condition. Recently, in legal settings, in line with the multimodal model of communication, linguists have taken nonverbal communication (e.g., gesture) and spatial and visual relations among the participants, into account (Conley et al., 2019). Accordingly, research attention to all three sources of information communicated in interpreted investigative interviews is vital (for a discussion of the complexities of the interpreting task, see Hale, 2007b, 2010).

Interpreters need to fully understand both the source language message and the questioning strategies used by law enforcement personnel before they can attempt to accurately interpret into the target language. For example, a police investigator might want to allow for silence as a tactic to encourage a suspect to talk. Interrupting that silence would be counter-productive. The need for interpreters to be briefed ahead of time about such strategies was highlighted by Russano, Narchet, and Kleinman (2014). Some common types of questioning strategies used in police interviews are described next.

Interpreting Police Interviewing Strategies

Skilled police interviewers apply a range of strategies to elicit information and detect deception. Several reviews of contemporary investigative interviewing strategies are available to acquaint interpreters with these strategies (e.g., Gunderson & ten Brinke, 2019; Hope & Gabbert, 2019; Kebell & Davies, 2006; Madon et al., 2019; Meissner et al., 2017). Chief among these are rapport-building strategies that draw on principles of cognitive and social psychology to secure cooperation and elicit meaningful information (Meissner et al., 2017). Other commonly used questioning strategies are the Cognitive Interview (Memon, Meissner, & Fraser, 2010) to enhance recall, and strategies applied with uncooperative suspects and witnesses, such as Conversation Management (Shepherd & Griffiths, 2013). Notably, several aspects of rapport in police interviews vary by culture, including turn taking, eye contact, back-channel responses, and behavioral and verbal mirroring (Dhami, Goodman-Delahunty, & Desai, 2017; Richardson, McCulloch, Taylor, & Wall, 2019).

In linguistic terms, replication of interviewing strategies comes within discourse-analytical research on the handling by interpreters of verbal, paraverbal, and nonverbal discourse markers (Monteoliva-Garcia, 2018). The extent to which interpreters are aware of common investigative interviewing strategies and replicate them in their interpretation has emerged as an important international research topic (Rombouts, 2004, 2011). For example, interviewers perceived that interpreters’ unfamiliarity with best practice strategies used to interview child complainants of sexual abuse impaired the effectiveness of these interviews (Powell, Manger, Dion, & Sharman, 2017).

Like child witness interviews, central features of the Cognitive Interview and of deception detection strategies are rapport-building and open-ended prompts to interviewees to elicit free recall narratives (Memon et al., 2010; Nahari et al., 2019). Elicitation of a free-form narrative from the witness or suspect might require interpreters to deviate from the usual turn-taking exchanges in an interpreted interview when the interpreter uses the consecutive interpreting mode (Heydon & Lai, 2013). In consecutive interpreting, interpreters have a central role in the management of turn-taking (i.e., deciding on who speaks and when). This feature of communication is closely related to the coordination component of rapport (Tickle-Degnen & Rosenthal, 1990). Interpreters who are unaware of interviewer reliance on free-recall narrative strategies and rapport-building strategies and who are less skilled in the management of turn-taking might be more error-prone than their counterparts who have specialized training in legal interpreting (Gallai, 2017; Mulayim, Lai, & Norma, 2014).

Two dominant modes of interpretation are taught and practiced, namely consecutive and simultaneous interpreting (Hale, Martschuk, Ozolins, & Stern, 2017). During their training, interpreters learn to interpret in both modes, but generally specialize in one mode or the other (Hale, Goodman-Delahunty, Martschuk, & Doherty, 2020). The consecutive and simultaneous interpreting modes are described next.

Consecutive and Simultaneous Modes of Interpreting in Legal Settings

Interpreting in legal settings is conducted either in the consecutive mode, with short, dialogic or long, monologic turns between speakers, or concurrently, in the simultaneous mode. In short consecutive interpreting, used for interactions between speakers, the interpreted units of speech last a few seconds at a time and range from a single word to a few sentences, up to about 50 words (Andres, 2015; Viezzi, 2013). Short consecutive interpreting “is typical of face-to-face encounters where the form of communication is conversation,” whereas long consecutive interpreting “is typical of events where communication takes the form of one-to-many utterances of varying length with no mutual interaction between speaker and listeners” (Viezzi, 2013, p. 377). Thus, long consecutive interpreting is most typically reserved for prepared monologues, in which the speech unit may last 10–15 or even 20 min (Viezzi, 2013). For both short and long consecutive interpreting, the interpreter works alone and is usually placed next to the witness, hears segments of speech in the source language, takes notes, and delivers the interpretation in the target language, without the aid of technological equipment.

In the simultaneous mode, speech in the source language is heard by the interpreter through headphones, and the interpretation is delivered concurrently, at almost the same time, in the target language, via a microphone (Stern, 2012). The average delay between speech and simultaneous interpreting is three seconds (Seeber, 2011); however, it could vary between different language pairs and the direction of the interpretation.

Customary use of these two distinct modes of interpreting in different contexts appears to have evolved by happenstance rather than design. Unique historical factors spurred the use of simultaneous interpreting in European legal proceedings. In the 1940s, a Rockefeller-IBM funded Department of Interpreting Studies at the University of Geneva exposed students to new technology that enabled them to interpret simultaneously by listening to the speaker via headphones and interpreting into the target language via a microphone. When the Nuremberg trials commenced shortly after the end of World War II, the courts in neighboring Germany hired three graduates of this simultaneous interpreting program as interpreters for the trials (Gaiba, 1998). Thereafter, courts in Europe continued to use this mode of interpreting in legal proceedings and extended its use to international conferences (Pöchhacker, 2011). Today, in European international courts, simultaneous interpreting is the default mode (Stern, 2012). By comparison, in other jurisdictions, such as Australia, the UK, and the USA, consecutive interpreting is the default mode used in legal settings, including police interviews. However, with the aid of new portable headset equipment, recently some interpreters working in domestic courts in the USA have implemented the simultaneous mode (Mikkelson, 2010).

Both consecutive and simultaneous modes of interpreting have advantages and disadvantages. Decisions on which interpreting mode to use in legal settings have been based almost exclusively on tradition and cost rather than an evidence-based analysis of their respective effectiveness. In police interviews, consecutive interpreting is most typical (Lai & Mulayim, 2014). However, police practitioners who have conducted interviews in the consecutive mode reported that it made the interview less “free flowing” and “more like a structured interview” (Goodman-Delahunty & Howes, 2017). Other disadvantages of the consecutive versus the simultaneous mode emerged in a field experiment on legal interpreting in court: Mock jurors reported that the simultaneous mode was less distracting, and that they understood and remembered case facts more accurately than their counterparts who attended the same trial interpreted consecutively (Hale et al., 2017). The longer duration and accompanying costs of consecutive interpreting are further disadvantages (Hale, Goodman-Delahunty, Martschuk, & Doherty, 2020).

The Role of Interpreters in Police Interviews

Some police interviewers, lawyers, and judges misunderstand the interpreting process and erroneously endorse the view that an interpreter translates verbatim, akin to a disembodied machine (Evans et al., 2020; Fowler, 2013). A description of the interpreting role as “communication facilitation” is more accurate (Laster & Taylor, 1994), as interpreters are trained to attain pragmatic equivalence, not literal verbatim renditions.

Debates about the nature and scope of the role of professional interpreters in legal settings are long-standing (Devaux, 2018; Hale, 2008; Monteoliva-Garcia, 2018). Among academics, some factions contend that interpreters fulfil an advocacy role, others contend that they serve as gatekeepers, and others contend that interpreters are independent professionals (Hale, 2008, analyzed the different roles attributed to interpreters).

Professional interpreters are expected to adhere to their ethical obligation to interpret everything faithfully and to override their personal opinions (Howes, 2018; Mulayim & Lai, 2017). In confrontational interviews, however, interpreters might inadvertently neutralize, euphemize, or tone down the original speech (Felberg & Šarić, 2017; Taibi & El-Madkouri Maataoui, 2016) as a natural human reaction to conflict and a way of aiding communication. Nevertheless, it is important that interpreters keep in mind that they “are not responsible for what clients say” (Australian Institute of Interpreters and Translators [AUSIT], 2012, p. 9). If for any reason it is not possible to adhere to their ethical requirements, AUSIT advises interpreters to withdraw from the assignment.

In order for interpreters to understand the source message, they need to understand its cultural context. Very often, cross-cultural differences inherent in language, known as pragmalinguistic differences (Thomas, 1983), are reflected in the way concepts are expressed. A skilled interpreter will bridge such gaps at a pragmatic/discourse level of speech by producing what is known as a pragmatic rendition (Hale, 2007a, 2013a). For example, politeness is expressed differently in different languages. Some languages, such as English, use indirectness to express politeness (e.g., would you like to close the door, please?), whereas others, such as Russian, use directness combined with formal expressions (e.g., please close the door). Trained interpreters are taught to match the level of politeness (or the pragmatic level of language) rather than to match the individual words or structures.

Other cross-cultural differences are related to social conventions, known as sociopragmatic differences (Thomas, 1983), such as issues of proximity, gaze or greetings, or issues that relate to common practices. These cannot be addressed in an accurate pragmatic interpretation and might require an additional intervention from the interpreter to alert parties to a potential cross-cultural misunderstanding. The extent to which interpreters can take on the extra role of cultural broker has been hotly debated among interpreting scholars (Barsky, 1996; Felberg & Skaaden, 2012; Kelly, 2000). The central issue is the extent to which an interpreter should alert participants to potential cross-cultural misunderstandings in legal settings (Hale, 2013a).

There is general consensus that interpreters can point out situations “when a cultural misunderstanding impairs a linguistic exchange” (AUSIT, 2012; Hale, 2013a; ImPLI Project, 2012, p. 44; Judicial Council on Cultural Diversity [JCCD], 2017)—that is, the interpreter is expected to demonstrate intercultural competence and to take action “to prevent misunderstandings by explaining culture-bound reactions of interviewees” (ImPLI Project, 2012, p. 44). However, it could be very difficult for interpreters to ascertain whether an observed reaction is due to a cross-cultural difference or other factors. When in doubt, professional interpreters might be reluctant to offer such clarifications, as blaming culture for any misunderstanding can be a dangerous practice (Felberg & Skaaden, 2012).

Moreover, interpreters vary in their cultural competence. For example, some interpreters might be endogroup members (i.e., most closely affiliated and familiar with the culture of the English-speaking majority), and might lack exposure to and not share the culture of the suspect or witness. Other interpreters are exogroup members (i.e., share the native language and culture of the non-English-speaking minority; Taibi & El-Madkouri Maataoui, 2016), but might not share the culture of the investigative interviewer. Thus, employing an interpreter does not eliminate misinterpretations due to differing cultural norms in verbal and nonverbal behaviors (Evans et al., 2020). To date, little empirical research has been conducted on this topic, and attributions to cross-cultural differences lack substantiation (Felberg & Skaaden, 2012; Hale & Liddicoat, 2015).

Many training programs that prepare interpreters to work in legal settings include information about cultural differences and steps they can undertake to address cultural differences that might result in misunderstandings. Next, we describe the types of certification programs that exist for legal interpreting and specialized training programs for legal interpreting, including interpreting in police interviews.

Certification for Legal Interpreting

Interpreters are credentialed in different ways in different parts of the world. The diversity of practices was illustrated in a systematic comparison of interpreter certification procedures in 21 countries, including the USA, UK, Australia, and many European countries (Hlavac, 2013). In some countries, the term “accreditation” is favored, whereas others prefer the term “certification.” Many accreditation/certification and training institutions for interpreters assess five central components of the interpreting task, namely (a) the accuracy of rendition of the propositional content, (b) accuracy of rendition of the manner of delivery, (c) use of correct legal terminology, (d) application of ethical professional protocols, and (e) interactional management skills.

In the USA, interpreters can be certified as court interpreters. The National Center for State Courts oversees the Consortium for Language Access in Courts, which in turn co-ordinates the testing of court interpreters in individual states. Certification is applicable only in some states and in approximately 15 languages (Hlavac, 2013). In the UK, the National Register of Public Service Interpreters sets out a strict Code of Practice for its registered interpreters, to which police are signatories (Fowler, Vaughan, & Wheatcroft, 2016).

In Australia, up until 2017, interpreters were accredited by the National Accreditation Authority for Translators and Interpreters (NAATI) at two levels of accreditation: Paraprofessional or Professional. Interpreters who worked in the legal field were expected to be accredited at the higher level (Professional), although NAATI accreditation was a generalist and not a specialist accreditation. In 2017, in response to a review (Hale et al., 2012), NAATI introduced an improved system of certification with an extra layer of specialization. The first level is a Provisional Certification for Interpreters, which has an expiry date by which interpreters are required to upgrade to the general Certification for Interpreters. After fulfilling further training and professional practice, interpreters can then sit for a certification called Certified Specialist Legal Interpreter. Currently, interpreter certification can only be acquired by sitting for NAATI examinations after having met all relevant requirements, including pretest training (see naati.com.au for certification details). However, available training differs by state and language combination. The highest level of training is a master’s degree, followed by bachelor’s degrees and vocational training (at colleges of Technical and Further Education). Most university programs offer courses in legal interpreting, in which students receive specialized training in police and court interpreting.

Specialized Training in Legal Interpreting

Specialized training in legal interpreting is crucial to ensure accurate interpreting (Hale, 2019), yet very few countries prescribe any type of pre-service training for interpreters, including interpreters who work in legal settings. Australia is among the few countries offering high-level Community Interpreting training, including a number of courses that specialize in legal interpreting. Examples are the course “Interpreting in Legal Settings” offered at the University of New South Wales and “Legal Interpreting” offered at Western Sydney University (Hale & Gonzalez, 2017). Not all practicing interpreters have received such specialized training. Whether specialist certification such as that proposed by NAATI will become a prerequisite for all interpreters working in legal settings will depend on the availability of specialist interpreters in different language combinations and on the value that users of interpreting services assign to such high levels of expertise by remunerating interpreters accordingly.

Synopsis on Communication in Interpreted Police Interviews

The Interaction Process Model of interpersonal communication in police interviews incorporates multimodal features, all of which are important in understanding the meaning and intention of the utterance, and in turn effective interpretation (Conley et al., 2019; Madon et al., 2019; Moston et al., 1992). Police interviewers use a range of specialized questioning strategies which combine verbal, paraverbal, and nonverbal features to elicit meaningful information, secure cooperation, and detect deception (Madon et al., 2019; Meissner et al., 2017). Without a proficient interpreter, key strategies, such as rapport-building and free recall narratives, might not be effectively replicated. Interpreters should be familiar with common contemporary investigative interviewing strategies, and police interviewers should be familiar with the strengths and weaknesses of consecutive and simultaneous interpreting modes, the role of a legal interpreter, and interpreter certification and training for legal interpreting. There is a dearth of research with credentialed interpreters who specialize in legal interpreting to develop best practices in interpreter-mediated police interviews. Next, we review research methods applied to explore issues arising in interpreted police interviews.

Research Approaches to Interpreted Police Interviews

A wide variety of qualitative and quantitative empirical research methods have explored issues arising in interpreted police interviews. We distinguish field research, conducted with real-world cases and real practitioners, from laboratory experiments using simulated interviews and student role-players. First, we describe six types of field research and then describe laboratory experiments. Next, we review the strengths and weaknesses of the research, commenting in particular on factors affecting the internal, external, and ecological validity of the findings.

Field Research on Interpreted Police Interviews

Field research takes a number of different forms. The most useful are direct observations of primary sources of real-world archival data. Archival research uses electronic or audiovisual records of actual interpreted police interviews, or transcripts of those interviews. Obtaining research access to official police records of interpreted interviews is difficult. Occasionally, legal controversies develop in relation to inadequate interpreting, and excerpts of records of interviews are available in published decisions issued by courts of appeal (Hayes & Hale, 2010). Both qualitative and quantitative analyses can be conducted on field data. Examples are provided of six types of field research on interpreted police interviews: (a) case studies; (b) discourse analyses of interview excerpts; (c) surveys of stakeholders; (d) interviews of stakeholders; (e) live simulated police interviews with interpreting practitioners; and (f) live simulated experimental police interviews with interpreting practitioners.

Case Studies of Interpreted Police Interviews

Fieldwork in the form of retrospective analyses of archival case studies can shed light on a range of issues. One instructive example is an in-depth review of all US appellate cases involving police interpreters and Hispanic suspects, spanning a 34-year period (Berk-Seligson, 2009). Single case studies, such as the analyses by Nakane (2009) of four interpreted police interviews in Katsuno & Ors v. Australia (2006), tend to focus on the severity or extent of observed interpreting errors due to unfamiliarity of the interpreter with the native language of the interviewee; the inability to coordinate and to manage turn taking effectively; departures from the interpreter role, such as expressing personal opinions and initiating independent questions; or gaps and omissions in interpreting. In South Korea, where the quality of interpreting and qualifications of interpreters in police interviews are unregulated, a case study of inept interpreting in a 4-h suspect interview demonstrated extensive errors in the written record of the interview in a homicide case (Lee, 2017). The findings suggested that these errors culminated in the wrongful conviction of a mother for murdering her four-year-old daughter.

Discourse Analysis of Interpreted Police Interviews Conducted in the Field

To provide an empirical understanding of interpreted police interviews, linguists and interpreting scholars tend to conduct qualitative discourse analyses of excerpts of real-world police interviews conducted by professional interpreters. Discourse analysis is a research method for studying language, comprising verbal, paraverbal and nonverbal features, in relation to interactions within a social context. Thus, discourse research is not confined to the literal meanings of language, but considers its social functions—that is, meaning depends upon the context of the interaction (Potter & Wetherell, 1987). Micro-level approaches are detailed systematic analyses of interpreted language used in face-to-face talk, focused on techniques and competencies in successful and unsuccessful interpretation (Shaw & Bailey, 2009). Discourse analytic approaches applied to legal interpreting draw on a wide range of disciplines including anthropology, criminology, cultural studies, gender studies, law, linguistics, social psychology, and sociology.

Discourse analysis applied in police interviews (Licoppe & Veyrier, 2017; Nakane, 2014) has provided insights into best practices and errors. For example, videorecordings of actual interpreted interviews of applicants for asylum, in which the interpreter was located either with the interviewee or remotely, were compared (Licoppe, Verdier, & Veyrier, 2018). Further examples include fine-grained discourse analyses of excerpts of dialogues extracted from transcripts of actual interpreted interviews (Krouglov, 1999; Mizuno, Nakamura, & Kawahara, 2013).

Field Surveys of Stakeholders in Interpreted Police Interviews

In this section, we discuss quantitative surveys conducted in the field with three participant groups of stakeholders: interpreters, police practitioners, and interviewees (i.e., suspects and witnesses).

First, written survey instruments administered to interpreters are helpful in understanding diverse perceptions about the interpreting task in general or a specific interpreting task (e.g., Martschuk, Goodman-Delahunty, & Hale, 2020). For instance, Braun and Taylor (2012b) surveyed 166 legal interpreters in European countries to gather information about their experiences with videolink interpreting in different settings, including police interviews.

Second, surveys of investigative interviewing practitioners have been conducted by teams of psychologists in different countries and jurisdictions, some of which have focused on interpreted police interviews (e.g., Shaffer & Evans, 2018 in the USA; Wakefield et al., 2014 in Australia). Although some research teams administered identical surveys in different jurisdictions (e.g., Miller, Redlich, & Kelly, 2018; Redlich, Kelly, & Miller, 2014; Sivasubramaniam & Goodman-Delahunty, 2019), no rigorous jurisdictional comparisons of the outcomes on interpreting have been undertaken.

Third, no survey studies of interviewees in actual interpreted police interviews were located. Some psychologists have attempted to round out perspectives of the stakeholder triad by surveying role-playing witnesses at the conclusion of laboratory experiments about their perceptions of the interviewer and their experience with the interpreter (e.g., Ewens et al., 2017; Houston et al., 2017).

Field Interviews of Stakeholders in Interpreted Police Interviews

This section addresses field interviews conducted with two groups of stakeholders in triadic interpreted police interviews, namely practicing interpreters and investigative interviewing practitioners. A third group of stakeholders (suspects and witnesses) could be studied, but we located no published field interviews of suspects or witnesses who participated in interpreted police interviews.

First, some field research has focused on the experiences and perceptions of samples of practicing interpreters working in police interviews. These studies have often used semi-structured questionnaires to canvass interpreters’ experiences in different jurisdictions, such as the United Kingdom (Wilson & Walsh, 2019), the United States of America (Russano, Narchet, Kleinman, & Meissner, 2014), and Australia (Howes, 2018). For instance, interpreters have been asked about their role in community settings, such as legal settings (Hale, 2007a, 2008; Lee, 2009); their role and placement in human intelligence interviews (Russano, Narchet, Kleinman, & Meissner, 2014); and experiences of distress and secondary trauma (Howes, 2018; Wilson & Walsh, 2019).

Second, parallel studies have been conducted with samples of investigative interviewing practitioners who work with interpreters, using semistructured questionnaires (e.g., Goodman-Delahunty & Howes, 2017; Goodman-Delahunty & Martschuk, 2016; Russano, Narchet, Kleinman, & Meissner, 2014; Wilson & Walsh, 2019). Some research has targeted discrete practitioner groups who specialize in interviewing particular types of suspects or witnesses, such as children (Powell et al., 2017) or human intelligence sources (Russano, Narchet, Kleinman, & Meissner, 2014).

Live Field Studies of Simulated Police Interviews with Interpreting Practitioners

Research using realistic simulated interpreted police interviews has been conducted in the field with samples of legal interpreters as participants (Böser, 2013; Braun, 2017). Most typically, these studies have been led by interpreting researchers and have applied qualitative methods of analysis, such as discourse analysis. For example, from 2008 to 2016, researchers in the UK and Europe conducted a series of programmatic comparative qualitative studies comprising three collaborative projects entitled Assessment of Videoconference Interpreting in the Criminal Justice Service (AVIDICUS 1-3; http://www.videoconference-interpreting.net/). Discourse analysis was the primary method used to assess remote interpreting by real interpreters in live staged simulated interviews, high in ecological validity (Braun, 2017). Several studies by this research group combined discourse analysis and descriptive quantitative methods (Braun, 2013, 2014). Despite a high degree of realism, the small samples of participating interpreters in these field studies prevented random assignment to experimental conditions and the use of quantitative, inferential statistics.

Field Experiments with Interpreting Practitioners

While qualitative field research is valuable in learning about aspects of interpreting practices of concern in the field, in general, those studies are unsuited to testing psychological theories and causal relationships between variations in practices and their effects on interpreting outcomes. Thus, it is useful to complement qualitative field studies by undertaking controlled experimental studies in field settings with interpreting practitioners. For instance, to investigate optimal work conditions for simultaneous interpreters in the European Parliament, an interdisciplinary research team was assembled. Quantitative comparisons were made of in-person versus remote simultaneous interpreting by collecting and coding work samples from interpreters in the field (Roziner & Shlesinger, 2010). This method allows inferences about cause-and-effect relationships between interpreting practices and performance outcomes. In Australia, interdisciplinary teams conducted field experiments with interpreting practitioners in simulated investigative interviews to explore several topics related to interpreter performance, such as the impact of variations in interpreter training, placement and mode of interpreting on rapport between the interviewer and suspect, management of the interaction, and interpreting accuracy (Doherty, Martschuk, Goodman-Delahunty, & Hale, 2020; Hale et al., 2018, Hale, Goodman-Delahunty, & Martschuk, 2020a, 2020b).

Laboratory Experiments with Simulated Interpreted Interviews

Complementary to field studies, experimental laboratory studies using simulated interviews or interview tasks are best suited to test theories by examining cause-effect relationships between interpreting practices and outcomes. Next, we describe the types of laboratory studies of interpreted police interviews that have been conducted.

Most controlled experimental studies of interpreted police interviews, using quantitative methods, have been conducted by a legal psychology research team in the UK (Ewens, Vrij, Leal, et al., 2016a, 2016b; Ewens, Vrij, Mann, & Leal, 2016; Ewens et al., 2017; Vrij et al., 2017; Vrij, Leal, Fisher, et al., 2018; Vrij, Leal, Mann, et al., 2018; Vrij & Leal, 2020). Many of these studies assessed the number of details reported by interviewees in monolingual versus interpreted interviews, that is, the focus was on verbal cues to detect deception. Other experimental simulations conducted in the USA (e.g., Houston et al., 2017; Leins, Zimmerman, & Polander, 2017) explored topics related to interpreter performance, such as variations in the placement of the interpreter, and the influence of the interpreter on rapport between interview participants.

Strengths and Weaknesses of Quantitative Studies of Interpreted Police Interviews

A strength of both field experiments and laboratory experiments is the random assignment of participants to controlled conditions which permits causal inferences to be drawn about the effects of variations in witness directions, interviewing strategies, or interpreting tasks. However, the generalizability of the research findings can be limited by factors that diminish the internal, external, and ecological validity of the studies. Some illustrative examples are provided next.

Internal Validity in Research on Interpreted Police Interviews

Internal validity refers to the extent to which effects detected in a study were caused by an independent variable in the study, rather than by biasing effects of unmeasured variables. Factors that might limit the internal validity of the interpreting research include aspects of the (a) research design; (b) dependent measures of interpreting performance; and (c) dependent measures of interpreter-mediated interviewer–interviewee rapport.

Research Design Features

Research designs applied in experimental studies of interpreted police interviews are often creative, innovative, and intricate, especially when procedures are added to vary the ground truth of interviewee statements—that is, half of the participants are instructed to lie about what they observed in a videotape while the other half accurately describe what they observed. The use of a monolingual interview to establish a baseline for comparisons of interpreted interviews is a particular strength of many laboratory experiments (e.g., Ewens, Vrij, Leal, et al., 2016a, 2016b; Ewens, Vrij, Mann, & Leal, 2016; Ewens et al., 2017; Houston et al., 2017; Vrij et al., 2017; Vrij, Leal, Fisher, et al., 2018; Vrij, Leal, Mann, et al., 2018).

Some tensions exist between design features that strengthen the internal validity of interpreting assessments and those that strengthen the internal validity of deception detection measures. In this section, we discuss three internal validity concerns arising in some research designs that (a) generalize across dissimilar data from unique interviewee–interpreter pairs; (b) ignore potential order effects or practice effects on interpreters who repeat the same task within experimental groups; and (c) ignore the nonindependence of interviews interpreted by the same interpreter.

Generalizations Across Dissimilar Interpreter–Interviewee Pairs

In certain studies of interpreted interviews, the unit of interpreted language that is analyzed is unique for each source or participant. For instance, in some studies of verbal cues to deception, every participant interviewee relates a unique self-generated story or account which is interpreted by a small number of interpreters (e.g., Ewens, Vrij, Leal, et al., 2016a; Vrij & Leal, 2020). Thus, every interviewee–interpreter pair is unique, as is the case in qualitative case studies or discourse analyses of police interview transcripts. No assessment is made within each pair to assess the extent to which individual interpreted accounts are accurate, and the extent to which different interpreters might interpret each particular narrative similarly or differently is unknown. The sole measure of interpreting performance is a dependent measure applied by aggregating across dissimilar pairs within an experimental group or condition, that is, monolingual versus interpreted interviews. These design features are valuable in testing theories about verbal deception, but the procedures are less informative about interpreting. For instance, the researchers in one study dismissed a 15% decrement in the average number of details reported in interpreted versus monolingual interviews as minor and “expected” (Ewens, Vrij, Leal, et al., 2016a). Arguably, since the coding was at a relatively loose level of the verbal interpreted information, the basis to categorize this degree of omission in the proportion of reported details as either trivial or expected is questionable.

Interpreting Order and Practice Effects

In some studies, the same interpreters are used repeatedly in multiple interviews. For example, in a study on interpreted reverse chronological accounts, a Cognitive Interview strategy, two interpreters did all the interpreting for 20 interviewees who spoke the same native language (one of three). All interviewees described events observed in the same video (Ewens, Vrij, Leal, et al., 2016b). This design feature was useful in ensuring that all interpreters performed a comparable task. However, exposure to each successive video description by multiple interviewees afforded the interpreters increasing familiarity with its contents. When an interpreter performs the same experimental task with the same content multiple times in succession within the course of a single study, one might expect their performance to be affected by unmeasured aspects associated with task repetition and familiarity—that is, “order” or “practice” effects that might distort results obtained in that experiment. For instance, an interpreter more familiar with the videotape events might have filled in details that were implied but not specified by an interviewee. The researchers reported no steps taken to control for the order effects. Further, the same language groups and interpreters were used in multiple experiments in which their task was invariant—unidirectional interpreting of interviewee accounts of the same 6.6-min video in the same language pairs.

This internal validity threat was acknowledged by researchers in another interpreting laboratory experiment (Houston et al., 2017). Efforts to mitigate these effects included advising interpreters that all interviewees had watched different videos when in fact, they had not. The extent to which interpreters became aware of this ruse is unknown.

Nonindependence of Interpreted Data

In laboratory experiments on interpreted police interviews in which the same interpreters are used repeatedly across multiple interviews, statistical procedures such as multilevel modelling should be applied to take into account the nonindependence of the data obtained from each interpreter. However, threats posed to the internal validity of the results by the nonindependence of the interpreted data were overlooked in several studies (Ewens, Vrij, Leal, et al., 2016a, 2016b; Ewens, Vrij, Mann, & Leal, 2016; Ewens et al., 2017; Houston et al., 2017; Vrij et al., 2017; Vrij, Leal, Fisher, et al., 2018; Vrij, Leal, Mann, et al., 2018). For instance, in Houston et al. (2017), 12 lay interpreters repeated the same task with multiple different interviewees (n = 125), while three professional interpreters were each assigned to substantially more interviews. In these studies, no intra-class correlations were provided describing the consistency or conformity of measures, such as the number of reported details or rapport, across multiple interviewees in the same interpreter groups. Intra-class correlations were reported in only one study (Vrij & Leal, 2020), in which three interpreters each interpreted approximately 100 interview responses. However, no controls addressed other confounded design features (language, language pairs, and interpreter skill); rather, all interviews in each of three target languages were conducted by a single interpreter despite acknowledged inequivalence in the bilingual competence of the three interpreters, and the same data were the basis of multiple different studies by the research team.
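To make the nonindependence concern concrete, the following minimal sketch computes a one-way random-effects intra-class correlation, ICC(1), for a measure such as the number of reported details, with interviews grouped by the interpreter who mediated them. The function name `icc_oneway` and all data values are hypothetical illustrations and are not drawn from any of the studies cited above. A value near 1 suggests that scores cluster strongly by interpreter, so interviews sharing an interpreter should not be treated as independent observations.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way random-effects ICC(1).

    groups: list of lists; each inner list holds the scores (e.g., number
    of reported details) from interviews mediated by one interpreter.
    """
    n_groups = len(groups)
    sizes = [len(g) for g in groups]
    total_n = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / total_n
    group_means = [mean(g) for g in groups]

    # Between-interpreter and within-interpreter sums of squares
    ss_between = sum(n * (m - grand_mean) ** 2
                     for n, m in zip(sizes, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (total_n - n_groups)

    # Adjusted average group size; equals the common size when balanced
    k0 = (total_n - sum(n * n for n in sizes) / total_n) / (n_groups - 1)

    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


# Hypothetical detail counts for three interpreters, four interviews each
hypothetical_scores = [
    [30, 32, 31, 29],  # interpreter A
    [40, 42, 41, 39],  # interpreter B
    [50, 52, 51, 49],  # interpreter C
]
icc = icc_oneway(hypothetical_scores)
print(round(icc, 3))
```

In practice, a nontrivial ICC would be one reason to fit a multilevel model with interpreter as a grouping factor rather than to analyze interviews as independent cases.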

Dependent Measures of Interpreting Performance in Police Interviews

Although researchers have observed that differences in measures of interpreting accuracy and effectiveness are likely to lead to differences in research outcomes (Braun, 2013), the extent of methodological differences applied to assess interpreting quality and effectiveness in police interviews has not been examined.

Consideration of dependent measures is important because they establish the validity and reliability of the research outcomes. Validity and reliability comprise four related but separate components. The first of these, construct validity, is the extent to which the dependent measure captures variability in what it purports to measure, that is, interpreting performance in police interviews. The second, content validity, is the extent to which the dependent measure is representative of interpreting in police interviews. The third, criterion validity, is the extent to which a dependent measure correlates with performance measures of interpreted police interviews; and the fourth, face validity, is the extent to which the content of the dependent measure appears suitable to achieve its aims. Next, we discuss factors affecting the internal validity of dependent measures of (a) interpreting performance; (b) interpreter-mediated interviewer–interviewee rapport; and (c) deception in interpreted interviews.

Dependent Measures of Interpreting Performance in Police Interviews

Few researchers have considered using measures of interpreting proficiency applied by professional interpreting accreditation/certification or training institutions. These methods are helpful because they take into account the multimodal and complex nature of the interpreting task (i.e., verbal, paraverbal, and nonverbal communication) and are devised to distinguish between good and bad attributes of interpreting performance.

For example, findings of no differences in the interpreting of ad hoc bilinguals versus experienced (but not accredited) interpreters (Ewens, Vrij, Leal, et al., 2016a), or no differences when interpreters were placed adjacent to an interviewer versus behind the interviewee (Houston et al., 2017), might be attributable to loose definitions of interpreting fidelity (low construct validity) that include only partial replication of the propositional content in terms of the overall number of details mentioned (low content and criterion validity), and summaries of the content are considered adequate (low face validity). Such approximations would not be considered valid and reliable measures of interpreting performance accuracy by professional interpreters, accreditation/certification bodies, or interpreting schools.

A standard set of marking criteria used in oral interpreting examinations to assess interpreting students in Australia emphasizes the positive features of interpreter performance by applying a competency-based approach and uses a discourse pragmatic framework, taking into account the content and style of the utterances and their effect on listeners (Hale, 2010). There are seven criteria, presented and weighted in order of their importance, with detailed descriptors. Dependent on the importance of an interpreting performance criterion, different weights are applied, and they sum to 100% in total (see Table 1). This measure was used in a series of controlled field experiments (Hale et al., 2018; Hale, Goodman-Delahunty, & Martschuk, 2020a, 2020b) to examine the impact of a range of variables on interpreting performance, for example, interpreter training and education, interpreter practical experience, mode of interpreting, remote versus in-person interpreting, rapport maintenance, and interview duration.

Table 1 Elements of assessment of interpreting performance
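The logic of a weighted, competency-based marking scheme of this kind can be sketched as follows. The criterion labels and weights below are hypothetical placeholders chosen only so that seven weights, listed in descending order of importance, sum to 100; they are not the values in Table 1. Each criterion is rated as the proportion of that criterion achieved (0 to 1), and the overall score is the weighted sum.

```python
# Hypothetical weights (percent) for seven criteria in descending order of
# importance. These are illustrative placeholders, NOT the Table 1 values.
HYPOTHETICAL_WEIGHTS = {
    "criterion_1": 30,
    "criterion_2": 20,
    "criterion_3": 15,
    "criterion_4": 10,
    "criterion_5": 10,
    "criterion_6": 10,
    "criterion_7": 5,
}


def weighted_score(ratings, weights=HYPOTHETICAL_WEIGHTS):
    """Return an overall score out of 100.

    ratings: dict mapping each criterion to the proportion achieved (0 to 1).
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the weighted criteria")
    for criterion, value in ratings.items():
        if not 0 <= value <= 1:
            raise ValueError(f"rating for {criterion} must lie in [0, 1]")
    return sum(weights[c] * ratings[c] for c in weights)


# Hypothetical interpreter who fully meets the top criterion and half of
# every other criterion
example_ratings = {c: 0.5 for c in HYPOTHETICAL_WEIGHTS}
example_ratings["criterion_1"] = 1.0
print(weighted_score(example_ratings))
```

Because the weights differ, the same shortfall costs more on a heavily weighted criterion than on a lightly weighted one, which is the property that distinguishes this approach from a simple count of reported details.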

An alternative approach is a point-deduction or error analysis system reliant on a mix of inductive and deductive processes. Each interpreted statement is compared with the English language source and with the target language source to assess accuracy (positive features) and the nature of errors (negative features) on the six key elements (content, style, legal discourse and terminology, management and interaction, interpreting protocols, and paralinguistic rapport markers; Hale et al., 2018). For example, in the UK AVIDICUS Projects, errors in the following four interpreting features were coded by two trained researchers (Braun & Taylor, 2012c) to allow quantitative analyses of interpreter performance: (a) semantic or content-related categories (omissions, unnecessary additions, inaccuracies, and coherence problems); (b) linguistic categories (lexical/terminological problems, idiomaticity, grammar, style/register, coherence, language mixing); (c) paralinguistic categories (articulation problems, hesitations, word-level repetition, false starts, and self-repairs); and (d) interaction-related categories (turn-taking problems, especially overlapping speech). Many of these features are the same as those displayed in Table 1.
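An error-analysis scheme of this type reduces, at its simplest, to tallying coded errors by category so that interpreters or conditions can be compared quantitatively. The sketch below uses the four broad categories named above; the function name `tally_errors`, the data layout, and the example errors are hypothetical illustrations rather than the coding instrument used in the AVIDICUS Projects.

```python
from collections import Counter

# The four broad error categories described in the text
CATEGORIES = ("content", "linguistic", "paralinguistic", "interaction")


def tally_errors(coded_errors):
    """Count coded errors per category.

    coded_errors: iterable of (category, subtype) pairs assigned by a coder.
    Categories with no errors are reported with a count of zero.
    """
    counts = Counter({category: 0 for category in CATEGORIES})
    for category, _subtype in coded_errors:
        if category not in CATEGORIES:
            raise ValueError(f"unknown error category: {category}")
        counts[category] += 1
    return dict(counts)


# Hypothetical codes for one interpreted interview
hypothetical_codes = [
    ("content", "omission"),
    ("content", "unnecessary addition"),
    ("paralinguistic", "hesitation"),
    ("interaction", "overlapping speech"),
]
print(tally_errors(hypothetical_codes))
```

When two coders work independently, as in the projects described, their tallies for the same interview can be compared to check coding consistency before any analysis of interpreter performance.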

In qualitative discourse analyses, often conducted on written transcripts, comparisons of the source and the interpreted communication are laid bare (for examples, see Hale, Goodman-Delahunty, & Martschuk, 2020a; Hale, Martschuk, Goodman-Delahunty, Taibi, & Han, 2020). A series of transcription conventions is used to specify features of the source utterance and the interpreter’s rendition, such as symbols designating rising and falling intonation, the duration of pauses, and syllables spoken softly or loudly (e.g., Böser, 2013). Transparency about the classification and annotation of utterances, and what comprises an error of omission, an error of commission, a turn-taking coordination error, a cultural misinterpretation, etc., promotes more consensus. Nonetheless, even among linguistic and interpreting scholars who take propositional and pragmatic features of communication in police interviews into account (Berk-Seligson, 2009; Hale, 2010; Lai & Mulayim, 2014; Nakane, 2014), methods to assess interpreting proficiency are not standardized.

Dependent Measures of Interpreter-Mediated Rapport in Police Interviews

Interpersonal rapport is a complex construct central to investigative interviews and thus can be difficult to operationalize. Measures of rapport applied in interpreted police interviews have varied in terms of their rigor, objectivity, and validity. In some studies, researchers have applied retrospective, post-interview measures of perceived rapport, rather than assessments of dynamic changes in rapport throughout the interview. For instance, in a study by Ewens, Vrij, Leal, et al. (2016a), ratings of perceived interviewer–interviewee rapport were provided by role-playing interviewees following a simulated monolingual or interpreted interview consisting of five scripted questions. These global subjective retrospective impressions were not based on any definition or common understanding of rapport, nor were they compared with interviewer ratings of rapport in the same interviews, nor with any objective assessments of rapport. Similarly, in a subsequent study, non-native English-speaking participants who were interviewed via an interpreter were later asked if this had been a positive experience (Ewens et al., 2017). These measures are weak, as they are low in construct, content, criterion, and face validity.

By comparison, in other studies, composite objective measures of rapport were applied: multiple components of verbal, paraverbal, and nonverbal rapport features were distinguished and rated separately. The replication of verbal and paraverbal rapport markers was coded by professional interpreters using criteria two, three, and four in Table 1, from videotaped interviews lasting approximately 30 min and from transcriptions of the interviews (Goodman-Delahunty, Hale, Martschuk, & Dhami, 2020; Hale et al., 2018). Interpreter maintenance of nonverbal rapport features was assessed concurrently at regular intervals throughout live interpreted interviews by an observer present in the interview room (posing as a second member of the police interview team). In other words, the accuracy of nonverbal facets of interpreting was objectively assessed by coding the extent to which interpreters replicated paralinguistic behaviors of both speakers in terms of pitch, tone, facial expressions, and gestures (Goodman-Delahunty et al., 2020).

Dependent Measures of Deception Detection in Interpreted Police Interviews

Many laboratory experiments have examined cues to deception in interpreted interviews. For example, much research has focused on verbal cues as indicators of veracity (Nahari et al., 2019), such as the quantity and quality of reported details. Other verbal features, such as repetitions, and paraverbal features such as pitch and hesitations, are also important cues to deception (DePaulo et al., 2003; Sporer & Schwandt, 2006), and thus important for interpreters to replicate in terms of content validity. Yet few studies have explored nonverbal cues to deception such as response latency (van der Zee, Poppe, Taylor, & Anderson, 2019). Paraverbal and nonverbal measures cannot be assessed from interpreted transcripts or from written English translations of interpreted interviews. The latter form of data is a constrained measure of oral verbal interpreted responses, but is the form relied upon by many deception researchers to assess interpreted versus monolingual interview responses.

When experimental researchers employ the same verbal dependent measure, such as counts of the number of unique reported details, differences arise due to coding practices applied in one research laboratory versus another, as there is no agreed unitary set of criteria and coding rules determining how verbal details should be counted. Differences that could lead to contrary results in monolingual studies might be magnified when coding interpreted police interviews. Some coding rules established by deception researchers in monolingual interviews to ignore repetitions and paraverbal cues (e.g., Ewens, Vrij, Leal, et al., 2016a) contradict the rules that professional interpreters are trained to observe. These contradictions underscore the fact that dependent measures developed to test specific verbal deception theories are not valid to assess interpreting accuracy. For example, this type of unimodal, unidirectional coding represents less than 10% of the criteria for interpreting performance displayed in Table 1. Research outcomes based on these verbal deception measures should not be conflated with or compared with outcomes of multimodal coding of interpreting proficiency. Further, researchers applying constrained verbal coding schemes of this type should avoid generalizing study outcomes to interpreting performance, as the measures lack construct, content, criterion, and/or face validity to assess interpreting performance.

External Validity in Research on Interpreted Police Interviews

External validity refers to the extent to which the findings of a research study generalize to real-world interpreted interviews. In other words, will the research outcomes be replicated in actual interpreted police interviews conducted by investigative interviewing practitioners with non-English-speaking suspects and witnesses?

The external validity of research on interpreted police interviews can be curtailed by certain research procedures and sampling biases. These features also inhibit comparisons of research outcomes across studies. Sampling biases can be associated with relevant characteristics of interpreters that influence their interpreting performance, such as past interpreting experience, training in interpreting, language proficiency in the paired languages, cultural competence, knowledge of the interview subject matter, specialized legal or other terminology, memory abilities (e.g., working memory), and note-taking skills for consecutive interpreting (Chen, 2017). In this section, we discuss (a) samples of ad hoc bilinguals, (b) samples of interpreters, and (c) interpreter sample size.

First, we discuss limitations associated with some research samples of untrained ad hoc bilinguals. In some experimental studies, although the competence and qualifications of the interpreters exceeded that typically obtained in police interpreting, the interpreters lacked professional experience in legal settings (Böser, 2013). In other studies, convenience samples of ad hoc bilingual individuals or students were used as mock interpreters in simulated police interviews. Evans et al. (2020) cautioned that reliance on lay interpreters such as ad hoc bilinguals and undergraduate students, rather than professional interpreters, might limit the research outcomes and their generalizability. Results of formal empirical comparisons of interpreting in realistic simulated investigative interviews by ad hoc bilinguals versus trained, accredited interpreters underscore this point (Hale et al., 2018).

Second, external validity can relate to samples of practicing interpreters used in field and laboratory experiments. In a European study, professional interpreters each provided multiple real-world systematic samples from their daily work practice (Roziner & Shlesinger, 2010). In the UK (Braun, 2014; Braun & Taylor, 2012c), English–French professional legal interpreters with a minimum of five years’ experience in police services participated in the simulated interviews. In Australia, Hale and colleagues (Hale et al., 2018; Hale, 2019; Hale, Goodman-Delahunty, & Martschuk, 2020a) recruited samples of professional, accredited, and primarily trained practicing interpreters from the NAATI and AUSIT directories for participation in live, simulated field experiments. However, practicing interpreters used in laboratory studies were not necessarily trained, accredited, or certified for legal interpreting work, and many lacked professional practical experience. For instance, of 12 interpreters in a study by Ewens, Vrij, Leal, et al. (2016a), 5 (42%) had no practical interpreting experience. Lay interpreters used in police interviews in South Korea are not professionals (Lee, 2017). Accordingly, findings derived from Korean interpreter samples, Russian interpreter samples (specified as “fluent in English”), and English–Spanish interpreter samples (characterized as “bilingual”) in studies by Ewens, Vrij, Leal, et al. (2016a, 2016b), Ewens, Vrij, Mann, and Leal (2016), Ewens et al. (2017), Vrij et al. (2017), Vrij, Leal, Fisher, et al. (2018), Vrij, Leal, Mann, et al. (2018), and Vrij and Leal (2020) might need scrutiny.

Finally, external validity can relate to interpreter sample size. In many studies, the number of participant interpreters was very small: a total of three (one per target language) in Vrij and Leal (2020) for over 300 interviewees; six in Böser (2013) and Ewens et al. (2017) (two per target language); 11 in Lai and Mulayim et al. (2014); 15 in Braun and Taylor (2012c); and 20 each in Gile (2001) and Howes (2018). Because these interpreter samples were small and purposive, rather than random or representative, the study findings might have limited generalizability in terms of interpreting performance. By comparison, in field experiments, larger samples of practicing interpreters were recruited: 570 systematic work samples from 36 interpreters (Roziner & Shlesinger, 2010), and randomized assignment of 46 interpreters (Hale et al., 2018) and 103 interpreters (Hale, Goodman-Delahunty, & Martschuk, 2020b) to interviews lasting approximately 30 min.

Ecological Validity in Research on Interpreted Police Interviews

As just discussed, external validity is the extent to which the findings of a research study generalize to real-life interpreted police interviews. Ecological validity depends on the extent to which the features of simulated investigative interviews match those of real interviews. Thus, ecological validity has implications for external validity. For instance, generalizability beyond the context of one particular experiment might be limited by the brevity of the interpreted interaction and by reliance on undergraduate students or actors to role-play as interpreters, interviewers, and/or interviewees. One prominent factor that distinguishes prior studies of interpreted police interviews is the nature and scope of the interpreting task used to assess interpreting performance.

Examples of features of the interpreting task include the language pair, interpreting directions, the mode of interpreting, features of the speech, features of the speakers, expected response, task duration, preparation, task criticality, and task novelty (Chen, 2017). The extent to which interpreter participation and research tasks replicate or are representative of the experiences of professional interpreters who work in real police interviews has varied substantially.

Some interpreting tasks in past studies (e.g., Houston et al., 2017) violate core principles in interpreting codes of ethics, while others are strong in ecological validity, but are very truncated. Many researchers (Ewens, Vrij, Leal, et al., 2016a, 2016b, Ewens, Vrij, Mann, & Leal, 2016, Ewens et al., 2017; Houston et al., 2017; Vrij et al., 2017, Vrij, Leal, Fisher, et al., 2018, Vrij, Leal, Mann, et al., 2018) have simply presumed that police interviews must be conducted in the consecutive interpreting mode, but implemented the long, consecutive monologic mode (Vrij & Leal, 2020), which is atypical in police interviews when open-ended questions are asked. Experimental interpreting research has been conducted on spontaneous natural language generated in artificial, contrived interviews, and on realistic scripted enacted interviews. The extent to which the interpreting task is bidirectional has varied.

In this section, we discuss four aspects of task representativeness, namely (a) interpreter roles; (b) interpreting task duration; (c) spontaneous and scripted speech samples; and (d) unidirectional interpreting. These features of the interpreting task can diminish the ecological validity of the research and hence the generalizability of the findings to real-world interpreted police interviews.

Interpreter Roles in Police Interviews

Some differences in the selection of experimental variables are attributable to disciplinary and jurisdictional differences. For example, interpreting practitioners and scholars are unlikely to support research that requires interpreters to violate the principle of neutrality in their professional code of ethics (Mulayim & Lai, 2017). By comparison, experimental psychologists have pursued lines of research requiring role-playing interpreters to compromise their professional neutrality by engaging in rapport-building with an interviewee, to participate in interview questioning as members of the police interview team, or to be seated next to a police interviewer and opposite the interviewee, visibly aligned and affiliated with the police interviewer (e.g., Houston et al., 2017). At times, to test causal relationships, or to implement a particular control group in an experimental laboratory study, departures from standard interpreting practices can be instrumental, even though they are not recommended as a best practice and are unlikely to be implemented in real practice.

Interpreting Task Duration

One limitation of some studies is the truncated nature of the target task—that is, the speech sample is too brief to represent what transpires in the course of a police interview. Gile (2001), for example, compared interpreting modes of a speech unit that was a total of 280 words in length, lasting 100 s in the simulated international conference condition. Other than the brief duration, the task was realistic and demonstrated the nature of errors more likely to arise in simultaneous versus consecutive interpreting modes. In another study of simultaneous interpreting with brief units of analysis (180 s), a representative sample was obtained by including up to 20 work samples from each interpreter across five workdays, two from morning sessions, and two from afternoon sessions (Roziner & Shlesinger, 2010).

Some guidance on how long an interpreting task should be to test interpreter performance comes from Braun (2014), who reported that paralinguistic problems increase after approximately 15–20 min of interpreting. Hence, a longer interpreting task is necessary if the research aims to assess factors such as interpreter fatigue. Interpreted interviews conducted in the long consecutive mode in laboratory experiments were comparatively brief, lasting an average of 16 min (e.g., Ewens et al., 2017). In live, simulated field studies, the interactive interpreting tasks lasted 25–30 min (Braun & Taylor, 2012c; Hale et al., 2018, Hale, Goodman-Delahunty, & Martschuk, 2020a, 2020b), or up to 45 min (Böser, 2013), and included all phases of a police interview.

Spontaneous and Scripted Speech Samples

A strength of some laboratory experiments is using research procedures to generate samples of spontaneous natural language from interviewees in simulated interviews. For instance, the interpreted interview in a study by Ewens et al. (2016a) took the form of an interview of a job candidate who responded to five open-ended questions. However, a police interview might be perceived as more adversarial or formal than a job interview and might induce participants to modify their communications in comparison to their behaviors in a job interview. Results of analogue interviews in a different social context might not generalize to real-world police interviews.

The field research conducted by Böser (2013) in Scotland, by Braun and colleagues in the UK and in Europe for the AVIDICUS Projects (Napier, Skinner, & Braun, 2018), and by Hale and colleagues in Australia, used realistic simulated suspect interviews based on real cases, approved as such by police interviewing practitioners. In some field studies, spontaneous language samples were generated. For instance, six “eyewitnesses” whose native language was French or German watched CCTV footage of a real-life car theft. Via an interpreter, these individuals were questioned about the crime by an English-speaking investigating police officer who conducted a complete standard Scottish information gathering interview (Böser, 2013).

Field experiments by Hale and colleagues were conducted in realistic real-world settings such as secure interview facilities used by counter-terrorism police to interview high value detainees. When debriefed, some interpreters disclosed that they were unaware that the interview was simulated (Goodman-Delahunty, Hale, Martschuk, & Dhami, 2015). However, the interviewer and interviewee were professional actors working from a script. The performance of the same task by every participant interpreter strengthened the internal validity of these studies; the fact that the interpreted questions and responses were not spontaneously generated reduced the ecological validity of the interpersonal dynamics between interviewer and interviewee.

Unidirectional Interpreting Tasks

In most laboratory experiments to date, the interpreting tasks entailed few or restricted interactions between interviewer and interviewee, as the research aim was to elicit lengthy, monologic, narrative responses (e.g., Vrij & Leal, 2020). For instance, in some studies (Ewens, Vrij, Leal, et al., 2016b; Ewens, Vrij, Mann, & Leal, 2016), the interviewer asked two scripted questions, and in others (Ewens, Vrij, Leal, et al., 2016a; Ewens et al., 2017) five scripted questions, irrespective of what the interviewee said in reply. Interpreters were instructed not to interrupt interviewees, and interpreting analysis was unidirectional such that only interviewee responses interpreted into English were assessed, as the focus was the number of unique details reported by interviewees (e.g., Vrij & Leal, 2020). By comparison, field studies (Böser, 2013; Braun & Taylor, 2012c) and controlled field experiments (Hale et al., 2018; Hale, Goodman-Delahunty, & Martschuk, 2020a, 2020b) included extensive interactional interviewer–interviewee question–response exchanges (e.g., 60 and 42 exchanges). These studies with extensive interactive speech samples provided a more thorough test of an interpreter’s proficiency, tested the interpreters’ skills bidirectionally, both from and into English, and tested their interaction management skills (e.g., Licoppe et al., 2018).

Synopsis on Research Approaches to Interpreted Police Interviews

To understand the impact of an interpreter in a police interview, a wide range of research approaches has been applied. Traditional empirical methods favored by linguistic and interpreting scholars are field studies applying micro-level discourse analysis to professionally interpreted units of oral communication. Field experiments and laboratory experiments using quantitative methods to test cause–effect relationships are recent innovations. Strengths of internal, external, and ecological validity vary between studies. Reliable quantitative methods to assess the performance of interpreters in police interviews are still being developed. Examples show these are broader and more complex than unitary and unidirectional measures for specific purposes, such as counting details in interpreted verbal reports. To advance the field, greater consensus is needed among researchers about quantitative dependent measures to assess the performance of interpreters in police interviews.

Contemporary Research on Interpreted Police Interviews

In this section, we present research findings on six topics that are pivotal in interpreted police interviews. The first four topics center on fundamental aspects of the interpreting process, namely (a) the impact of the interpreting mode in police interviews; (b) the interpreter’s role; (c) interpreting accuracy and performance; and (d) interpreted interviews via videolink and telephone. Next, we review findings on two topics driven by contemporary police interviewing practices that have a direct bearing on the effectiveness of interpreted police interviews, namely (e) the priority of investigative interviewing strategies; and (f) the impact of interpreting on witness credibility.

The Impact of Interpreting Modes in Police Interviews

The interpreting mode best suited to police interviews, and under what circumstances, remains open to empirical assessment. To date, researchers have examined the influence of the interpreting mode in police interviews on (a) interpreting performance and (b) interpreter fatigue. These findings are presented in turn.

The Influence of Interpreting Mode on Interpreting Performance in Police Interviews

A common assumption by practicing interpreters and some researchers (Evans et al., 2020) is that interpreting performance is better and that cognitive load or task demands are lower in the consecutive than the simultaneous interpreting mode. This view might not be supported by empirical findings. In some prior studies, mode of interpreting (consecutive vs. simultaneous) and presence of the interpreter were confounded in comparing the accuracy of the two modes. For example, Hornberger et al. (1996) tested face-to-face interpreting in the consecutive mode and compared this with simultaneous mode interpreting from a remote location. Thus, results showing that fewer additions were inserted by interpreters in the remote location might be attributable to the mode or might be attributable to the remote location of the interpreters. Without a fully crossed experimental design, the precise cause of these observed outcomes cannot be discerned.
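The confound described above can be made concrete with a minimal sketch that enumerates the cells of a fully crossed design. The factor labels and the helper name `fully_crossed` are illustrative assumptions, not taken from any of the cited studies; the two tested cells correspond to the comparison attributed above to Hornberger et al. (1996).

```python
from itertools import product

# Hypothetical factor levels, labeled for illustration only.
MODES = ["consecutive", "simultaneous"]
LOCATIONS = ["face-to-face", "remote"]


def fully_crossed(*factors):
    """Return every combination of factor levels (one tuple per design cell)."""
    return list(product(*factors))


# A fully crossed 2 x 2 design has four cells.
cells = fully_crossed(MODES, LOCATIONS)

# The confounded comparison described in the text tested only two cells:
# consecutive interpreting in person, and simultaneous interpreting remotely.
tested = [("consecutive", "face-to-face"), ("simultaneous", "remote")]

# The untested cells are the ones needed to separate mode from location.
untested = [cell for cell in cells if cell not in tested]

if __name__ == "__main__":
    for cell in cells:
        status = "tested" if cell in tested else "NOT tested"
        print(f"{cell[0]:>12} x {cell[1]:<12} {status}")
```

With only the two tested cells, any difference between conditions varies with mode and location at the same time; adding the two untested cells would allow each factor's contribution, and their interaction, to be estimated separately.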

To date, most comparisons of interpreting accuracy according to mode have been conducted in relation to court interpreting. For example, one study of four English–Spanish interpreted US court proceedings compared consecutive and simultaneous modes and revealed that interpreters had difficulty achieving accuracy of the degree of coercion in leading questions in both modes, but were more than twice as accurate in the consecutive than the simultaneous mode (70.6% vs. 33%) (Berk-Seligson, 1999). However, opposite results emerged in nonlegal settings. For example, a panel of experts rated the performance of ten professional conference interpreters who interpreted a speech in both modes as significantly more accurate in the simultaneous mode (Gile, 2010). The interpretation arising from the simultaneous mode more closely approximated the original speaker’s style, a crucial element in legal interpreting. Similarly, a panel of experts who compared the accuracy of consecutive versus simultaneous interpreting in a medical setting found the simultaneous mode achieved better results (Gany et al., 2007). In that study, the training and competence of the interpreters were matched, whereas the court comparisons did not control this source of variation.

The Influence of Interpreting Mode on Interpreter Fatigue in Police Interviews

The simultaneous and the consecutive interpreting modes are both demanding for interpreters, albeit in different ways. One might expect fatigue to develop more rapidly for consecutive than simultaneous interpreting, as the former mode relies more extensively on aural than visual information (e.g., Klinger, Tversky, & Hanrahan, 2011 compared the cognitive load of visual vs. aural tasks). Conversely, the high demands on attention and working memory of the simultaneous interpreting mode are viewed by some as more taxing (Köpke & Nespoulous, 2006). A study of the impact of fatigue on conference interpreting performance in the simultaneous mode showed that accuracy declined markedly after 60 min (Moser-Mercer, Kunzli, & Korac, 1998). For this reason, the standard practice in international settings is for simultaneous interpreters to work in pairs and to alternate every 30 min. When the consecutive interpreting mode is used in a police interview, the interview duration is typically doubled, increasing the risk of interpreter fatigue. Yet court interpreters in domestic settings, who typically work alone in the consecutive mode, have breaks approximately every 90 min (JCCD, 2017; Roberts-Smith, 2009). No analogous protocols have been established in police interviews despite the fact that four separate Articles (5, 9, 11, and 29) in the Universal Declaration of Human Rights (United Nations, 1948) address suspect interviews, including their duration and the number of times that a detainee is interviewed. Estimates by Australian police interviewers of the duration of their monolingual interviews showed that most lasted approximately 60–75 min, out of a recommended maximum of 4 h (Sivasubramaniam, Goodman-Delahunty, Fraser, & Martin, 2014). The effect on accuracy in lengthy police investigative interview sessions has not been thoroughly researched. Further investigation of the interrelationship between interpreting modes and the duration of interpreting was recommended (Seeber, 2011).

Research applying cognitive load theory to interpreting is fairly new and is of interest because it can indicate the task difficulty of interpreting (Chen, 2017). This psychological theory predicts that the difficulty of performing a task is associated with the volume and inherent difficulty of information to be extracted from a source, and the way information is presented. Cognitive load has been defined as “the amount of capacity the performance of a cognitive task occupies in an inherently capacity-limited system” (Seeber, 2013, p. 19).

Intrinsic and extraneous cognitive load are distinguished. For instance, a high intrinsic cognitive load is predicted when the duration of the interpreted interview is protracted, the language is highly technical, the interviewer’s questioning strategies are intricate or complex, and emotional expressivity is heightened (e.g., it includes expressions of profanity). Further, the intrinsic cognitive load is increased when the interpreter’s attention must be allocated between multiple task features, such as management of the interaction between speakers as well as the information they convey, or taking notes while listening to the speakers. Extraneous cognitive load is generated by presenting information in a format or manner that includes unnecessary information that unduly burdens the learner.

In interpreting research, the cognitive load comprises two main aspects: (a) task and environmental characteristics which determine the amount of mental work to be done in a specific task under certain circumstances; and (b) interpreter characteristics (Chen, 2017). The mode of interpreting (simultaneous or consecutive) is a task characteristic. While the mental effort required to perform interpreting tasks, and especially simultaneous interpreting tasks, has attracted considerable research interest, consensus on how to measure cognitive load has not been achieved (Seeber, 2013). Recently, pupillometry was acknowledged as an effective indirect index of effort in cognitive control tasks (van der Wel & van Steenbergen, 2018). In other words, pupil dilation is useful not only to assess task difficulty but also the cognitive effort exerted.

Using an experimentally controlled mixed research design, Doherty et al. (2020) measured the cognitive load on qualified interpreters during a simulated police interview using pupillometry and blink rates by locating the interpreters remotely (via audio- or videolink) from the interview room. The interview was between an English-speaking interviewer and an Arabic-, Mandarin-, or Spanish-speaking suspect. Interpreters were recruited from the local pool of professional interpreters to interpret in both the simultaneous and the consecutive interpreting modes (order was counterbalanced). Analyses revealed a greater cognitive load in the consecutive than the simultaneous interpreting mode. This was reflected in significantly less gaze time at the interviewer and the suspect in the consecutive mode due to off-screen note taking followed by an observable pattern of disrupted visual attention before reorientation. Moreover, longer gaze time and a lower cognitive load were significantly associated with increased interpreting accuracy. Results showed that the interpreters performed significantly better in the simultaneous than the consecutive interpreting mode. Higher rates of interpreting accuracy were reflected in multiple convergent measures: interpreting style, maintenance of verbal rapport markers, and interactional management. Further investigation of the impact of interpreting mode on accuracy in police interviews is recommended, using a variety of different research designs.
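Counterbalancing the order of the two interpreting modes, as described above, can be sketched as follows. This is an illustrative assumption about how such an assignment might be generated, not the procedure reported by Doherty et al. (2020); the participant labels and the function name `counterbalance` are hypothetical.

```python
def counterbalance(participants, conditions=("consecutive", "simultaneous")):
    """Alternate the two possible condition orders across participants.

    Each participant completes both conditions; half start with the first
    condition and half with the second, so that order effects (e.g., practice
    or fatigue) are spread evenly over the two modes.
    """
    orders = [tuple(conditions), tuple(reversed(conditions))]
    return {p: orders[i % 2] for i, p in enumerate(participants)}


# Hypothetical participant identifiers, for illustration only.
interpreters = ["P01", "P02", "P03", "P04"]
assignment = counterbalance(interpreters)

if __name__ == "__main__":
    for participant, order in assignment.items():
        print(participant, "->", " then ".join(order))
```

In practice, the sequence of participants would typically be shuffled before assignment so that order is not tied to recruitment sequence; that step is omitted here to keep the sketch deterministic.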

The Role of an Interpreter in Police Interviews

The way witnesses or suspects perceive interpreters’ social identities and alliance can influence their comfort and willingness to respond frankly to the interview questions (Smith-Khan, 2017). However, much research on the interpreter’s role has examined interpreter rather than suspect or witness perceptions. An exception is the interactional sociolinguistic discourse analytical approach used to examine custodial interrogations of Hispanic suspects, showing that their Miranda rights were jeopardized by police officers who were assigned the role of interpreter (Berk-Seligson, 2009).

One in-depth qualitative study conducted in the UK explored interpreters’ perceptions of their role and of their own performance in an interpreting task in legal settings when working in-person or via videolink. Results demonstrated that interpreters working by videolink shifted perceptions of their role depending on the legal context (i.e., prison versus court; Devaux, 2017). A dynamic model that integrates self-presentation, participant alignment, and interactional management in legal settings in person versus via videolink (Llewellyn-Jones & Lee, 2014) was applied to responses from a sample of 18 certified interpreting professionals (Devaux, 2018). Among the findings associated with consecutive videolink interpreting compared to consecutive in-person interpreting were shifts in interpreter alignment depending on the location and configuration of the respective parties, reductions in perceived interpreter-speaker rapport, and increased rationalizations by interpreters about ethical issues due to limitations experienced in managing the interaction (Devaux, 2017, 2018). The absence of confirmation bias studies examining the influence of interpreters’ beliefs about the interviewee or the case was noted by Evans et al. (2020).

In Australia, where the interpreting profession is relatively well established, with university training, a national certification system, a national professional association, and an agreed code of ethics, professional interpreters agree on their role as independent, impartial, and accurate interpreters. A survey study of 340 participants confirmed that this role was well understood by trained interpreters, while untrained bilinguals did not understand the importance of impartiality (Goodman-Delahunty et al., 2015). For instance, the majority of interpreters rejected the idea of assisting police in questioning the suspect, while more than half of untrained bilinguals endorsed it. Approximately one in five untrained bilinguals believed the interpreter’s role included getting the witness to tell the truth, but fewer than one in 20 trained interpreters agreed. By contrast, trained interpreters were more likely than untrained bilinguals to agree that an interpreter should make appropriate cultural adaptations. In all, trained interpreters were more likely than ad hoc bilinguals to perceive their role as neutral, and that their duty was to interpret everything said. Further analyses demonstrated that interpreters’ understanding of their role was significantly associated with interpreting accuracy in a simulated police interview (Goodman-Delahunty et al., 2015).

The Influence of Placement of Interpreters on Their Role

Different viewpoints exist about where best to place the interpreter in a police interview. Police investigators might change the placement according to their interview goals. If the goal is to portray dominance and increase distress in the interviewee, interviewers in the USA suggest placing the interpreter behind the interviewee (US Department of the Army, 2006). Placement behind the interviewee was suggested as effective in minimizing private conversations between the interpreter and interviewee, which would occur only with untrained interpreters. A laboratory experiment in which the interpreter sat either beside the interviewer or behind the interviewee showed that the latter position resulted in more negative ratings of the interaction by interviewees (Houston et al., 2017).

Conversely, interviewers might place the interpreter between the interviewer and the suspect in order to facilitate the interpreter’s ability to accurately interpret rapport strategies (Goodman-Delahunty & Martschuk, 2016). In general, the preference of interpreters is the latter position, in which they are situated in an equidistant position between the interviewer and the interviewee, as this placement fosters their impartiality, facilitates management of turn taking between speakers, and provides a full view of both speakers for optimal access to nonverbal and paraverbal rapport cues (Goodman-Delahunty et al., 2020). Notably, in some laboratory studies purporting to test the triangular position, the interpreter was placed next to the interviewer and opposite the interviewee, which does not afford the interpreter a full view of both speakers (e.g., Ewens et al., 2017; Houston et al., 2017).

Measuring Interpreters’ Performance in Interpreted Police Interviews

Relevant characteristics that influence interpreting performance in police interviews are interpreters’ professional background, interpreting experience, and knowledge of the subject matter. Next, we elaborate on research findings on (a) the interpreting performance of bilinguals and trained interpreters in police interviews, and (b) the influence on interpreting performance of advance briefing on interview topics and vocabulary.

Interpreting Performance of Bilinguals and Trained Interpreters in Police Interviews

The training or experience of interpreters has rarely been taken into account in comparing the accuracy of the performance of interpreters in legal settings (Hale et al., 2018). Without these assessments, the generalizability of the findings to practicing interpreting professionals is placed in question. One key difference between lay interpreters and professional interpreters is that the former group is not bound by professional codes of ethics (Evans et al., 2020).

One controlled experimental study assessed the performance of trained interpreters versus untrained bilinguals. Trained interpreters performed significantly better than untrained interpreters on all elements of interpreting proficiency assessed (Hale et al., 2018). At the conclusion of every interpreted simulated police interview, assessments of the interpreters’ professional credibility were gathered from the actors role-playing the police interviewer and the suspect, who were blind to the status of the interpreters. Both actors rated the credibility of the trained interpreters as significantly greater than that of their bilingual counterparts, on all credibility factors of the Witness Credibility Scale (Brodsky, Griffin, & Cramer, 2010): trustworthiness, confidence, likeability, and knowledgeability.

Following their participation in an interpreting task in a live simulated experimental police interview, trained interpreters and untrained bilinguals completed a self-assessment questionnaire comprising 21 items to which participants indicated their agreement on a Likert-type rating scale. Factor analysis yielded two factors: Overall Competence and Language Reproduction (Martschuk et al., 2020). The findings indicated that untrained bilinguals tended to overestimate their Overall Competence, while self-perceptions of their Language Reproduction skills were more critical than those of the trained interpreters.

The Influence on Interpreting Performance of Advance Briefing on Interview Topics and Vocabulary

Many practicing interpreters hold the view that a lack of advance briefing and preparation can be detrimental to their ability to interpret accurately (Hale, 2013b; Hale & Napier, 2016; Russano, Narchet, Kleinman, & Meissner, 2014). Accordingly, some interviewing practitioners provide information about a case to an interpreter (Shaffer & Evans, 2018). However, legal practitioners typically oppose the provision of briefing materials on grounds that interpreter neutrality might be compromised by advance knowledge (Hale, 2013b). This does not preclude briefing on interviewing strategies.

Some research has shown that prior access to relevant documents improves text comprehension (written or oral) (McNamara & O’Reilly, 2009) and interpreting accuracy in conference settings (Díaz Galaz, 2011; Gile, 2005; Pozo Triviño, Fernández Rodríguez, & Galanes Santos, 2012). To date, no experimental research has been published on the impact of advance briefing on the performance of legal interpreters in police interviews.

Remote Interpreting by Videolink and Telephone in Police Interviews

To overcome the problem of low availability of competent interpreters, some countries have considered the creation of national registers of a pool of specialist trained interpreters who can service all areas (Hale, 2011). Qualified interpreters can be flown in to work in the required areas, as is common among conference interpreters who work for international organizations. However, the costs of this practice can be very high. To reduce the cost and to increase access to specialist interpreters, remote interpreting, in which technologies are used to access an interpreter who is physically separated from the primary participants, has become popular (ImPLI Project, 2012, p. 25), either via teleconference (Kelly, 2008; Wakefield, Kebbell, Moston, & Westera, 2014) or videoconference (the interviewer and interviewee are connected by technology, and the interpreter is co-located with one of these participants; Braun, 2014; Shaffer & Evans, 2018). A further configuration is a three-way connection in which all (i.e., the interviewer, interviewee, and the interpreter) are in different locations (Napier et al., 2018). Among the main advantages of remote interpreting in police interviews is prompt access to an interpreter without compromising security (Braun & Taylor, 2012a, 2013). Yet, interviews conducted with police and military interviewers in Australia and the Asia Pacific, who had extensive experience working with interpreters, disclosed their preference for face-to-face interpreting over remote interpreting (Goodman-Delahunty & Martschuk, 2016). Similarly, a survey of 166 legal interpreters working globally via videolink disclosed their preference for face-to-face interpreting (Braun & Taylor, 2012b).

Factors often cited in opposition to remote interpreting include the absence of visual cues, the poor quality of sound and visual reception, the lack of adequate protocols, the lack of preparation, and the lack of training for interpreters and users of their services (Rosenberg, 2007; Wadensjö, 1998; Wang, 2017; Xu, Hale, & Stern, 2020). Some concern was raised that remote bilingual communication is not as reliable as face-to-face bilingual communication (Braun & Taylor, 2012a, 2012b, 2012c; Goodman-Delahunty & Martschuk, 2016). However, much of the early research on remote interpreting used outmoded technology, was limited in scope, and did not examine legal interpreting in police interviews (Ko, 2006; Ozolins, 2011; Wadensjö, 1998).

Telephone Interpreting in Police Interviews

Research on telephone interpreting has shown a deterioration in the performance of interpreters without visual cues (Ozolins, 2011; Wadensjö, 1999; Wang, 2017). A recent observational study of 17 telephone interpreting interviews in NSW Legal Aid offices in Australia (Xu et al., 2020) confirmed some of the previous results, in particular the added difficulties caused by a lack of visual cues, not only for interpreters but also for the interviewers. It highlighted a noticeable loss of control by the interviewers, who were unable to see the interpreters on the other side of the line. The interpreters at times disappeared from the interaction for periods of time, sometimes due to connection issues and sometimes without any explanation. Although the technology has improved over the years, this study demonstrated that old-fashioned telephones were still being used by the interviewers, and that interpreters at times used mobile telephones with poor reception in unknown locations (Xu et al., 2020).

Videolink Interpreting in Police Interviews

Some support for the multimodal communication model emerged in a qualitative study of the performance of 15 French–English interpreters in criminal proceedings via videoconference and teleconference (Braun & Taylor, 2012c; Braun, 2017). The researchers compared interpreter performance in a live, simulated police interview conducted either face-to-face or via remote interpreting. Three remote configurations were tested across 16 sessions and two types of criminal cases. As predicted by the multimodal communication model, interpreter accuracy suffered with video-mediated interpreting: A higher level of accuracy was consistently achieved in the face-to-face condition than in the remote condition. Results provided some support for the hypothesis that greater access to visual cues contributed to increased interpreting accuracy on a range of measures.

Somewhat paradoxically, the interpreters were saying more (using more words) but were conveying less (fewer propositions) in the remote conditions. The problems that occurred were classified as linguistic, paralinguistic, and cultural, and were associated with the interpreters’ cognitive processing capacity. Notably, in this study, the sample size was small, and all interpreters first interpreted remotely and then face-to-face. Thus, order effects might have exaggerated the differences in findings.

More recently, a field experiment was conducted with 103 Arabic-, Mandarin-, and Spanish-speaking qualified interpreters who interpreted a 30-min live-simulated police interview face-to-face, via videolink, or via audiolink (as a between-participants variable) (Hale, Goodman-Delahunty, & Martschuk, 2020a). Interpreting performance was assessed using multidimensional features of the interpreting task, including propositional content, manner of delivery, legal terminology, interpreting protocol, and management. Analyses showed no differences in performance when interpreting in person and via videolink, while interpreting performance was significantly lower via telephone. This effect held across all three language pairs and was pronounced in the measures of interpreting style and management skills. These findings suggest that the absence of visual cues in telephone interpreting might have contributed to the decrement in interpreting performance.

Visual Display of Parties to a Remotely Interpreted Police Interview

Past psychological research in the USA established that when the suspect’s face is the focus of the camera display and the interrogator is not accorded equal space, observers tend to perceive the suspect’s statements to be more voluntary and less coerced; thus, perceptions of the suspect’s culpability can increase (Lassiter, 2010). However, when the suspect was a member of a minority group (e.g., Black or Chinese) and the interrogator was White, the minority suspect’s statements were perceived as more voluntary, and his guilt as more probable, even when images of both the suspect and the interrogator were displayed equally. This effect disappeared when both the interrogator and the suspect were members of minority groups (Ratcliff et al., 2010).

To date, these effects have not been tested in the context of video-mediated remote interpreting, where various members of the interview group might appear remotely on a video screen, depending on the configurations of who is co-located and who is attending the proceedings from a remote location. At times, the interpreter might appear remotely, while the interviewer and suspect are together; at other times, the interpreter and the interviewee might be together, separate from the interviewer. For example, if a minority group member is in custody or is outside of the jurisdiction, seeking asylum in a migration proceeding, this person might attend the legal proceedings remotely via video, and the interpreter and the interviewer might be co-located elsewhere (Licoppe et al., 2018). Moreover, the impact of matching the ethnic background of the interpreter to that of the suspect or that of the interrogator has not been tested either in a remote location or in person.

The expansion and advancement of videoconferencing technology and videolink capabilities via the internet, and their widespread availability, have generated a series of new questions surrounding the effectiveness of remote interpreting services. The topic of remote interpreting in police interviews remains vastly under-researched, and rigorous studies are needed.

Interviewing Strategy Maintenance in Interpreted Police Interviews

Given the importance of specialized interviewing strategies in investigative interviews and the prevalence of police interviews with non-English-speaking witnesses and suspects, one would expect more research attention to the impact of interpreters on widely used questioning strategies, such as open-ended questions seeking narrative responses, the Cognitive Interview, and rapport building. To date, few experimental studies of these topics have been undertaken.

In general, police interviewers have left the choice of interpreting mode to the interpreters (Böser, 2013). However, interpreting scholars have cautioned that the choice of interpreting mode affects the way questions and answers are reproduced and received by interviewees (Jacobsen, 2012). The mode of interpreting used most frequently in psychological investigative interviewing research to date is the consecutive mode. The long consecutive mode is the sole mode tested in interpreted deception detection studies (Ewens, Vrij, Leal, et al., 2016a, 2016b; Ewens, Vrij, Mann, & Leal, 2016; Ewens et al., 2017; Vrij et al., 2017; Vrij, Leal, Fisher, et al., 2018; Vrij, Leal, Mann, et al., 2018; Vrij & Leal, 2020). Yet this interpreting mode is widely acknowledged as the least effective in replicating features central to interviewer rapport-building strategies, such as paraverbal discourse markers and verbal repetitions. These features also contribute significantly to appropriate identification of the interviewer’s and interviewee’s meaning (Jacobsen, 2012).

In a live, simulated field study, the impact of an interpreter on responses elicited in the course of the information-gathering interview used by police interviewers in Scotland was tested (Böser, 2013). This interview model has six phases (Planning, Preparation, Rapport building, Information gathering, Clarifying, and Evaluation) and is very similar to the model used by police interviewers in the UK. The interpreters used the consecutive interpreting mode. The researcher observed that this interpreting mode led the police interviewer to shorten his questions: no question asked in any of the six interviews exceeded a 10-s duration. Of particular interest were the critical free recall narrative responses elicited by open-ended questions. Interpreters using the consecutive mode interrupt the interviewee and fragment the narrative into either short or long turns. Once interviewees commenced longer turns to provide narrative responses to open-ended questions, both witnesses and interpreters experienced disruption and coordination difficulties. Negative consequences were interpreter summarizations that changed the evidence and the weight of the evidence reported by interviewees, as well as rapport inhibition. Böser (2013) concluded that the consecutive mode was problematic both in terms of its quality (e.g., loss of propositional content) and in terms of rapport (e.g., switching from first-person to third-person footing).

Researchers have acknowledged that the problematic “interactional quandary is a general feature of consecutively interpreted question/answer sequences” whenever turn taking between longer and shorter responses must be managed (Licoppe et al., 2018, p. 300). Some interpreters using the consecutive mode actively intervene when an interviewee elaborates and engages in narrative expansions in response to yes/no questions (Wadensjö, 2010). Interpreter interventions in response to longer, narrative interviewee responses to open-ended questions were examined in a comparative field study of interpreted asylum hearings (Licoppe et al., 2018). Both interpreters and speakers relied extensively on visual and verbal cues to stop speaking for interpretation, to continue the narrative, and to give the other speaker a turn. In general, the sequential chunking of responses to open-ended questions made it difficult to identify transition points and the relevance of responses and created an opportunity for the other speaker to intercede before the narrative response concluded.

Another interviewing strategy asks interviewees for open-ended responses by reporting events chronologically and then in reverse order. This Cognitive Interview strategy was applied in a laboratory experiment testing deception detection theories in interpreted and monolingual interviews (Ewens, Vrij, Mann, & Leal, 2016). The long form of the consecutive mode of interpreting was tested. Using this interpreting mode, this interview strategy was reported to be effective in eliciting more cues to deception in interpreted but not monolingual interviews. However, this outcome must be tempered in light of limitations of the research procedures.

Despite obvious tensions between interviewing strategies that seek an open-ended narrative response and the use of an interpreting mode that interrupts the narrative (the sequential short and long forms of the consecutive mode of interpreting), to date, no research has examined the effectiveness of the simultaneous interpreting mode with open-ended narrative responses.

Rapport Building in Interpreted Police Interviews

The impact of interpreters on verbal and nonverbal rapport building strategies has been assessed in some studies of interpreted police interviews. Next, we discuss research that has examined interpreter maintenance or inhibition of these strategies.

Verbal Markers of Rapport

Some evidence for the effect of interpreter-mediated communication on rapport in a police interview comes from a homicide case study in which Russian sailors were questioned about a murder that took place on a docked ship in the UK (Krouglov, 1999). Research using discourse analysis revealed that the interpreter edited or deleted witness utterances which were important for rapport building. For example, comparisons of the interviewer’s questions and the translated transcription showed colloquialisms, linguistic hedges, and diminutives were deleted or changed. Colloquialisms and linguistic hedges could provide evidence of pragmatic intention while diminutives could be used in order to appear responsive and facilitate rapport. Furthermore, the addition of particles, polite forms, and stylistic shifts in the interpreted statement meant that the witnesses were inaccurately represented to the police interviewer.

In other studies, unintended changes by interpreters included omissions and additions of discourse markers (Hale, 1999), powerless features (Hale, 2002; Mizuno et al., 2013), additions of politeness markers, or modifications of verbal strategies that the legal practitioners so carefully crafted to achieve a specific purpose (Hale, 2010). Similarly, interpreters might omit or distort profane language used by the police officer (Ainsworth, 2016) or witness (Felberg, 2016; Felberg & Šarić, 2017; Hale, Goodman-Delahunty, Martschuk, & Doherty, 2020b). Such changes might, however, result in a more formal statement, and the interviewer will not get a chance to respond to the emotionally laden expressions of the witness.

This was demonstrated in a 30-min live, simulated interpreted interview by an English-speaking police interviewer of an Arabic-, Mandarin-, or Spanish-speaking suspect who used profane language on two occasions, conveying anger and frustration that was not directed at anyone (Hale, Martschuk, Goodman-Delahunty, Taibi, & Han, 2020). To provide an appropriate rendition requires bidirectional bilingual competence, first, to understand the intent of the profane source utterance, and second, to provide a pragmatic equivalent in the target language. In this field experiment, analyses revealed that more experienced interpreters and those with more legal interpreting training maintained profane language to a higher extent than their less experienced counterparts. The majority of Spanish-speaking interpreters maintained profane language or provided a softer illocutionary force (intent or meaning) than the suspect conveyed; the majority of Mandarin-speaking interpreters omitted profane language in the first half of the interview and provided a pragmatic rendition in the second half of the interview; while half the Arabic-speaking interpreters omitted profane language in the first half of the interview and used a semantic or pragmatic rendition in the second half of the interview. In some situations, cultural factors might lead to semantic interpretations of profane expressions. Many Spanish-speaking interpreters were endogroup members, native English speakers whose competence in English exceeded that of the Arabic- and Mandarin-speaking interpreters. This factor might have contributed to the differences between the language groups.

One way that interpreters can facilitate the spontaneity of exchanges in speaker participation to build interviewer–interviewee rapport is by keeping question forms intact. In some laboratory experiments testing deception detection theories, the interviewers asked scripted open-ended questions designed to elicit lengthy narrative responses. Interviewees’ post-interview ratings of interviewer–interviewee rapport showed no differences between rapport ratings in interpreted and monolingual groups (Ewens, Vrij, Leal, et al., 2016b; Houston et al., 2017).

Nonverbal Markers of Rapport

An early case study by Lang (1976) of a court case in Papua New Guinea provided some evidence that the interpreter’s gaze affects turn taking, and thus rapport. Lang found that gaze was the most important indication of attention and turn taking in legal conversation. In addition, when the interpreters averted their gaze, they missed other important turn-taking cues. In the laboratory study by Houston et al. (2017), when interpreters were placed behind the interviewee, thereby blocking visual communication, post-interview rapport ratings by interviewees were lower than when the interpreter and interviewee could communicate nonverbally.

Briefing Interpreters on Rapport-Building Strategies in Police Interviews

Because interpreters are often unaware of investigative interviewing strategies, they might benefit from informative guidance defining rapport and outlining different rapport-building strategies applied by investigative interviewers, as was developed by Dhami et al. (2017). A rapport-building information sheet was administered to half of the participants (undergraduate students), before they read a series of vignettes describing police interviews of foreign suspects who were speaking a language different from that of the interviewers. Participants who read the rapport-information sheet were better able to identify the level of rapport between the interviewer and suspect than the control group.

In a subsequent experimental field study of the maintenance of rapport features in an interpreted simulated police interview lasting about 25 min, the updated rapport information guide was administered to Spanish-speaking trained interpreters and untrained bilinguals before they commenced an interpreting task, while the same number of participants undertook the same interpreting task without reviewing the guide (Goodman-Delahunty et al., 2020). Overall, trained interpreters were more likely to replicate verbal and nonverbal rapport markers found in the original speech than were untrained bilinguals. Furthermore, while trained interpreters tended to replicate the strategies of the speakers that facilitated rapport, the untrained bilinguals engaged in more behaviors that inhibited rapport between the interviewer and the suspect. Compared to interpreter groups who were not exposed to this information, the written guidance about rapport provided to interpreters before the simulated interview increased the maintenance of verbal rapport features by trained interpreters and untrained bilinguals, and decreased rapport inhibition. Ad hoc bilinguals who received the information guide increased the extent to which they replicated the interviewer’s rapport strategies as the interview proceeded, whereas the trained interpreters who read the guide performed consistently well at this task throughout the interview.

Credibility Assessments of Witnesses and Suspects in Interpreted Police Interviews

Witnesses and defendants who require interpreting services in legal settings are primarily migrants and members of communities with the status of minorities (Monteoliva-Garcia, 2018, p. 48). This raises the issue of the credibility of witnesses and suspects who are members of social outgroups. A range of factors has been shown to influence the perceived credibility of both English-speaking members of outgroups, such as minorities, and non-English-speaking outgroup members whose evidence is translated by interpreters. Assessments of witness voluntariness and credibility are key factors for any witness or suspect who attends a police interview, and these issues are heightened for non-English-speaking interviewees. Next, we review research findings on (a) the perceived credibility of English-speaking outgroup members, (b) the influence of interpreting performance on perceived witness credibility, (c) the influence of interpreting mode on the perceived credibility of non-English-speaking suspects, and (d) veracity assessments in interpreted police interviews.

The Perceived Credibility of English-Speaking Outgroup Members

Psychological research has demonstrated that the credibility of members of minority groups can be disadvantaged in comparison with that of their majority counterparts, even when the minority witnesses and suspects speak the same language as their interrogators, and the police interview is conducted in English (Villalobos & Davis, 2016). This bias can arise for various reasons. One reason is cultural differences in communicative norms, both in verbal and nonverbal communication patterns. Examples of verbal cultural differences include instances of gratuitous concurrence, namely expressions of agreement with authoritative, powerful outgroup members (Villalobos & Davis, 2016). This is common in the responses of minority community members such as Indigenous Australians (Eades, 2015), regardless of their understanding of what was said. Examples of nonverbal communication patterns are gaze aversion (Vrij & Winkel, 1991, 1994) and instances of long pauses in conversation. In response to police questions, these nonverbal behaviors can have negative implications for minority suspects, despite the fact that they are not associated with deception (see meta-analyses by DePaulo et al., 2003, and Sporer & Schwandt, 2007). Typically, long pauses in conversation in Standard English are perceived as a sign of deception (Sporer & Schwandt, 2006; Vines, 2005) but are normative in Aboriginal dialects, creating negative impressions of these speakers in legal settings (Eades, 2007). In one laboratory experiment using a simulated police interview, the insertion of long pauses in response to an interviewer’s questions increased ratings of guilt by observers, irrespective of whether the indigenous suspect had a clearly Aboriginal appearance (Devaraj & Goodman-Delahunty, 2009).

Another reason that minorities might be rated less credible than their majority counterparts is the influence of Stereotype Threat. Minority individuals might become more concerned about, and aware of, their own actions and about being judged and treated according to a negative stereotype about their group (Steele & Aronson, 1995). This in turn affects their emotions, cognitions, and behaviors. Self-awareness and experiences of nervousness or anxiety by minority individuals can, in turn, increase the perception of deception of and by minority suspects (Fenn, Grosz, & Blandon-Gitlin, 2020; Villalobos & Davis, 2016).

The Influence of Interpreting Performance on Perceived Witness Credibility

When the evidence provided by a witness or suspect is given in a language other than English, assessments of credibility and detection of deception are based primarily on the interpreter’s rendition. The listener in a legal setting is typically looking for possible cues that bear on witness credibility; thus, a correctly interpreted version is essential. If an interpreter makes factual errors and the content is attributed to the original utterance, this might lead to an unwarranted lack of credibility or perception of deception. Conversely, when an interpreter tries to minimize contradictions (consciously or unconsciously), the interpreted version might appear coherent even though the original utterance was not.

The influence of interpreting accuracy on witness credibility has been investigated through discourse analytical studies of authentic and simulated trials and police interviews. Interpreters often fail to reproduce seemingly superfluous noncontent features that might affect judgments of veracity, such as hesitations, fillers, hedges, and repetitions (referred to as powerless features) (Berk-Seligson, 1990/2002; Dueñas González, Vásquez, & Mikkelson, 1991; Hale, 2010). Prior monolingual research has indicated that features of powerless speech are prominent in police interviews (Ainsworth, 1993). Research into the impact of the style of speech on mock jurors showed that what is known to be a powerful speech style enhances evaluations of witness credibility compared to the powerless speech style (Conley et al., 2019). Studies of interpreted powerful and powerless communication styles yielded the same results (Berk-Seligson, 1990/2002; Hale, 2010).

A common inaccuracy is the omission or change of linguistic stylistic features that can affect juror perceptions of the source language speaker, such as register, pragmatic force, or levels of politeness. Quasi-experimental studies using oral recordings of interpreted evidence have shown that when interpreters unwittingly “improved” on the style of the original, by making it more coherent and omitting powerless features or by adding politeness markers, evaluations of witness credibility were significantly enhanced; the opposite was the case when interpreters added their own powerless features (Berk-Seligson, 1990/2002; Hale, 2010).

Small-scale Japanese laboratory studies of interpreted testimony excerpts disclosed that lexical choices by the interpreter had the power to shift the perceived guilt of a suspect (Mizuno et al., 2013). Further laboratory research confirmed that a court interpreter who used a marked (distinctive) versus unmarked (common) expression for a certain concept influenced inferences drawn by mock jurors (Mizuno & Acar, 2012). Despite findings indicating that witness credibility assessments were modified by specific actions or omissions by interpreters, there is a dearth of research in more ecologically valid settings on the effects of interpreted testimony on credibility assessments, using large samples. In addition, credibility has rarely been assessed using psychometrically validated credibility scales, such as the 18-item Observed Witness Efficacy Scale (Cramer, DeCoster, Neal, & Brodsky, 2013), which includes verbal and nonverbal indicators, or the Witness Credibility Scale (Brodsky et al., 2010; Cronbach’s α = .95), a semantic differential scale that consists of 20 paired adjectives rated on a 10-point Likert-type scale (e.g., 1 = ill-mannered to 10 = well-mannered; 1 = dishonest to 10 = honest). The resulting four factor scores (Likeability, Confidence, Trustworthiness, and Knowledgeability) have been applied to interpreters (Hale et al., 2018).

The Influence of Interpreting Mode on the Perceived Credibility of Non-English-Speaking Suspects

Among factors found to have an impact on the perceived credibility of non-English speakers is the interpreting mode, that is, simultaneous versus consecutive interpreting. A field experiment assessed whether interpreting mode influenced perceptions of the credibility of the accused in a criminal trial. The Spanish-speaking accused testified either in English (monolingual trial) or in Spanish via an interpreter who interpreted simultaneously using interpreting equipment (simultaneous mode) or consecutively from a position adjacent to the accused (consecutive mode) (Hale et al., 2017). Analyses showed that the perceptions of the credibility of the accused in the simultaneous interpreting mode matched those in the monolingual trial, while the accused’s apparent credibility was elevated and enhanced by consecutively interpreted testimony. In other words, mock jurors perceived the accused’s evidence in the consecutive mode as more consistent, reliable, and credible than the same evidence provided in the monolingual trial and in the simultaneous interpreting mode. The interpretation was scripted to ensure the content was identical in both modes, with the same interpreter in both modes. The only differences were the mode and the position of the interpreter. At the same time, mock jurors reported they were more distracted in the consecutively interpreted trial than in the other two trials (Hale et al., 2017).

Whether credibility assessments of interviewees in consecutively interpreted police interviews will be similarly enhanced has yet to be tested in studies in which the ground truth of the witness statements is known. Future research should manipulate the ground truth of the witness testimony so that the impact of interpreting mode on credibility can be discerned.

Veracity Assessments in Interpreted Police Interviews

As was noted above, most laboratory experiments on interpreted interviews have tested deception detection theories. Much of the focus in this line of research has been on verbal communication because meta-analyses revealed that nonverbal and paraverbal cues to deception were less diagnostic (DePaulo et al., 2003; Sporer & Schwandt, 2006, 2007). The most common verbal cue to deception is a higher proportion of details reported by truth-tellers than by liars. As Evans et al. (2020) observed, the results of interpreted interviews have at times shown that the presence of verbal cues was facilitated in interpreted interviews (e.g., Leins et al., 2017), and at other times that verbal cues were inhibited (e.g., Ewens, Vrij, Leal, et al., 2016a; Vrij, Leal, Mann, et al., 2018). In monolingual studies, cross-cultural differences emerged in the extent to which details were provided (Anakwah, Horselenberg, Hope, Amankwah-Poku, & van Koppen, 2020; Leal et al., 2018; Taylor, Larner, Conchie, & Menacere, 2017). Further research is needed to examine the issues of cross-cultural factors on verbal reports in interpreted police interviews and also to examine nonverbal and paraverbal communication features. For example, nonverbal “freezing” (inhibition of body movements) did not emerge as a reliable cue to deception in cross-cultural research (van der Zee et al., 2019).

Synopsis on Contemporary Research

To date, contemporary research on interpreted police interviews has focused on several attributes of the interpreting process. Results highlighted the importance of objective measures of simultaneous and consecutive interpreting modes. These revealed several potential advantages in police interviews of the simultaneous interpreting mode facilitated by visual communication, such as more efficiency, accuracy, and faithful replication of rapport-building strategies. Attention to interpreter placement is important in an interview to convey impartiality and facilitate visual access to both speakers. Visual access emerged as a key determinant in remote interpreted police interviews, accounting for the greater effectiveness of videolink over telephone interpreting, although more research is needed on best practices in videolink interview configurations. Briefing interpreters about specific rapport-building interviewing strategies assisted them in replicating these features, especially interpreters with less experience. To date, little research has been conducted to assess the impact of interpreted communications in a police interview on the credibility of non-English-speaking suspects and witnesses. This is an important topic of research, as it is the point where veracity assessment, cross-cultural differences, and features of the interpreting task intersect.

Conclusions

The study of interpreting has been described as interdisciplinary and multifaceted (Pöchhacker, 2015). Legal interpreting is a field in which practitioners and interpreting scholars from a variety of disciplines collaborate, such as Law, Linguistics, Pragmatics, and Cognitive Psychology (Monteoliva-Garcia, 2018). While applied forensic linguists and anthropologists have been conducting research on these topics for decades, legal and forensic psychologists are relative newcomers to this endeavor.

As in the case of monolingual police interviews (Kebbell & Davies, 2006; Madon et al., 2019), this review identified the need for more theory and testing of models of interaction in interpreted interviews. A contribution of this chapter was its emphasis on multimodal communication theory to synthesize the disparate research outcomes.

The research groups conducting contemporary studies of interpreted police interviews often belong to different disciplines, each of which has different research conventions and preferred methodologies. Most of their research reports are published in journals within their own disciplines, making them less accessible to researchers from other disciplines who are working on the same issues or problems. Entrenched research silos and methodologies pose challenges in comparing research outcomes of prior studies on interpreted investigative interviews when the same research questions are addressed.

Prominent examples where more consensus is needed are assessments of modes of interpreting and of interpreting accuracy, both of which are critical determinants of effectiveness in an interpreted police interview. Adoption of more standard measures of accuracy of interpreting will be helpful in determining the source of observed outcome disparities and in resolving issues about the best practice for interpreting mode in legal settings. These standards should include multiple convergent measures of accuracy that take into account errors, additions, and omissions of core verbal propositional content, as well as paraverbal and nonverbal communication components.

Overall, the research conducted to date has focused mostly on the perspectives of interviewers and has not canvassed perspectives of all stakeholders regarding interpreted interviews. In particular, perspectives of suspects, victims, witnesses, and other sources are unrepresented or under-represented in the literature. Studies of interpreted police interviews with persons other than suspects are recommended (Evans et al., 2020).

A singular factor contributing to flaws in the extant research is the lack of cross-disciplinary collaboration and collaboration between researchers and interpreting practitioners. Collaborative field studies and field experiments by police interviewers, interpreting researchers and practitioners, and psychologists have yielded more robust outcomes than laboratory experiments by psychologists. Scholars working at the interface between law, linguistics, and legal and forensic psychology have noted that the weaknesses of research conducted in one discipline alone can be cured by interdisciplinary collaboration (Conley et al., 2019; Kebell & Davies, 2006). To this end, we advocate more extensive transdisciplinary research collaboration between researchers with expertise in Interpreting, Law, Linguistics, Policing, and Legal and Forensic Psychology. A strength of this transdisciplinary model is the complementary skills applied to resolution of a common problem. Ideally, members of a transdisciplinary team come together from the beginning to jointly communicate, exchange ideas, and work together to generate solutions—that is, efforts in determining best ideas or approaches are collective.

Although few perspectives from consumers or end-users of interpreted interviews have been obtained to date, several implications flow from the foregoing research review for other contexts. Next, we review implications for three groups of potential end-users of interpreted police interviews: (a) legal practitioners who from time to time represent persons with limited or no English-speaking abilities; (b) legal professionals who work in settings where interpreted proceedings are routine, such as asylum and migrancy proceedings; and (c) judges and juries who might review videotaped interpreted police interviews, or read transcripts of interpreted interviews, in order to make credibility determinations that bear on verdicts in cases where witnesses or suspects require an interpreter.

Implications for courts of findings on the most effective mode of legal interpreting will be extensive. For example, if courts were to adopt proceedings in the simultaneous mode, all legal settings would require appropriate technological equipment. Institutions providing interpreting training for legal interpreters, as well as certification and accreditation bodies, would need to adapt their practices to ensure that legal interpreters were trained and proficient in this mode in place of the consecutive interpreting mode.

One by-product of research on interpreting accuracy is that methods of assessment applied in some studies might prove useful in legal disputes over the accuracy and integrity of legal interpreting in litigated cases. Future researchers might wish to explore innovative methods to cross-check interpreting accuracy in legal settings, such as police interviews, tribunals, asylum and migrancy proceedings, and courts.

Few studies to date have tested the effectiveness of training interventions for police practitioners about interpreting components, or of training interventions for interpreters about interviewing components and strategies and the minimization of unconscious cognitive biases. The development and testing of interpreter training to focus on contextual rather than cultural differences might be helpful.

Interviewer training programs should routinely include information about the nature of interpreting (ImPLI Project, 2012), different modes of interpreting, and ways to manage interpreter-assisted interviews. Additionally, practitioners should be advised of any organization-specific requirements of interpreters, which might supplement the professional ethical code, and about which they would need to brief interpreters (Goodman-Delahunty & Howes, 2017).

In sum, interpreting in police interviews is a critical legal topic, as errors and failures at this juncture in the criminal legal process can be far-reaching and can result in serious injustices, even wrongful convictions or acquittals. To date, interpreting practices have been implemented in the absence of a sound evidence base to justify their use. These include fundamental practical factors, such as the mode of interpreting applied in investigative interviews, the placement of the interpreter in legal settings, and the implementation of remote interpreting due to the increasing reliance on videolink technology in legal proceedings. This approach is at odds with new mandates for evidence-based policing. Policies and practices implemented in interpreted police interviews should be informed by convergent findings derived from research conducted by diverse transdisciplinary teams using a variety of qualitative and quantitative research methods.