
1 Introduction

Since the 1960s, researchers have had a methodological tool at their disposal unlike any other: the audit study.Footnote 1 The audit study is a specific type of field experiment that permits researchers to examine difficult-to-detect behavior, such as racial and gender discrimination, and decision-making in real-world scenarios. Audit studies allow researchers to make strong causal claims and to explore questions that are often difficult or impossible to answer with observational data. This type of field experiment has exploded in popularity in recent years, particularly for examining different types of discrimination, owing to the rise of online applications for housing and employment and to easy access to decision makers across many contexts via email.

However, although audit studies appear to be a simple and quick method for examining discrimination, the learning curve for designing and implementing these experiments can be quite steep. Thus, we have written this book to help scholars design, conduct, and analyze their own audits. This book draws upon the knowledge of a variety of social scientists and other experts who have collectively implemented dozens of in-person and correspondence audits to examine a variety of research questions. These experienced scholars share insights from both their successes and their failures and invite you, the reader, “behind the scenes” to examine how you might construct your own audit study and improve upon this method in the future. We write this book with a wide audience in mind and hope that you will find it useful whether you have already fielded your own audit study, are just beginning to think about how you might design one, or simply want to learn more about the method to better understand research that uses audits.

In this introductory chapter, I approach the subject as one might with a lay audience. However, even experienced researchers with in-depth knowledge of the audit method should find this chapter useful. I mostly focus on the aspects of audit studies related to research rather than those related to activism or law and policy.Footnote 2 I begin this chapter with the basics – a discussion of the language and definitions related to audit studies. Significant differences in language persist between studies, researchers, and disciplines, and I hope that this part will help readers understand these differences as well as encourage researchers to adopt a common language. Next, I give a succinct overview of why researchers began using audits to examine discrimination. The audit method is a powerful tool for answering certain types of questions, and I attempt to outline when researchers can and should use this method. I then give an overview of the history of audit studies. Although others have written superb reviews of this body of literature in the past (Baert, Chap. 3 of this volume; Oh and Yinger 2015; Riach and Rich 2002; Zschirnt and Ruedin 2016), I focus on the forest rather than the trees in this part and provide a narrative of the arc of audit studies over time.Footnote 3 Finally, I close this chapter with a succinct discussion of the limitations of correspondence audits and thoughts on how we might improve this method, which complements the closing chapter of this book (Pedulla, Chap. 9 of this volume).

Readers looking for additional information on audit studies should consult two resources. First, we have created a website – www.auditstudies.com – to go along with the release of this volume. There you will find a comprehensive database of audits, information about subscribing to an audit method listserv, as well as additional information. Second, at the end of this chapter I provide a brief recommended reading list of important comprehensive works, reviews, and other methods-based articles and books.

Beyond this introductory chapter, several accomplished scholars present their expert knowledge about audit studies. In the first part – The Theory Behind and History of Audit Studies – the authors cover a wide range of history, explain why we should conduct audit studies, examine the connections between audit studies and activism, and outline what researchers have uncovered about labor market processes using audit studies in the past decade. In the second part – The Method of Audit Studies: Design, Implementation, and Analysis – the experts provide guidance on designing your own audit study, discuss the challenges and best practices regarding email, review issues of validity in depth, and consider the technical setup of matching procedures. In the final part – Nuance in Audit Studies: Context, Mechanisms, and the Future – the authors focus on more nuanced aspects of audit studies and address limitations and challenges, examine the use of context to explore mechanisms, and consider the value of variation. I return to a brief discussion of the rest of this book at the end of this chapter.

2 The Basics of Audit Studies: Language and Definitions

Field experiments encompass a wide range of studies and ideas and sit at the top of the hierarchy I focus on here. Audit studies are one type of field experiment. At their core, field experiments in the social sciences attempt to mimic the experiments of the natural sciences by implementing a randomized research design in a field setting (as opposed to a lab or survey setting). Although many may think of psychology as the disciplinary home of social science experiments, researchers in economics, political science, and sociology have ramped up the quantity and quality of field experiments conducted in their disciplines over the past few decades. Audit studies are not the only reason for this increase, but they represent a major part of the heightened activity.

Audit studies generally refer to a specific type of field experiment in which a researcher randomizes one or more characteristics about individuals (real or hypothetical) and sends these individuals out into the field to test the effect of those characteristics on some outcome. Historically, audit studies have focused on race and ethnicity (Daniel 1968; Bertrand and Mullainathan 2004; Wienk et al. 1979) and gender (Ayres and Siegelman 1995; Levinson 1975; Neumark et al. 1996). In recent years, researchers have expanded the manipulated characteristics to include age (Ahmed et al. 2012; Bendick et al. 1997; Farber et al. 2017; Lahey 2008; Neumark et al. 2016; Riach 2015; Riach and Rich 2010), criminal record (Baert and Verhofstadt 2015; Evans 2016; Evans and Porter 2015; Furst and Evans 2016; Pager 2003), disability (Ameri et al. forthcoming; Baert 2014a; Ravaud et al. 1992; Turner et al. 2005; Verhaeghe et al. 2016), educational credentials (Carbonaro and Schwarz, Chap. 7 of this volume; Darolia et al. 2015; Deming et al. 2016; Deterding and Pedulla 2016; Gaddis 2015, 2017e; Jackson 2009), immigrant assimilation or generational status (Gell-Redman et al. 2017; Ghoshal and Gaddis 2015; Hanson and Santas 2014), mental health (Baert et al. 2016a), military service (Baert and Balcaen 2013; Figinski 2017; Kleykamp 2009), parental status (Bygren et al. 2017; Correll et al. 2007; Petit 2007), physical appearance (Bóo et al. 2013; Galarza and Yamada 2014; Maurer-Fazio and Lei 2015; Patacchini et al. 2015; Ruffle and Shtudiner 2015; Stone and Wright 2013), religious affiliation (Adida et al. 2010; Pierné 2013; Wallace et al. 2014; Wright et al. 2013), sexual orientation (Ahmed et al. 2013; Baert 2014b; Bailey et al. 2013; Drydakis 2009, 2011a, 2014; Mishel 2016; Tilcsik 2011; Weichselbaumer 2015), social class (Heylen and Van den Broeck 2016; Rivera and Tilcsik 2016), and spells of unemployment and part-time employment (Birkelund et al. 2017; Eriksson and Rooth 2014; Kroft et al. 2013; Pedulla 2016), among other characteristics (Baert and Omey 2015; Drydakis 2010; Kugelmass 2016; Tunstall et al. 2014; Weichselbaumer 2016).

The “individuals” sent into the field may be actual people in an in-person audit or simply applications or emails from hypothetical people in correspondence audits (more below). The outcomes may be an offer to interview for a job (Bertrand and Mullainathan 2004; Darolia et al. 2015; Deming et al. 2016; Gaddis 2015), a job offer (Bendick et al. 1994, 2010; Pager et al. 2009a, b; Turner et al. 1991a), the order in which applicants are contacted (Duguet et al. 2015), a response to a housing inquiry (Ahmed and Hammarstedt 2008; Bengtsson et al. 2012; Carlsson and Eriksson 2014; Carpusor and Loges 2006; Ewens et al. 2014; Feldman and Weseley 2013; Hogan and Berry 2011; Van der Bracht et al. 2015), the types of housing shown (Galster 1990a; Turner et al. 2002, 2013), information about the availability of a house for purchase or rent (Galster 1990b; Turner et al. 2002, 2013; Yinger 1986), an offer of different housing than requested, or racial steering (Galster and Godfrey 2005; Turner et al. 1990), a response to a mortgage application or request for information (Hanson et al. 2016; Smith and Cloud 1996; Smith and DeLair 1999), a response to a roommate request (Gaddis and Ghoshal 2015, 2017; Ghoshal and Gaddis 2015), an offer to schedule a doctor’s appointment (Kugelmass 2016; Sharma et al. 2015), a response from a politician or other public official (Broockman 2013; Butler and Broockman 2011; Chen et al. 2016; Distelhorst and Hou 2014; Einstein and Glick 2017; Hemker and Rink forthcoming; Janusz and Lajevardi 2016; McClendon 2016; Mendez and Grose 2014; White et al. 2015), a response from a professor (Milkman et al. 2012, 2015; Zhao and Biernat 2017), the price paid or bargained for during economic transactions for goods (Anagol et al. 2017; Ayres 1991; Ayres and Siegelman 1995; Besbris et al. 2015; Doleac and Stein 2013), or a number of other outcomes (Allred et al. 2017; Edelman et al. 2017; Giulietti et al. 2015; Ridley et al. 1989; Wallace et al. 2012; Wissoker et al. 1998; Wright et al. 2015).

Two main variations of audits exist: in-person audits and correspondence audits. In-person audits rely on trained assistants to conduct the experiment. Early audit studies almost exclusively referred to the people posing as legitimate applicants for employment or housing as testers or auditors. This is due, in part, to the fact that the language for such research was adopted from early testing for legal violations, which served enforcement rather than research purposes (see Boggs et al. 1993 and Fix and Turner 1999 for an in-depth discussion of differences between paired testing for enforcement purposes versus research). However, as correspondence audits overtook in-person audits as the norm and real individuals posing as applicants were no longer required, researchers shifted their language to refer to applicants, candidates, constituents, prospective tenants, etc. In other words, the language should match what the audit context dictates. Although the language identifying testers, auditors, or applicants may vary with the nature of the study, we recommend that researchers adopt a common language of “in-person audits” to identify field cases using live human beings and “correspondence audits” to identify online, telephone, or mail audits using hypothetical individuals (or, in the case of some telephone audits, recorded messages).

Although most audit studies include paired (or sometimes triplet) testing with comparisons of two (or three) testers or applicants, not all do (for example, see Hipes et al. 2016; Lauster and Easterbrook 2011; Rivera and Tilcsik 2016). Paired testing, also referred to as matched testing, is a design in which the subject or organization being audited (e.g., employer, real estate agent, etc.) receives applications or emails from two or more testers with different characteristics. Conversely, non-paired testing is a design in which the subject or organization being audited only ever receives a single tester, application, or email. For example, a paired test design might send both a black couple and a white couple to each real estate agent’s office in the sample, whereas a non-paired test design would send only one of the two couples (chosen at random) to each office. Paired testing can offer statistical advantages; however, in some cases it may be necessary to implement a non-paired test design to reduce suspicion and avoid experiment discovery (Vuolo et al. 2016, Chap. 6 of this volume; Weichselbaumer 2015, 2016).
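To make the difference between the two designs concrete, the Python sketch below assigns hypothetical tester profiles to a list of audited units (for example, real estate agents’ offices) under each design. The function names, the two condition labels, and the balancing rule are illustrative assumptions rather than a procedure taken from any particular study; real audits add matching, counterbalancing of tester characteristics, and timing rules that this sketch omits.

```python
import random


def paired_design(units, conditions=("black", "white"), seed=None):
    """Paired (matched) testing: every audited unit receives one tester per condition."""
    rng = random.Random(seed)
    plan = {}
    for unit in units:
        order = list(conditions)
        rng.shuffle(order)  # randomize which profile each unit encounters first
        plan[unit] = order
    return plan


def non_paired_design(units, conditions=("black", "white"), seed=None):
    """Non-paired testing: every audited unit receives exactly one tester."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the conditions out in turn over the shuffled units so that group
    # sizes differ by at most one (complete randomization with balanced groups).
    return {unit: [conditions[i % len(conditions)]] for i, unit in enumerate(shuffled)}


# Hypothetical sample of six real estate agents' offices.
offices = [f"office_{i}" for i in range(6)]
print(paired_design(offices, seed=1))
print(non_paired_design(offices, seed=1))
```

In the paired plan, shuffling the order counterbalances which profile each unit sees first, and comparisons can be made within each unit; in the non-paired plan, each unit sees only one profile, so comparisons are made between units, which is the source of the reduced risk of detection noted above.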

3 The Need for Audit Studies

In this section, I discuss audits from the perspective of racial discrimination. However, the need for and use of audits is similar across other types of discrimination as well as some non-discrimination-based domains of inquiry.

Not coincidentally, the rise of audit studies among researchers corresponds with civil rights era public policy aimed at stopping racial discrimination and reducing, if not eliminating, racial inequality. Prior to the 1960s, racial discrimination in the United States occurred openly in public, was relatively common, had minimal stigma attached to it, was shaped by open prejudicial attitudes and beliefs, and arguably was informed by a conscious or active racial prejudice. Individual employers, real estate agents, and landlords could discriminate with impunity and often made public their beliefs and actions. In the United States, the Civil Rights Act of 1964 intended to change these behaviors, if not beliefs and attitudes, by outlawing discrimination on the basis of race, color, religion, sex, or national origin. The Equal Employment Opportunity Commission (EEOC) gained the ability to litigate discrimination cases following the passage of the Equal Employment Opportunity Act in 1972. Title VII of the Civil Rights Act of 1964 could finally be enforced.

However, we can imagine, and indeed do live in, a world where the Civil Rights Act may have changed the act of discrimination without changing the amount of discrimination, the intentions behind it, or an individual’s desire to discriminate. Although the change did not happen overnight, discrimination of all types has changed in response to the Civil Rights Act. Modern discrimination has become more covert, less common, and more stigmatized, while being shaped by private prejudicial attitudes and beliefs and, perhaps, informed by an unconscious or latent racial prejudice. Individuals may fear litigation for engaging in discrimination or, owing to social desirability bias, may not acknowledge discriminatory actions. This makes it difficult for researchers to document and examine discrimination.

Thus, two traditional methods of social science inquiry are difficult, if not impossible, to employ to examine discrimination in the post-civil rights era. First, pointed interviews and survey questions asking perpetrators about racial discrimination are unlikely to elicit truthful responses. To my knowledge, the most recent research project to successfully elicit clearly truthful responses from employers about engaging in racial discrimination occurred in the late 1980s (Kirschenman and Neckerman 1991). Moreover, surveys and interviews do not document actions, but rather self-reported beliefs, attitudes, recollections of past actions, or predictions of future actions. Due to respondents’ fear and social desirability bias, and the sometimes unconscious nature of racial prejudice, direct questions about discrimination through interviews and surveys exhibit low construct validity.

Second, statistical analyses using secondary data that do not have explicit questions about discrimination also fail to adequately capture discrimination. To understand the difficulty of this process, let’s first consider a definition of discrimination. In a 2004 book stemming from the Committee on National Statistics’ Panel on Methods for Assessing Discrimination, panelists defined racial discrimination as “differential treatment on the basis of race that disadvantages a racial group” (Blank et al. 2004: 39). Although researchers can document the second (race) and third (disadvantage) parts of the definition with secondary data, directly capturing the first part (differential treatment) is impossible. Thus, secondary data analysis must use indirect residual attribution: after including a litany of control variables on which blacks and whites differ and that affect the dependent variable of interest, analysts suggest that any remaining coefficient for race represents discrimination (Blank et al. 2004; Lucas 2008; Neumark forthcoming). However, this method is unlikely to correctly attribute the true amount of racial discrimination (Quillian 2006), due to omitted variable bias, among other issues (Altonji and Blank 1999; Blank et al. 2004; Farkas and Vicknair 1996; Lucas 2008).
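A small simulation can illustrate why residual attribution is fragile. The hypothetical Python sketch below generates data in which the true effect of discrimination on an outcome is -1.0, then regresses the outcome on a group indicator and one observed control while omitting an unmeasured factor whose mean differs by group. All parameter values are invented for illustration and are not estimates from any real data; ordinary least squares is solved by hand so the example needs only the standard library.

```python
import random


def ols(rows, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [X'X | X'y].
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=20000, true_effect=-1.0, seed=42):
    """Estimate the 'residual' group coefficient when a relevant factor is omitted."""
    rng = random.Random(seed)
    rows, y = [], []
    for i in range(n):
        group = i % 2                      # 1 = group facing discrimination, 0 = comparison group
        x = rng.gauss(0, 1)                # observed control, independent of group
        u = 0.5 * group + rng.gauss(0, 1)  # unmeasured factor whose mean differs by group
        outcome = 2.0 + 0.8 * x + 1.0 * u + true_effect * group + rng.gauss(0, 1)
        rows.append([1.0, float(group), x])  # the analyst observes group and x, but not u
        y.append(outcome)
    return ols(rows, y)  # [intercept, group coefficient, x coefficient]


intercept, group_coef, x_coef = simulate()
print(f"true effect = -1.00, residual estimate = {group_coef:.2f}")
```

Because the omitted factor differs by 0.5 between groups and raises the outcome one-for-one, the group coefficient absorbs that gap and lands near -0.5 rather than -1.0, understating discrimination by roughly half in this toy setup. Reversing the sign of the gap would instead overstate it, which is why a residual coefficient alone cannot be read as the true amount of discrimination.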

Researchers developed the audit method as a means of catching individuals and organizations in the act of discrimination. Generally, experiments can be done when a presumed cause is manipulable and should be done when it is otherwise difficult to prove non-spuriousness. Many, if not all, types of discrimination are great candidates for examination through experimental means because the presumed cause often is manipulable in many contexts and, as discussed earlier, traditional methods of social science inquiry have been unable to directly document discrimination or rule out a spurious relationship. If we consider the previously stated definition of racial discrimination – “differential treatment on the basis of race that disadvantages a racial group” (Blank et al. 2004: 39) – we see that audit studies manipulate the second part (race) to directly capture the first part (differential treatment) of the definition. Thus, by carefully controlling and counterbalancing all other variables in the experimental process, audit studies provide strong causal evidence of discrimination.

4 A History of Audit Studies

4.1 The Early Years: The First In-Person and Correspondence Audits

In-person audits were first conducted in the 1940s and 1950s by activists and private organizations, with some assistance from academic researchers. One of the earliest media mentions of audits appeared in the New York Times in 1956 (Rowland). In Chap. 2 of this volume, Frances Cherry and Marc Bendick Jr. do an excellent job of covering some of this early work, so I leave discussion of that part of the history of audit studies to them.

The earliest known published audit study of significant scope and scale was conducted in England in the late 1960s. With the Race Relations Act of 1965, Parliament passed the first legislation in the United Kingdom addressing racial discrimination in public domains. The following year, the U.K. Parliament created the Race Relations Board, which was tasked with reviewing complaints falling under the Race Relations Act. However, the Race Relations Act did not cover employment and housing discrimination until 1968, so, in tandem with the National Committee for Commonwealth Immigrants, the Race Relations Board commissioned a study on racial discrimination in employment, housing, and other contexts. Along with surveys and interviews, the study implemented the audit method to extensively examine discrimination (Daniel 1968).

Described as “situation tests,” the audits were born when Daniel and the research team had doubts over whether surveys and interviews would give them an accurate portrayal of the state of discrimination. Moreover, the team was unsure if the “findings would appear conclusive to those people who are strongly passionate or committed about the subject on one side or the other” (1968: 20). That doubt led them “not to depend entirely on what people told us in interviews, but to put the matter to the test in a way that would provide objective evidence” (ibid). These tests were conducted with triplets of candidates – usually white English, white immigrant, and black applicants – in the domains of housing (both rental and purchase), employment, and other services. The tests consistently uncovered discrimination against blacks and immigrants.

At the time, this commissioned study of racial discrimination was monumentally important. Along with the hard work of researcher William Wentworth Daniel, results from this study led to the revised Race Relations Act of 1968, which outlawed racial discrimination in employment and housing (Smith 2015). However, academics have often overlooked or forgotten this study; at the time of this writing, Google Scholar reports that the resulting book by Daniel (1968) has garnered fewer than 500 citations in nearly 50 years. Still, the use of the audit method in Racial Discrimination in England, a government-sponsored study, marks the beginning of a series of high-profile in-person audits conducted to examine racial discrimination.

Just a few years later, in 1969, the first-ever correspondence audit was conducted in the United Kingdom. Published by two researchers from the non-profit institute Social and Community Planning Research, this study sought to examine racial discrimination among employers looking to hire white-collar workers (Jowell and Prescott-Clarke 1970). The authors chose to conduct a correspondence audit through the mail because “postal applications were possible and, in many cases, necessary” to apply for employment (1970: 399). The authors matched British-born whites with four different immigrant groups to test for racial discrimination across an ambitious-for-the-time 128 job postings (256 total applicants) and noted the importance of both realism in the application and controlling for all differences between candidates, including aspects such as handwriting. Again, although this study has collected few citations in nearly 50 years (fewer than 150 at the time of this writing), it remains an incredibly important entry in the annals of the audit method because it introduced the world to correspondence audits.

4.2 The First Wave: The Early 1970s Through the Mid 1980s

In the United States, a number of non-academic audits followed the two UK studies. Private fair housing audits rose to prominence in the late 1960s and 1970s following passage of the Civil Rights Act of 1968 (also known as the Fair Housing Act), which provided federal enforcement of anti-discrimination housing law through an office of the U.S. Department of Housing and Urban Development (HUD). These audits were frequently conducted in partnership with (usually local) academic researchers and typically focused on a single major city, such as Akron, Ohio (Saltman 1975), Chicago (as reported in Cohen and Taylor 2000), Detroit (Pearce 1979), Los Angeles (Johnson et al. 1971), and New York (as reported in Purnell 2013). Additionally, organizations often produced method-based manuals and guides for the practice of auditing (Kovar 1974; Leadership Council for Metropolitan Open Communities 1975; Murphy 1972).

However, the largest, and arguably most important, audit of housing discrimination during this era, the Housing Market Practices Survey (HMPS), occurred in 1977 (Wienk et al. 1979). This first large-scale housing audit was commissioned by HUD to test for discrimination against blacks in both the sale and rental housing markets. HUD partnered with local fair housing organizations and other organizations to recruit and train testers to conduct the in-person audits. This research included 3264 audits across 40 metro areas, with a plurality of the audits occurring in five metro areas. The HMPS found discrimination against blacks in reported housing availability, treatment by real estate agents, reported terms and conditions, and the types and levels of information requested by real estate agents. This research was critically important in leading the way for future audits, including three additional national housing audits commissioned by HUD (Turner and James 2015; Turner et al. 2002, 2013; Turner et al. 1991b; Yinger 1991, 1993), several smaller local audits (see below), and the Urban Institute employment audits a decade later (Cross et al. 1990; Mincy 1993; Turner et al. 1991a). Arguably, four aspects of the HMPS were important in shaping future audits. First, the HMPS showed that large-scale audits for discrimination in the United States were possible. Second, this research essentially gave auditing a gold seal of approval from an arm of the federal government (for more details on audits and the courts, see Boggs et al. 1993; Fix et al. 1993; Pager 2007a). Third, it was the first research to show how widespread racial discrimination was across many cities. Finally, the HMPS showed creativity in expanding the outcomes examined by audits.

Other one-off in-person and correspondence audits conducted during the 1970s and early 1980s examined housing and employment discrimination in the United Kingdom (McIntosh and Smith 1974), housing discrimination in France (Bovenkerk et al. 1979) and the United States (Feins and Bratt 1983; Galster and Constantine 1991 Footnote 4; Hansen and James 1987; James et al. 1984; Newburger 1984; Roychoudhury and Goodman 1992, 1996 Footnote 5), and employment discrimination in the United States (Hitt et al. 1982; Jolson 1974; Levinson 1975; McIntyre et al. 1980; Newman 1978), Canada (Adam 1981; Henry and Ginzberg 1985), Australia (Riach and Rich 1987, 1991), and England (Brown and Gay 1985; Firth 1981; Hubbuck and Carter 1980). Additionally, George Galster (1990a, 1990b) reviewed several mostly unpublished fair housing audits conducted in the 1980s and analyzed data from 71 separate audits.

During this period, researchers also began to expand the domains in which they investigated discrimination. As early as 1985, Galster and Constantine (1991) investigated housing discrimination based on parental and relationship status among women. Ayres (1991) and Ayres and Siegelman (1995) examined racial and gender discrimination in bargaining over new car prices, while Ridley et al. (1989) examined racial discrimination in hailing a taxi. Other research from this period examined discrimination based on disability (Fry 1986; Graham et al. 1990; Ravaud et al. 1992). The first wave of audits conducted in the 1970s and 1980s filled a number of gaps in our knowledge about the extent and geography of discrimination, the conditions under which discrimination occurred, and variations in the outcomes affected by discrimination, particularly in housing and, to some degree, employment.

4.3 The Second Wave: The Late 1980s Through the Late 1990s

Beginning with the last part of the 1980s and continuing throughout the 1990s, a second wave of audits was ushered in with the second iteration of the HUD housing audit (Turner, Mickelsons, and Edwards 1991; Yinger 1991, 1995) and a series of large-scale employment audits conducted by the Urban Institute (Cross et al. 1990; Mincy 1993; Turner et al. 1991a), aided in part by guidelines for adapting housing audits to hiring situations (Bendick 1989). The HUD housing audit in 1989, known as the Housing Discrimination Study (HDS) 1989, was conducted in partnership with the Urban Institute. The HDS 1989 differed from and improved on the 1977 HMPS in several ways. First, the HDS 1989 paired Hispanic testers with whites in some audits to examine discrimination against Hispanics as well (Ondrich et al. 1998; Page 1995), something the HMPS had done only in an extension study in Dallas (Hakken 1979). Second, in the HDS 1989 auditors focused on specific advertised housing units, whereas in the HMPS auditors approached agents about more general housing options fitting certain criteria. Thus, the HDS 1989 could more accurately examine racial steering. Third, the HDS 1989 examined fewer metro areas (25 instead of 40) but conducted more audits (3800 instead of 3264). Overall, the HDS 1989 replicated the general finding of the HMPS that housing discrimination against blacks was prevalent and widespread. However, there was no strong evidence that discrimination increased or decreased between the two data collection periods (Elmi and Mickelsons 1991).

The first of the Urban Institute employment audits was conducted in Chicago and San Diego in 1989 and examined discrimination against Hispanics (Cross et al. 1990). Researchers sampled newspaper advertisements, and matched pairs successfully applied to almost 300 entry-level jobs in the two cities. The study found that Hispanics faced discrimination at both the application and interview phases, which led to fewer interviews and fewer job offers compared with their white counterparts. In 1990, the Urban Institute conducted a similar employment audit in Chicago and Washington, D.C. to examine discrimination against African Americans (Turner et al. 1991a). Matched pairs successfully completed nearly 450 audits in the two cities. The study found that employers discriminated against blacks in accepting their applications, inviting them to interview, and offering them a job. Black applicants were also more likely to be steered toward lower quality jobs rather than the advertised position to which they responded. Additionally, whites were treated more favorably in a number of respects, including waiting time, length of interview, and positive comments.

The Urban Institute studies were the first large-scale true employment audits conducted in the U.S. Researchers and staff went to great lengths to make the studies as methodologically sound as possible and paid close attention to detail in sampling, creating matched pairs, and standardizing procedures for the audits (Mincy 1993). Although these studies provided a meticulous model for subsequent researchers to follow when conducting employment audits, others have extensively critiqued the Urban Institute studies and the in-person audit method more broadly (Heckman 1998; Heckman and Siegelman 1993). Nevertheless, by moving development and knowledge of the method forward and by providing extensive guidance (along with Bendick 1989) for the numerous employment audits that followed, the Urban Institute audits were clearly of great importance.

Following the HDS 1989 and the Urban Institute employment audits, a wave of audits examining employment, housing, and other forms of discrimination occurred. Many audits were conducted in Europe through the International Labour Office (ILO) based on guidelines developed by Frank Bovenkerk (1992). Studies in the U.S. (Bendick et al. 1991, 1994; James and DelCastillo 1992; Nunes and Seligman 1999) and Europe (Arrijn et al. 1998; Bovenkerk et al. 1995; de Prada et al. 1996; Esmail and Everington 1993, 1997; Goldberg et al. 1995; Smeesters and Nayer 1998) focused on race and ethnic discrimination. Researchers conducted sex discrimination employment audits in the U.S. (Neumark et al. 1996; Nunes and Seligman 2000) and Europe (Weichselbaumer 2000), as well as age and disability-based discrimination employment audits in the U.S. (Bendick et al. 1999) and Europe (Graham et al. 1990; Gras et al. 1996). This period also included the continuation of telephone-based (Bendick et al. 1999; Massey and Lundy 2001; Purnell et al. 1999) and written correspondence audits (Bendick et al. 1997; Gras et al. 1996; Weichselbaumer 2000). Still, the cost-prohibitive nature of in-person audits and labor-intensive nature of correspondence audits during the 1990s meant that use of the audit method was relatively rare.

4.4 The Third Wave: The Early 2000s Through the Late 2000s

Until the early 2000s, most audits were conducted in person and relied on trained assistants to physically participate in the process. With housing and employment applications increasingly taking place over the internet, researchers began conducting more correspondence audits. However, some important audits in the early 2000s were still conducted in person, including the second iteration of HUD and the Urban Institute’s Housing Discrimination Study (HDS 2000: Bavan 2007; Ross and Turner 2005; Turner et al. 2002). Devah Pager was the first to examine the effects of a criminal record using an audit study (2003) and produced an incredibly strong body of work during this period consisting of in-person audits as well as examinations of the method (Pager 2007a, b; Pager et al. 2009a, b; Pager and Quillian 2005; Pager and Shepherd 2008).

The 2000s brought about significant changes in the audit method, and the importance of this era is highlighted by the fact that the two most cited audit studies of all time were both conducted in the early 2000s. Devah Pager’s (2003) in-person audit study of race and criminal record in the low-wage labor market in Milwaukee has garnered over 2000 citations according to Google Scholar. Marianne Bertrand and Sendhil Mullainathan’s (2004) correspondence audit study of race in labor markets in Boston and Chicago has over 3100 citations at the time of this writing. Both studies have been incredibly important in shaping our understanding of racial discrimination; however, the differences between them are stark and mark a major turning point in the history of audit studies.

Bertrand and Mullainathan’s 2004 study, published in The American Economic Review, is the most influential correspondence audit study of the past two decades. In total, the authors responded via fax and mail to over 1300 job advertisements listed in newspapers in Boston and Chicago, compared to the 350 jobs in Pager’s (2003) study. Additionally, the authors used birth record data and a small convenience-sample pretest to select names to convey race on each resume. Rather than sending two applicants per job, the authors often sent four resumes in order to examine race and resume quality simultaneously, and they obtained a final sample size of 4870. Bertrand and Mullainathan found that white applicants were about 50% more likely than black applicants to receive a callback. Moreover, black applicants benefited less than white applicants from higher resume quality.

Bertrand and Mullainathan’s (2004) landmark study ushered in a new era of correspondence audits. Arguably, this study paved the way for the increase in audits that followed for at least three reasons. First, the research showed that a large-scale audit – in particular, a correspondence audit – could be undertaken by a small team of academic researchers, compared to past audits conducted by larger teams such as those at HUD and the Urban Institute. Although Bertrand and Mullainathan applied via fax and mail, the timing was ripe for the switch to applications over the internet, which further expanded the possibilities of correspondence audits. Second, the study opened a dialogue about signaling race through correspondence audits. Because the authors conducted a small pretest and used a moderate number of names – 36 in total – the plurality of studies that followed used the same names to signal race (see Gaddis 2017d).Footnote 6 Although over a decade would pass before scholars began to seriously question these signals (Butler and Homola 2017; Gaddis 2017a, b, c, d; Weichselbaumer 2017), Bertrand and Mullainathan were the first to truly investigate them. Finally, this study showed that it was possible to successfully manipulate several characteristics simultaneously. Beyond race and gender, the authors varied other resume characteristics such as education, experience, and skills. These manipulations likely sparked ideas among researchers about mechanisms and interactions that would follow in future studies.

The vast majority of the studies that followed Bertrand and Mullainathan during the 2000s were conducted via the correspondence method. A few notable exceptions are the previously mentioned studies by Devah Pager (2003; Pager et al. 2009a) and three studies carried out by the International Labour Office (ILO) in Italy (Allasino et al. 2004), Sweden (Attström 2007), and France (Cediey and Foroni 2008), although the ILO studies used a mix of in-person and correspondence methods. Additionally, two in-person studies examined discrimination in market transactions: baseball card sales (List 2004) and auto repair quotes (Gneezy and List 2004).

During this time, correspondence audits examining employment discrimination based on race and ethnicity expanded to cover more countries and racial/ethnic groups, such as Albanians in Greece (Drydakis and Vlassis 2010), Turks in Germany (Kaas and Manger 2012), and a variety of other groups in Australia (Booth et al. 2012), Canada (Oreopoulos 2011), Denmark (Hjarnø and Jensen 2008), France (Duguet et al. 2010), Great Britain (Wood et al. 2009), Ireland (McGinnity and Lunn 2011), Sweden (Bursell 2007; Carlsson 2010; Carlsson and Rooth 2007; Rooth 2010), and the U.S. (Jacquemet and Yannelis 2012; Thanasombat and Trasviña 2005; Widner and Chicoine 2011). Additionally, researchers examined employment discrimination on the basis of gender and family status in France (Petit 2007) and the U.S. (Correll et al. 2007); gender in England (Riach and Rich 2006a), Spain (Albert et al. 2011), and Sweden (Arai et al. 2016);Footnote 7 age in England (Riach and Rich 2010), France (Riach and Rich 2006b), Spain (Albert et al. 2011; Riach and Rich 2007), and the U.S. (Lahey 2008); sexual orientation in Austria (Weichselbaumer 2003), Greece (Drydakis 2009, 2011a), and the U.S. (Tilcsik 2011); race and criminal record in the U.S. (Galgano 2009); race and military status in the U.S. (Kleykamp 2009); educational credentials in the United Kingdom (Jackson 2009); caste in India (Siddique 2011); caste and religion in India (Banerjee et al. 2009); and physical attractiveness and obesity in Sweden (Rooth 2009). One additional study of note during this period is Philip Oreopoulos’ (2011) correspondence audit in Toronto, which included six different racial/ethnic/immigrant groups. He applied to over 3200 job postings using 13,000 different resumes, creating one of the most ambitious correspondence audits of its time.

The expansion of audit research during the 2000s included housing discrimination studies as well. The HDS 2000 expanded to include Asians and Pacific Islanders as well as Native Americans (Turner and Ross 2003a, b) and examined housing discrimination on the basis of disability (Turner et al. 2005). Correspondence audits examined housing discrimination based on race and ethnicity in Canada (Hogan and Berry 2011), Greece (Drydakis 2011b), Italy (Baldini and Federici 2011), Spain (Bosch et al. 2010), Sweden (Ahmed et al. 2010; Ahmed and Hammarstedt 2008), and the United States (Carpusor and Loges 2006; Friedman et al. 2010; Hanson and Hawley 2011; Hanson et al. 2011). Additional research examined housing discrimination on the basis of sexual orientation (Ahmed and Hammarstedt 2008, 2009).

Beyond the major expansion of correspondence audits, this period marks the beginning of researchers’ exploration of the mechanisms of discrimination, the intentions behind discrimination, and the conditions under which discrimination occurs, rather than simply documenting its existence. At least four studies during this period attempted to uncover greater detail related to these issues. First, two studies followed up with employers who had previously been subjected to an audit in order to examine bias in more detail. In one study, Devah Pager and Lincoln Quillian (2005) conducted a telephone survey to follow up with employers who had unknowingly participated months earlier in an in-person audit study. When given a vignette scenario that mimicked the audit scenario they had been subjected to, employers reported that they would be much more likely to hire the applicants described than their actual callback rates indicated. In fact, the results of the vignette survey showed no differences between white and black applicants, suggesting the existence of social desirability bias. In another study, Dan-Olof Rooth (2010) administered the Implicit Association Test (IAT) to test whether discriminatory behavior in a prior correspondence audit was associated with IAT scores. He found a strong positive correlation between discrimination against Arab-MuslimsFootnote 8 and IAT scores but no correlation with a separate explicit measure of bias. These results could suggest either that individuals engage in discrimination solely because of implicit bias (without holding a true explicit bias) or that social desirability bias affects the explicit measure.

The second set of studies attempted to distinguish between statistical discrimination and taste-based discrimination. In one study, Joanna Lahey (2008) designed a computerized method of creating resumes that could vary many values of many variables, rather than relying on the often-binary choice sets of resumes used before her study (see also Lahey and Beasley 2009). Using this revision of the correspondence audit, she could test whether employers were less likely to call back older workers because of judgments and assumptions about human capital (statistical discrimination) or because of a general preference for younger workers (taste-based discrimination). She found some evidence for statistical but not taste-based age discrimination. Importantly, her computerized method of creating resumes has also been used to develop several large-scale correspondence audits (e.g., Deming et al. 2016; Oreopoulos 2011). In another study, Leo Kaas and Christian Manger (2012) conducted a correspondence audit in Germany in which they found that Turkish applicants were less likely to receive a callback than German applicants. Crucially, they submitted some applications with two reference letters that included information on personality and work ethic. Among applications that included these reference letters, the authors found no statistically significant differences between the callback rates for German and Turkish applicants, suggesting that employers in Germany engage in statistical discrimination against Turkish applicants. These four studies highlight an important shift in audit studies from simply documenting discrimination to exploring the process in more detail. This trend would continue throughout the following decade and shape the focus and contributions of future audit studies.
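The logic of the reference-letter comparison can be illustrated with a short Python sketch. The callback counts below are hypothetical placeholders, not Kaas and Manger’s (2012) data, and `callback_gap` is an illustrative helper name. Under statistical discrimination, one would expect a clear majority-minority gap when applications carry no extra information and a gap near zero once productivity-relevant information (here, reference letters) is added.

```python
from math import sqrt


def callback_gap(calls_a, n_a, calls_b, n_b):
    """Difference in callback rates (group a minus group b) and its standard error."""
    p_a, p_b = calls_a / n_a, calls_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return p_a - p_b, se


# Hypothetical (callbacks, applications) by condition and applicant group.
CELLS = {
    "no_letters": {"majority": (70, 500), "minority": (45, 500)},
    "with_letters": {"majority": (80, 500), "minority": (78, 500)},
}

for condition, groups in CELLS.items():
    gap, se = callback_gap(*groups["majority"], *groups["minority"])
    # |z| above roughly 1.96 indicates a gap significant at the 5% level (two-sided).
    print(f"{condition}: gap = {gap:.3f}, z = {gap / se:.2f}")
```

In this made-up example the gap is significant only in the no-letter condition, the pattern consistent with statistical discrimination; a gap that persisted in both conditions would point instead toward taste-based discrimination (or toward information the letters do not address).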

4.5 The Current Wave: The Early 2010s to Present

Since the early 2010s, the number of audit studies appearing in journals and as working papers has grown exponentially. By my count, the number of audit studies conducted between 2010 and 2017 is already quadruple the number conducted between 2000 and 2009. For that reason alone, it would be incredibly difficult to cover all of these studies in any detail in this part. With apologies to those not covered here, I focus on what I consider to be the most significant developments during the past 7 years. However, it is also important to note that researchers have continued to expand the domains of study to areas such as healthcare (Kugelmass 2016; Sharma et al. 2015; Shin et al. 2016), politics and public service (Butler and Broockman 2011; Einstein and Glick 2017; Giulietti et al. 2015; Hughes et al. 2017; McClendon 2016; White et al. 2015), religious organizations (Wallace et al. 2012; Wright et al. 2015), eBay and Craigslist transactions (Besbris et al. 2015; Doleac and Stein 2013; Nunley et al. 2011), and new sharing economy market transactions such as Airbnb and Uber (Cui et al. 2017; Edelman et al. 2017; Ge et al. 2016). Additionally, researchers have expanded the countries of study to include Argentina (Bóo et al. 2013), Belgium (Baert 2016; Baert and Verhofstadt 2015), Brazil (de Leon and Kim 2016), China (Maurer-Fazio 2012; Maurer-Fazio and Lei 2015; Zhou et al. 2013), the Czech Republic (Bartoš et al. 2016), Ghana (Michelitch 2015), Israel (Ariel et al. 2015; Ruffle and Shtudiner 2015; Zussman 2013), Malaysia (Lee and Khalid 2016), Mexico (Arceo-Gomez and Campos-Vazquez 2014; Campos-Vazquez and Arceo-Gomez 2015), Norway (Andersson et al. 2012), Peru (Galarza and Yamada 2014, 2017), and Poland (Wysienska-Di Carlo and Karpinski 2014). HUD has also continued to conduct audit studies with a new iteration of the HDS in 2012 (Turner et al. 2013).

I believe there have been at least four major developments in audit research during the most recent period: (1) continued attempts to adjudicate among types of discrimination, (2) an increased focus on context and the conditions under which discrimination occurs, (3) an increased focus on methodological issues in audit design, and (4) the inclusion of additional data from outside the audit itself. These developments are not mutually exclusive; many studies incorporate two or more of these developments.

4.5.1 Adjudicating Among Types of Discrimination

Scholars have long sought to understand the reasons for discrimination and to better adjudicate among types of discrimination (Aigner and Cain 1977; Altonji and Blank 1999; Arrow 1972; Becker 1957; Dymski 2006; Guryan and Charles 2013). Discrimination research has often focused on whether decision makers discriminate based on a general dislike of a certain group (taste-based discrimination) or based on assumptions about the average characteristics of an individual from that group (statistical discrimination).Footnote 9 Recent audits have attempted to adjudicate between taste-based and statistical discrimination by varying multiple characteristics and examining differences in response rates between types of characteristics (more or less susceptible to taste-based discrimination) and examining interactions with characteristics that might provide information to overcome statistical discrimination (Agerström et al. 2012; Ahmed et al. 2010; Auspurg et al. 2017; Baldini and Federici 2011; Bosch et al. 2010; Capéau et al. 2012; Carlsson and Ericksson 2014; Drydakis 2014; Edo et al. 2013; Ewens et al. 2014; Gneezy et al. 2012; Hanson and Hawley 2014; Hanson and Santas 2014). The results from these studies are somewhat mixed as to whether taste-based or statistical discrimination occurs more often (or some combination of the two). These mixed findings likely stem from the variety of locations and characteristics studied.

Two studies related to taste-based versus statistical discrimination stand out among the rest (Bartoš et al. 2016; Pager 2016). In the first, Bartoš and colleagues examined how both an individual characteristic, in this case race, and the type of market can lead to “attention discrimination,” or the differential use of available information. The authors set up audits in rental housing and labor markets. In the rental housing market, decision makers selected more applicants overall and more often examined additional information from minority applicants. In the labor market, decision makers selected fewer applicants overall and more often examined additional information from majority applicants. Thus, discrimination in acquiring information about candidates occurred at the initial stage of selection and varied with the selectivity of the market. Researchers should carefully consider how these types of processes – overall response or selection rates in a given market and the differential use of available information – might influence future audits.

In the second, Devah Pager (2016) examined whether firms that discriminated in a previous audit were still in business 6 years later. Economists suggest that an efficient market should eventually weed out taste-based discrimination, since not all employers exhibit that type of discrimination and those who do will pay a penalty for inefficient hiring (Arrow 1973; Becker 1957). Using additional data on firm failure, Pager found that prior discrimination is associated with a firm going out of business. Although other factors may explain this relationship, the findings are at least consistent with taste-based discrimination.

4.5.2 Context and Conditions Under Which Discrimination Occurs

Another major development during this period has been researchers’ increased focus on context and the conditions under which discrimination occurs. Two aspects of context – geographic location and occupation or market characteristics – have played a significant role in recent audits. Those audits that have taken geographic variation into account often examine differences by neighborhood characteristics such as racial, ethnic, immigrant, and SES composition (Acolin et al. 2016; Carlsson and Ericksson 2014, 2015; Carlsson et al. 2017; Galster et al. forthcoming; Ghoshal and Gaddis 2015; Hanson and Hawley 2011; MacDonald et al. forthcoming). Others have examined geography in more detail by tying discrimination- or prejudice-based theories into the analysis (Besbris et al. 2015, Chap. 8 of this volume; Gaddis and Ghoshal 2015; Hanson and Hawley 2014; Phillips 2016a). A second strand of research has considered whether levels of discrimination are influenced by the types or composition of occupations (Albert et al. 2011; Andriessen et al. 2012; Booth and Leigh 2010; Bursell 2014; Carlsson 2011; Derous et al. 2012; Zhou et al. 2013), whether a job is a promotion (Baert et al. 2016a), whether an applicant is overqualified (Baert and Verhaest 2014; Verhaest et al. forthcoming), or market tightness or slackness (Baert et al. 2015; Carlsson et al. 2015; Farber et al. 2017; Vuolo et al. 2017).

Some researchers have varied multiple individual characteristics simultaneously and examined interactions to try to capture a broader spectrum of the decision-making process. In particular, recent audits have focused on interactions between race/ethnicity and educational credentials (Carbonaro and Schwarz, Chap. 7 of this volume; Darolia et al. 2015; Deming et al. 2016; Gaddis 2015, 2017e; Lee and Khalid 2016; Nunley et al. 2015), race/ethnicity and criminal record (Ahmed and Lang 2017; Decker et al. 2015; Uggen et al. 2014), race/ethnicity and sexual orientation (Mazziotta et al. 2015) and various combinations of personal characteristics and human capital characteristics (Andersson et al. 2012; Baert and Vujic 2016; Baert et al. 2016b, 2017; Johnson and Lahey 2011; Namingit et al. 2017; Neumark et al. 2015; Nunley et al. 2016, 2017; Oreopoulos and Dechief 2012; Pedulla 2016; Phillips 2017).

Some of the most interesting research on context and conditions has focused on the effects of policies. In one such study, a team of researchers examined whether discrimination against individuals with a disability varied by whether a company was subject to the Americans with Disabilities Act (ADA) (Ameri et al. forthcoming). The authors found that the ADA reduced discrimination against disabled applicants among employers covered under the law. A second study used audit and non-audit data to examine how age discrimination varied across states with differing anti-discrimination policies (Neumark et al. 2017). The authors found no strong relationship between the strength of state laws and discrimination rates. Finally, a third study combined an audit, multiple time points, and a policy change in a difference-in-differences design (Agan and Starr 2016). The authors tested the effect of ban-the-box policies, which prohibit employers from asking about an applicant’s criminal record on the job application, on levels of racial discrimination in hiring. They found that levels of racial discrimination increased after ban-the-box policies went into effect. The authors suggest that when employers cannot ask about criminal history, they may engage in statistical discrimination and assume that black applicants have a criminal record.
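The difference-in-differences logic behind this kind of design can be sketched in a few lines. The sketch assumes a simplified two-period comparison of white and black callback rates before and after a policy change; the rates are hypothetical, `did_estimate` is an illustrative name, and Agan and Starr’s (2016) actual design includes additional comparisons and controls not shown here.

```python
def did_estimate(rates):
    """Change in the white-black callback gap from the pre- to the post-policy period.

    rates maps (period, group) -> callback rate. A positive result means the
    racial gap widened after the policy took effect.
    """
    gap_pre = rates[("pre", "white")] - rates[("pre", "black")]
    gap_post = rates[("post", "white")] - rates[("post", "black")]
    return gap_post - gap_pre


# Hypothetical callback rates, for illustration only.
rates = {
    ("pre", "white"): 0.12,
    ("pre", "black"): 0.11,
    ("post", "white"): 0.13,
    ("post", "black"): 0.10,
}
print(f"Change in racial gap: {did_estimate(rates):+.3f}")
```

Differencing the gap across periods nets out changes common to both groups, such as a general rise or fall in hiring, so the estimate isolates the change in the racial gap rather than the change in callbacks overall.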

4.5.3 Methodological Issues in Audit Design

In recent years, scholars have considered at least three methodological issues in audit design: (1) paired vs. non-paired audits, (2) indirect signals of race, and (3) the Heckman critique of unobserved differences between groups. First, in my experience, the question of paired versus non-paired audit design is often a concern during IRB submission and subsequent discussions. A paired audit design increases the chance that the experiment will be discovered, because decision makers can potentially see two applicants or inquiries that are very similar. However, conventional wisdom suggests that the paired design is more statistically efficient, decreases the amount of time required for data collection, and can lead to a larger sample size (Lahey and Beasley, Chap. 4 of this volume). In at least two cases, fear of experiment discovery led researchers to preemptively adopt a non-paired audit design (Weichselbaumer 2015, 2016). Additionally, researchers have raised concerns that paired designs may influence findings of discrimination because researchers insert fake applicants into the applicant pool without knowing the composition of that pool (Phillips 2016b; Weichselbaumer 2015). Employers compare applicants to each other, and by inserting more than one applicant into a particular pool, researchers may influence the process. In fact, Phillips (2016b) developed a method to test these effects and found that “adjusting for applicant pool composition increases measured discrimination by 20% on average” (2016b: 1). Moreover, proper power analysis suggests that paired audits are not needed as often as researchers think (Vuolo et al. 2016, Chap. 6 of this volume).
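To see why power analysis matters for this decision, the sketch below computes the approximate number of applications per group needed to detect a given callback gap in a non-paired design, using the standard normal-approximation formula for comparing two independent proportions. The callback rates are hypothetical, and this is not the procedure of Vuolo et al.; a paired design would instead be analyzed with a within-pair test (such as McNemar’s test), whose power depends on how strongly outcomes are correlated within pairs.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate applications needed per group to detect p1 vs. p2 with a
    two-sided test of two independent proportions (non-paired design)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical callback rates: a fixed majority rate and progressively smaller gaps.
for minority_rate in (0.05, 0.07, 0.08):
    print(minority_rate, n_per_group(0.10, minority_rate))
```

Because the required sample scales roughly with one over the squared gap, halving the expected callback gap roughly quadruples the number of applications needed, which is why the expected size of the gap matters so much when weighing sample-size arguments for paired designs.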

I have devoted considerable time and effort to a second methodological concern – the indirect signaling of race through names (Gaddis 2017a, b, c, d). With correspondence audits, researchers lose the ability to directly convey race through appearance and must rely on an indirect signal, such as a name, to signal race. Although prior research occasionally raised some concerns about the signal of names (e.g. Bertrand and Mullainathan 2004), only 17.5% of the studies I reviewed used pretests to examine the perception of names used in an audit (Gaddis 2017a). My work has shown that racial perceptions of white and black names are often linked with social class (Gaddis 2017a), Hispanic names are strongly identified (Gaddis 2017b), immigrant generational status can be discerned through names (Gaddis 2017c), and, perhaps most importantly, audit findings are strongly linked to the names researchers use (Gaddis 2017d). Still, more needs to be done to examine the signals we use in audit studies (see next part).

The final area of methodological inquiry concerns the Heckman critique of unobserved differences between groups and has received the most scholarly attention of the three issues discussed here (Heckman 1998; Heckman and Siegelman 1993). James Heckman’s critique is that scholars using the audit design assume that unobservable characteristics have equal means across groups, yet scholars cannot confirm that assumption. Heckman suggests that multiple components could enter into the decision-making process, some controlled for by the audit design and others unknown to the designers but known to the decision makers; in other words, characteristics that researchers do not include on a resume or in an email. These components combine to place a candidate above or below the threshold to receive a response. If the two groups being studied have different variances on these important unobserved components, audit studies may over- or underestimate discrimination or detect an effect when there is none. David Neumark (forthcoming) provides a more detailed discussion of this critique and has devised a method to produce an unbiased estimate of discrimination that avoids this critique (Neumark 2012). Neumark (2012) reanalyzed Bertrand and Mullainathan’s (2004) original audit data using this method to account for the variance of unobservables and found stronger evidence of racial discrimination. Two individual studies have implemented Neumark’s method, with no clear pattern regarding bias (Baert 2015; Neumark et al. 2016). Two other studies have re-analyzed data from multiple audits and suggest that employment audits appear to be susceptible to the Heckman critique (Carlsson et al. 2014; Neumark and Rich 2016). The authors of these two studies advise that scholars still have a lot of work to do in improving the audit method by more directly addressing this critique.
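The variance argument can be made concrete with a stylized threshold model. The sketch assumes that a decision maker responds when an applicant’s total score (observed credentials plus an unobserved component) exceeds a fixed threshold, that the audit equalizes observed credentials (set to zero here), and that the unobserved component is normally distributed with the same mean for both groups but a different standard deviation. The parameter values are hypothetical, and the sketch illustrates the critique only; it is not Neumark’s (2012) correction.

```python
from statistics import NormalDist


def response_rate(mean, sd, threshold):
    """Share of applicants whose score exceeds the response threshold."""
    return 1 - NormalDist(mu=mean, sigma=sd).cdf(threshold)


# Equal means on the unobserved component: no discrimination by construction.
MEAN = 0.0
SD_A, SD_B = 1.0, 1.5  # group B's unobservables are more dispersed

for threshold in (-1.0, 0.0, 1.0):
    rate_a = response_rate(MEAN, SD_A, threshold)
    rate_b = response_rate(MEAN, SD_B, threshold)
    print(f"threshold {threshold:+.1f}: A = {rate_a:.3f}, B = {rate_b:.3f}")
```

With a demanding threshold (above the mean), the higher-variance group receives more responses; with a lenient threshold, it receives fewer; only at the mean do the rates coincide. A measured gap can therefore appear, vanish, or reverse depending on how the audited credentials sit relative to the threshold, even though neither group is treated differently.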

4.5.4 Including Additional Data from Outside the Audit

A final major development in recent audit research is the inclusion of additional data from outside the audit itself, something done by many of the studies already mentioned in this part. Several researchers have included geographic data on neighborhood and city characteristics to supplement audits (e.g. Acolin et al. 2016; Carlsson and Ericksson 2014, 2015; Ghoshal and Gaddis 2015; Hanson and Hawley 2011). Others have included other types of available data, such as firm closure (Pager 2016), mortgage lender transactions (Hanson et al. 2017), and existing survey data on racial/ethnic attitudes and beliefs (Carlsson and Ericksson 2017; Carlsson and Rooth 2012).

One of the most promising avenues of inquiry into discrimination is the combination of audits with other methods of data collection. Following in the footsteps of Pager and Quillian (2005), researchers are increasingly obtaining a second round of information from the same individuals who previously participated in an audit. Some researchers have followed up with employers to administer implicit association tests (IATs) to examine the connection between implicit bias and discrimination (Agerström and Rooth 2011; Rooth 2010). Other researchers have followed up with surveys or interviews after an audit to better understand the reasons behind discriminatory actions (Bonnet et al. 2016; Midtbøen 2014, 2015, 2016; Zussman 2013). Although institutional review boards (IRBs) may be hesitant to allow researchers to engage in multiple points of contact with audit participants, some researchers have successfully shown that additional methods of data collection do not necessarily need to follow up with the original audit participants (Gaddis and Ghoshal 2017; Kang et al. 2016).

I believe that researchers should continue in the direction of the trends discussed above – adjudicating among types of discrimination, focusing on context and the conditions under which discrimination occurs, focusing on methodological issues in audit design, and including additional data from outside audits. In particular, researchers should try to include geographic data in audits, given the wide availability of geographic data and the relative simplicity and usefulness of including such data in analyzing audit outcomes. Next, in the final part, I outline some limitations of correspondence audits and return to the issues discussed in this part with additional thoughts on continuing to improve correspondence audits.

5 Limitations of and Ways to Improve Correspondence Audits

Despite the rapid advancement of correspondence audits over the past two decades, several serious limitations exist that scholars must continue to address. Limitations of in-person audits have been covered by others in detail, particularly James Heckman (1998; Heckman and Siegelman 1993), and I draw upon that work here. However, correspondence audits often have their own unique quirks and limitations. By no means is this part intended to be an exhaustive list of all the limitations of correspondence audits, but instead some areas where I see the biggest problems and/or new potential solutions. I highly recommend the reader turn to David Pedulla’s chapter (Chap. 9 of this volume) for a more extensive and detailed discussion of these and other issues.

Perhaps most important is the general limitation of audit studies in uncovering mechanisms rather than simply documenting the existence of discrimination. As discussed in the previous part, recent work has started to expand our knowledge in this area in increasingly innovative ways. Not all questions will lend themselves to design tricks built into studies to help discover mechanisms, nor can researchers always implement complex factorial designs to test potential mechanisms. My recommendation is that researchers should be more open to collecting survey experiment data side-by-side with field data from audit studies (e.g. Diehl et al. 2013; Gaddis and Ghoshal 2017). The deception of the audit study may allow us to document discrimination, but a similar scenario presented as a survey experiment may allow us to explore potential mechanisms with the right questions. Moreover, the rise of Amazon’s Mechanical Turk (MTurk) makes collecting survey experiment data relatively quick and cheap (Campbell and Gaddis 2017; Porter et al. 2017). In ongoing work combining an audit with a survey experiment, I find that roommate discrimination against many different racial and ethnic groups is driven by issues of cultural fit. However, blacks face higher levels of discrimination than others due to negative perceptions about financial stability and courteousness, despite respondents receiving the same information about all racial/ethnic groups (Gaddis and Ghoshal 2017). These findings would not have come to light if we had implemented a correspondence audit or survey experiment alone.

A second major limitation of correspondence audits is indirect signaling of characteristics. Correspondence audits often require signals to be sent through names, statements, lists, or other text embedded in communications. In my own research, I have worked to understand how names can be used to signal race, ethnicity, and immigrant status (Gaddis 2017a, b, c) and have found that signals of race are conflated with social class and that conflation explains differences in response rates across previous correspondence audits (Gaddis 2017d). Still, more work needs to be done to ensure that construct validity is high when we need to indirectly signal characteristics in correspondence audits. At a minimum, researchers should pretest their signals in a scientific manner to help increase construct validity. Additionally, more work is needed to explore the possibility of alternate signals since there is often more than one way to indirectly signal a characteristic.
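A scientific pretest can be as simple as asking respondents which group they associate with each candidate name and retaining only names that are perceived as intended at a high rate. The sketch below assumes pretest data arrive as (name, intended group, perceived group) records; the placeholder names, the responses, and the 0.8 cutoff are all hypothetical choices rather than an established standard.

```python
from collections import Counter


def congruence_rates(responses):
    """Share of pretest respondents who perceived each name as its intended group.

    responses: iterable of (name, intended_group, perceived_group) tuples.
    """
    totals, matches = Counter(), Counter()
    for name, intended, perceived in responses:
        totals[name] += 1
        if perceived == intended:
            matches[name] += 1
    return {name: matches[name] / totals[name] for name in totals}


def retain_names(rates, cutoff=0.8):
    """Names whose congruence rate meets the (assumed) cutoff, sorted alphabetically."""
    return sorted(name for name, rate in rates.items() if rate >= cutoff)


# Hypothetical pretest responses for two placeholder names.
responses = (
    [("name_01", "white", "white")] * 5
    + [("name_02", "black", "black")] * 3
    + [("name_02", "black", "white")] * 2
)
rates = congruence_rates(responses)
print(rates)
print(retain_names(rates))
```

In practice, researchers might also record perceived social class or immigrant status for each name, since a name can be perceived as the intended race while still signaling other characteristics.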

The signaling of characteristics is also related to the way we can conduct correspondence audits and the level of external validity of those audits. A characteristic such as race or gender may convey different things depending on how it is signaled and the context in which it is signaled. Not only are correspondence audits only as good as the signals they use to convey key characteristics, but audit studies also only tell us about a specific avenue of correspondence with a specific signal. For example, real job seekers may use any combination of online job sites, personal and professional networks, alumni resources, headhunters, and employment events. How race is conveyed and the meaning of race likely vary across these different means of searching for a job. Static, written signals – such as name, professional affiliations, or even checking a box for race – may cue stereotypes about race. Dynamic, interpersonal signals – such as a discussion with a reference or interaction with the individual – may permit more flexibility in thoughts about race. Although others have raised concerns about how audits begin with a narrow sampling frame (e.g., jobs or housing posted in newspapers or on websites) and limit generalizability to the entire job or housing search process (Friedman 2015; Gaddis 2015; Heckman and Siegelman 1993; Pitingolo and Ross 2015), I suggest that the narrow sampling frame also limits our knowledge of discrimination processes only to those that can be conveyed through certain static and often indirect signals.

Although in-person audits have occasionally examined multiple outcomes at various stages of the processes they study (Bendick et al. 1994; Pager et al. 2009a; Turner et al. 1991a), correspondence audits have been almost entirely limited to studying outcomes at the initial contact phase. Critics have pointed out that we do not know whether the disparities witnessed at the initial contact phase lead to disparities at later phases (Heckman 1998; Heckman and Siegelman 1993). Others have used nationally representative data to simulate the effect of employer callback disparities on wages (Lanning 2013). Still, as my own research shows, we should use all the information possible to expand the outcomes examined by audit studies. Additional information in both employment (Gaddis 2015) and housing advertisements (Gaddis and Ghoshal 2015, 2017; Ghoshal and Gaddis 2015) should be used to our advantage.

Furthermore, we should consider additional ways that audits might be modified to examine other outcomes. In employment audits, for example, do human resources staff visit applicants' LinkedIn or Facebook pages, contact references, or attempt multiple contacts with applicants at different rates? Some recent articles provide excellent examples of the directions in which audits might move in the future (Acquisti and Fong 2015; Baert forthcoming; Bartoš et al. 2016; Blommaert et al. 2014; Butler and Crabtree forthcoming; see also Crabtree, Chap. 5 of this volume, for more discussion). Additionally, is it possible to return to the strategies of earlier audits and use a subsample of real human testers to proceed deeper into these processes, for example by sending trained assistants to in-person or Skype interviews? I believe that future waves of audit studies will need to be creative and incorporate a greater variety of outcomes to push this method forward.

6 This Volume and Online Resources

This volume is organized into three broad parts: (1) The Theory Behind and History of Audit Studies, (2) The Method of Audit Studies: Design, Implementation, and Analysis, and (3) Nuance in Audit Studies: Context, Mechanisms, and the Future. You are reading the first chapter of the first part and, hopefully, you already have a better understanding of audit studies. In the second chapter, Fran Cherry and Marc Bendick discuss the historical connections between activism and scholarship through audits. Their chapter highlights the potential power of audit studies not just to document discrimination but to reduce it as well. The authors advocate for a return to scholar-activism and outline four characteristics that will help facilitate that path. In the third chapter, Stijn Baert provides an excellent overview of labor market correspondence audits conducted since Bertrand and Mullainathan's groundbreaking study. Baert organizes these studies along two major dimensions: the discrimination treatment characteristic, which includes nine federally banned (U.S.) and five state-banned discrimination grounds, and the country of analysis. Overall, the author provides information on 90 labor market correspondence audits across 24 countries.

The chapters in the second part give the reader a "behind-the-scenes" look at the nuts and bolts of audit studies and serve as a guide for designing and implementing your own audit studies. In the fourth chapter, Joanna Lahey and Ryan Beasley outline a number of technical aspects related to designing and conducting a correspondence audit. They cover validity, participant selection, timing, the technical design of correspondence, matching, sample size, and analysis, among other issues. Their chapter serves as a terrific starting point for anyone needing more information on creating their own audit. In the fifth chapter, Charles Crabtree extends this discussion by providing a detailed overview of designing and implementing an email correspondence audit. He provides information on sample selection, collecting email addresses, sending emails, and collecting outcomes. This chapter is particularly useful for thinking about automating an audit design using programming scripts. A coding appendix for this chapter will be available at auditstudies.com. In the sixth chapter, Mike Vuolo, Christopher Uggen, and Sarah Lageson offer an extensive consideration of matched versus non-matched audit designs. They provide statistical guidelines for when matching is appropriate and show that non-matched audit designs can be more efficient. Additionally, they raise some important substantive points for researchers to consider when deciding whether to use a matched or non-matched design.
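To make the matched versus non-matched distinction and the idea of scripted automation more concrete, the Python sketch below shows one possible randomization step for each design. It is a minimal illustration under stated assumptions, not the code from Chap. 5 or its coding appendix: it assumes a simple two-condition design, and the posting IDs and condition labels (`posting_001`, `condition_A`, `condition_B`) and the functions `assign_non_matched` and `assign_matched` are hypothetical. In the non-matched version, each posting receives exactly one randomly assigned condition; in the matched version, each posting receives every condition, and only the sending order is randomized.

```python
import random
from collections import Counter
from typing import Dict, List

# Hypothetical labels for illustration only; a real audit would use
# pretested names or other signals for each treatment condition.
POSTINGS = [f"posting_{i:03d}" for i in range(1, 11)]
CONDITIONS = ["condition_A", "condition_B"]


def assign_non_matched(postings: List[str], conditions: List[str],
                       seed: int = 0) -> Dict[str, str]:
    """Give each posting exactly one condition (non-matched design).

    The pool repeats the conditions so that their counts differ by at most
    one, then a seeded shuffle randomizes which posting gets which.
    """
    rng = random.Random(seed)
    n, k = len(postings), len(conditions)
    pool = (conditions * (n // k + 1))[:n]
    rng.shuffle(pool)
    return dict(zip(postings, pool))


def assign_matched(postings: List[str], conditions: List[str],
                   seed: int = 0) -> Dict[str, List[str]]:
    """Send every condition to every posting (matched design).

    Only the order in which applications are sent is randomized, so that
    no condition is systematically sent first.
    """
    rng = random.Random(seed)
    orders = {}
    for posting in postings:
        order = list(conditions)
        rng.shuffle(order)
        orders[posting] = order
    return orders


if __name__ == "__main__":
    non_matched = assign_non_matched(POSTINGS, CONDITIONS, seed=42)
    print(Counter(non_matched.values()))  # each condition used 5 times
    matched = assign_matched(POSTINGS, CONDITIONS, seed=42)
    print(matched["posting_001"])  # both conditions, in random order
```

A seeded generator makes the assignment reproducible, which helps when documenting a design; an actual study would also need to randomize the other design features discussed in Chap. 4, such as timing and the technical design of the correspondence.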

Finally, the chapters in the third part provide even deeper insight into the audit process by discussing further design considerations and nuance. In the seventh chapter, William Carbonaro and Jonathan Schwarz outline their thought process in selecting cities in which to conduct an audit, the difficulties of using a small city, the unknowns of the employer side of an audit, and the choice of jobs for a sample. This chapter shares important "lessons learned" from experienced researchers. Although scholars cannot think through all of the possible variables involved in designing and fielding an audit in advance, I think this chapter serves as a great example of how difficult and nuanced the auditing process is. In the eighth chapter, Max Besbris, Jacob William Faber, Peter Rich, and Patrick Sharkey show how an audit can be designed to investigate a non-individual-level treatment. They use an audit to examine the mechanism of place-based stigma in the relationship between neighborhoods and outcomes for residents of those neighborhoods. Their audit, their discussion of signaling characteristics, and their theory-based use of geography provide a strong example of what future audits might look like. In the ninth and final chapter, David Pedulla explores how audits might change and develop in the coming years. He highlights research that identifies mechanisms, examines when and where discrimination happens, and scrutinizes issues of representativeness. David's chapter serves as a terrific bookend to this volume and should be read closely by anyone wishing to implement an audit of their own.

On behalf of all of the contributors, I hope you find this volume informative and useful. We have a number of overarching goals for this book: (1) to create a go-to guide for anyone looking to conduct an audit study, (2) to provide resources for using the audit method, both within this book and online, and (3) to record the history of audits. For more information on audits, please consult our website at www.auditstudies.com and take a look at the recommended reading list below.

7 Recommended Reading

7.1 Comprehensive Articles and Books on Audits

  • “Situation Testing for Employment Discrimination in the United States.” 2007. By Marc Bendick Jr. Horizons Stratégiques, 3:17–39.

  • Clear and Convincing Evidence: Measurement of Discrimination in America. 1993. Edited by Michael Fix and Raymond J. Struyk. Washington, DC: The Urban Institute Press.

  • “Experimental Research on Labor Market Discrimination.” Forthcoming. By David Neumark. Journal of Economic Literature.

  • “The Use of Field Experiments for Studies of Employment Discrimination: Contributions, Critiques, and Directions for the Future.” 2007. By Devah Pager. The ANNALS of the American Academy of Political and Social Science, 609:104–33.

7.2 Reviews of Audits and Discrimination Research

  • “What Have We Learned from Paired Testing in Housing Markets?” 2015. By Sun Jung Oh and John Yinger. Cityscape: A Journal of Policy Development and Research, 17(3):15–59.

  • “The Sociology of Discrimination: Racial Discrimination in Employment, Housing, Credit, and Consumer Markets.” 2008. By Devah Pager and Hana Shepherd. Annual Review of Sociology, 34:181–209.

  • "Field Experiments of Discrimination in the Market Place." 2002. By Peter A. Riach and Judith Rich. The Economic Journal, 112:F480–F518.

  • "What Do Field Experiments of Discrimination in Markets Tell Us? A Meta-Analysis of Studies Conducted Since 2000." 2014. By Judith Rich. Available at SSRN: https://ssrn.com/abstract=2517887

  • “A Multidisciplinary Survey on Discrimination Analysis.” 2013. By Andrea Romei and Salvatore Ruggieri. The Knowledge Engineering Review, 29(5):582–638.

7.3 Meta-Analyses of Audits

  • "Meta-Analysis of Field Experiments Shows No Change in Racial Discrimination in Hiring over Time." 2017. By Lincoln Quillian, Devah Pager, Ole Hexel, and Arnfinn Midtbøen. Proceedings of the National Academy of Sciences.

  • “Ethnic Discrimination in Hiring Decisions: A Meta-Analysis of Correspondence Tests 1990–2015.” 2016. By Eva Zschirnt and Didier Ruedin. Journal of Ethnic and Migration Studies, 42(7):1115–34.

7.4 Articles and Books on the Methodology of Audits, Discrimination, and Field Experiments

7.4.1 Field Experiments (General)

  • “Field Experiments Across the Social Sciences.” 2017. By Delia Baldassarri and Maria Abascal. Annual Review of Sociology, 43:41–73.

  • Field Experiments: Design, Analysis, and Interpretation. 2012. By Alan S. Gerber and Donald P. Green. New York, NY: W.W. Norton.

  • “The Principles of Experimental Design and Their Application in Sociology.” 2013. By Michelle Jackson and D. R. Cox. Annual Review of Sociology, 39:27–49.

7.4.2 Audits (General)

  • Audit Studies: Behind the Scenes with Theory, Method, and Nuance. 2018. Edited by S. Michael Gaddis. Switzerland: Springer International Publishing.

7.4.3 Discrimination (General)

  • Measuring Racial Discrimination. 2004. By Rebecca M. Blank, Marilyn Dabady, and Constance F. Citro. Washington, DC: The National Academies Press.

7.4.4 Automating Resume Creation for Audits

  • “Computerizing Audit Studies.” 2009. By Joanna N. Lahey and Ryan A. Beasley. Journal of Economic Behavior & Organization, 70(3):508–14.

7.4.5 Critiques of Audits and Solutions

  • “Detecting Discrimination.” 1998. By James J. Heckman. Journal of Economic Perspectives, 12(2):101–16.

  • “The Urban Institute Audit Studies: Their Methods and Findings.” 1993. By James J. Heckman and Peter Siegelman. In Clear and Convincing Evidence: Measurement of Discrimination in America, edited by M. Fix and R. J. Struyk, 187–258. Washington, DC: The Urban Institute Press.

  • “Detecting Discrimination in Audit and Correspondence Studies.” 2012. By David Neumark. The Journal of Human Resources, 47(4):1128–57.

  • “Do Field Experiments on Labor and Housing Markets Overstate Discrimination? A Re-Examination of the Evidence.” 2016. By David Neumark and Judith Rich. Available at NBER: http://www.nber.org/papers/w22278

7.4.6 Signaling Characteristics in Audits

  • "How Black are Lakisha and Jamal? Racial Perceptions from Names Used in Correspondence Audit Studies." 2017. By S. Michael Gaddis. Sociological Science, 4:469–89.

  • “Racial/Ethnic Perceptions from Hispanic Names: Selecting Names to Test for Discrimination.” 2017. By S. Michael Gaddis. Socius, 3:1–11.

  • “Assessing Immigrant Generational Status from Names: Scientific Evidence for Experiments.” 2017. By S. Michael Gaddis. Available at SSRN: https://ssrn.com/abstract=3022217

  • “Auditing Audit Studies: The Effects of Name Perception and Selection on Social Science Measurement of Racial Discrimination.” 2017. By S. Michael Gaddis. Available at SSRN: https://ssrn.com/abstract=3022207

7.4.7 Statistical Analysis of Audits

  • “Statistical Power in Experimental Audit Studies: Cautions and Calculations for Matched Tests with Nominal Outcomes.” 2016. By Mike Vuolo, Christopher Uggen, and Sarah Lageson. Sociological Methods & Research, 45(2):260–303.

7.5 Theoretical Articles and Books on Discrimination

  • “Taste-Based or Statistical Discrimination: The Economics of Discrimination Returns to its Roots.” 2013. By Jonathan Guryan and Kerwin Kofi Charles. The Economic Journal, 123:F417–32.

  • Theorizing Discrimination in an Era of Contested Prejudice: Discrimination in the United States, Volume 1. 2008. By Samuel Roundfield Lucas. Philadelphia, PA: Temple University Press.