Introduction

“Driving While Black” refers to the perception that African Americans and other minorities are more likely than Whites to be scrutinized by police while driving because of their skin color (Harris 2002). Racial bias in stops has become an important civil rights issue, and the academic community has been called upon to develop appropriate methodologies for investigating and documenting this phenomenon (Fridell et al. 2001; Ramirez et al. 2000). Many scholars and police departments are currently collaborating to develop methodologies and reach conclusions about the degree and distribution of racial bias in police stops (see, for example, Cordner et al. 2002; Thomas 2002; Smith et al. 2003). In most of these efforts, police forces collect information on the race of drivers, and sometimes passengers, in vehicles that have been stopped. Most of this research has required police officers to collect new data on the race of the drivers (and sometimes pedestrians) they stop (Fridell et al. 2001; Meeks 2000; Knowles et al. 2001; Lange et al. 2005; Smith et al. 2003).

Race data recorded by the police have two advantages. First, because police stop many cars, sample sizes can be very large. This allows researchers to examine the race distribution of stops at fairly low levels of aggregation, such as the neighborhood, precinct, highway segment, or even individual officer (Meehan and Ponder 2002; Smith et al. 2003; Thomas 2002). Second, officially collected data tend to have high legitimacy with the police and can be tied to the management of individual officers and units.

There are, however, at least three common problems with race and ethnicity data collected by the police. The first is that such data collection requires police cooperation. While some police forces collect these data voluntarily, more often some level of legal, political, or public relations pressure elicits organizational cooperation (Harris 2002; Meeks 2000). The accuracy of data collected by departments that initiate traffic stop data programs under such pressure is open to doubt. Second, official data are collected only on those citizens who are stopped, which creates the methodological difficulty of identifying appropriate comparison baselines for the race composition of drivers (Smith and Alpert 2002). Finally, and most disturbingly, it is becoming increasingly clear that police often do not record all stops, a serious source of selection bias in analyses that rely on police-generated data. For example, Cordner et al. (2002:23) estimate that San Diego police reported only 53% of traffic stops in African-American and Hispanic neighborhoods. Likewise, Smith and Petrocelli (2001) found that Richmond, VA, police officers complied with that city’s data collection protocol only 64% of the time. Similarly, Smith et al. (2003) report that the North Carolina State Highway Patrol recorded about three-quarters of citations, half of written warnings, and barely 10% of stops that did not result in a citation or written warning (see also Fridell et al. 2001 and Donohue 2000 for similar concerns). One might think of this problem as the racial profiling equivalent of the “dark figure” of unreported crime in official statistics, and it carries many of the same uncertainties: what citizen, police, stop event, and community characteristics distinguish recorded from unrecorded police encounters?

Given the political, methodological, and quality limitations of police-collected data on the race and ethnicity of vehicle (or pedestrian) stops, it is not surprising that researchers are turning to survey data to understand race disparities in police stops. Weitzer and Tuch (2002) use Gallup survey data on citizens’ perceptions of racial profiling; as part of that survey, respondents were asked whether they felt they had ever been stopped by the police because of their race or ethnicity. Lundman and Kaufman (2003) and Engel and Calnon (2004) use a 1999 national survey of police contacts to examine race/ethnic differences in self-reports of police stops and post-stop actions, while statistically controlling for size of place, social class, age, and gender. Lundman and Kaufman (2003) find that African Americans are more likely to be stopped and that both African Americans and Hispanics are less likely than Whites to be stopped for legitimate reasons. Engel and Calnon (2004) find that African Americans are more likely to be cited, searched, arrested, and subjected to police use of force, controlling for other extra-legal characteristics of the driver and the police and for the reason for the stop. Smith et al. (2003) collected survey data that addressed both stops and perceptions, asking about police stops, treatment during the stop, trust in the police, and belief in racial profiling. They found small racial disparities in stops by the North Carolina State Highway Patrol but large disparities in stops by local police. They also found, for both Whites and African Americans, that being treated with disrespect during a stop strongly undermined trust in the police, as did belief in racial profiling. One advantage of survey data, as illustrated by both Lundman and Kaufman (2003) and Smith et al. (2003), is that they solve the methodological problem of comparison baselines. With a survey one can directly compare the status and behavioral attributes of those stopped and not stopped, yielding estimates of race or ethnic disparity in the incidence of police contact while controlling for other factors, such as driving practices, that constitute the legal basis for a traffic stop.

Survey-based self-report data, like police report data, also present methodological problems. Surveys are more expensive to carry out than police data collection, since police report data are typically collected by officers during their normal work day with no new resources. Because of these costs, survey samples will often be smaller than police-recorded data sets. Small samples are adequate if racial disparity in stops is large, but when stops are rare events or disparity is small, sample size can become decisive. And survey samples are unlikely ever to be large enough to identify individual problem officers.
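To make the point about sample size concrete, the sketch below uses the standard normal-approximation formula for the number of respondents needed in each group to detect a difference between two proportions with a two-sided test. The stop rates, the 5% significance level, and the 80% power target are illustrative assumptions, not values from this study, and `n_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate respondents needed in EACH group to detect p1 vs p2.

    Normal-approximation formula for two independent proportions,
    two-sided test, no continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical annual stop rates for two groups (not estimates from this study).
large_gap = n_per_group(0.10, 0.20)  # a large disparity
small_gap = n_per_group(0.10, 0.12)  # a small disparity
print(large_gap, small_gap)
```

Under these assumptions, a 10% versus 20% gap needs roughly 200 respondents per group, while a 10% versus 12% gap needs roughly 3,800, which is why small disparities can outstrip a typical survey budget.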

Two related literatures provide insight into self-report dynamics in surveys concerned with sensitive topics. First, since the 1940s, criminologists have been actively engaged in validating the self-report method in studies of deviance, crime, and delinquency (Junger-Tas and Marshall 1999). Criminologists’ reliance on self-report methodology reflects the limitations of official sources of crime data: police data are political artifacts, describe only a portion of the population involved in crime, and tell us little about the correlates of criminal activity (Mosher et al. 2002). Second, there is considerable evidence in the literature on survey research and social desirability reporting errors that African Americans in particular, and ethnic minorities in general, are more likely than Whites to underreport sensitive experiences, such as being stopped by the police. It is also possible that in some politicized contexts some minorities might over-report police stops, especially if they attribute the stop to police bias rather than to their own behavior. If non-reporting is associated with the respondent’s race, then survey-based analyses are potentially misleading, either exaggerating or underestimating the degree of racial disparity in police stops. In light of the policy relevance and accountability implications of studies that test for police bias (Smith and Alpert 2002), it is particularly important to document the nature of reporting error in self-report studies of police contacts. As we review below, the survey research literature points to higher rates of underreporting of all types of threatening, sensitive, or embarrassing behaviors by African Americans.

Much of the prior research on race and social desirability effects in surveys is quite old. Gradual social evolution toward more equal race relations may have muted or even eliminated race differences in social desirability. It is even possible that recent political attention to racially biased policing has made police stops more salient in the African American community, leading to relatively higher reports of police stops than in the White community. New research is thus warranted both to advance the study of racial bias in police-citizen stops and to determine whether race/ethnic differences in social desirability effects remain an important consideration in survey research.

In this paper we use a reverse record check survey to gauge the likely biases from non-reporting and social desirability effects in self-report survey data used to study racial disparity in police stops. In a reverse record check survey, the investigator knows the answer to a question before administering the survey and compares this information with respondents’ answers to assess their accuracy. We examine the results of a reverse record check survey of North Carolina drivers with known speeding citations to estimate rates of underreporting of traffic stops by race. We then use this information to adjust estimates of race disparity in stops from a larger companion survey of North Carolina drivers. Footnote 1
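The adjustment step is not spelled out at this point, so the sketch below shows only the simplest possible correction, under two stated assumptions: respondents never report stops that did not happen, and the reporting rate measured among ticketed drivers carries over to drivers in the companion survey. The reporting rates are 1 minus the unadjusted speeding-stop non-reporting shares given in the Results (28% for Whites, 38% for African Americans); the reported stop rates are hypothetical, and `adjust_for_underreporting` is a hypothetical name rather than the procedure used in the conclusion.

```python
def adjust_for_underreporting(reported_rate: float, reporting_rate: float) -> float:
    """Inflate a self-reported stop rate by the share of true stops that get reported.

    Assumes no false reports and that the record-check reporting rate applies
    to the companion survey population.
    """
    if not 0 < reporting_rate <= 1:
        raise ValueError("reporting_rate must be in (0, 1]")
    return reported_rate / reporting_rate


# Reporting rates for speeding stops: 1 - share who did not report (unadjusted).
reporting = {"White": 1 - 0.28, "African American": 1 - 0.38}
# HYPOTHETICAL self-reported speeding-stop rates from a companion driver survey.
reported = {"White": 0.10, "African American": 0.12}

adjusted = {g: adjust_for_underreporting(reported[g], reporting[g]) for g in reported}
raw_ratio = reported["African American"] / reported["White"]
adjusted_ratio = adjusted["African American"] / adjusted["White"]
print(f"raw disparity ratio: {raw_ratio:.2f}, adjusted: {adjusted_ratio:.2f}")
```

Because African Americans underreported more, this correction widens the disparity, here from a ratio of 1.20 to about 1.39; if the reporting gap ran the other way, it would narrow it.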

We begin with a review of the literature on social desirability effects, race and underreporting, and reverse record check survey methodology. We then introduce our record check survey, the North Carolina Reverse Record Check Survey (NCRRCS), and use it to estimate race-specific rates of reporting error in answers to questions about police stops. We also evaluate whether race and social desirability effects are associated with self-reports of other driving behaviors and with possible backward telescoping bias among those who do report a police stop. Finally, using the larger North Carolina Driver Survey, we show how response bias in answers to questions on police stops might influence estimates of racial disparity in police stops.

Background Literature

Underreporting and Item Non-response for Sensitive Questions

Discovering the levels and types of inaccuracy in survey responses, and the characteristics of inaccurate reporters, is our chief aim in this study. Sudman and Bradburn (1982) identified four factors related to survey response errors: memory, motivation, communication, and knowledge. Footnote 2 Motivation error, the major concern of this research, refers to inaccurate answers given because the respondent wants to appear in a positive light to the interviewer. Motivation errors produce socially desirable responses, and this type of reporting bias can manifest itself as item non-response, overreporting, and/or underreporting.

Overreporting is common for items that measure socially desirable activities (e.g., voting). A number of studies (cf. Traugott and Katosh 1979; Abramson and Claggett 1984; Hill and Hurley 1984) have found that the gap between self-reported and validated voting differs by race: African Americans are more likely than Whites to overstate voting. Underreporting is more common for items that measure undesirable activities (see Tourangeau et al. 2000 and Bradburn et al. 1979 for reviews).

Questions asking about undesirable activities are often referred to as “threatening” or “sensitive” questions. These questions ask about activities that are thought to be private, embarrassing, or illegal (e.g., personal income, political party affiliation, religion, alcohol consumption, drug use, sexual behaviors, and criminal activity). Underreporting is a concern when asking threatening questions because respondents may feel that admitting to undesirable behaviors would lower their esteem in the eyes of the interviewer, or they may think the answer to a potentially embarrassing question is none of the interviewer’s business. Thus, social desirability is thought to be the root of non-response and underreporting on sensitive questions (Kormendi 1988). Questions about police stops are clearly threatening questions that might elicit social desirability-based underreporting.

Reported non-response rates to threatening questions range from less than 5% for questions with minimal threat (e.g., witnessing a crime but not reporting it; Clark and Tifft 1966) to as high as 73% for questions considered highly threatening (e.g., bankruptcy; Bradburn et al. 1979). Crime victimization is a threatening topic that has been studied repeatedly, and the results consistently show that it is underreported (Czaja et al. 1994; Murphy and Dodge 1981; Yost and Dodge 1970; Dodge 1970; Turner 1972).

Survey questions about behaviors that violate state or federal laws are often of questionable validity because of the potential for underreporting and for incomplete or inaccurate reports. It seems that respondents, trying both to be good respondents by answering the question and to present a positive image to the interviewer, often do not refuse to answer but instead report that they did not engage in the threatening behavior (Bradburn et al. 1978). For instance, Clark and Tifft (1966), using a polygraph to check validity, found that while 38% of respondents underreported speeding, only 15% refused to answer the question.

Reporting police stops and other driving behaviors represents a special case of a larger effort by researchers to establish criterion validity for self-reports of crime and delinquency. Hindelang et al.’s (1981) study on this topic found that concordance between self-reported contacts with the juvenile justice system and official records of such contact ranged from 0.70 to 0.83. Likewise, Huizinga and Elliott’s (1986) study, using data from the National Youth Survey (NYS), found a concordance rate of 0.78 in comparing self-reports of an arrest with official arrest records. Farrington et al. (1996), using data on two cohorts from the Pittsburgh Youth Study, found high levels of concurrent and predictive validity of a self-report delinquency inventory when compared to juvenile court petitions. In addition, Farrington et al. (1996) compared the extent to which boys with official court records self-reported being arrested by the police. Concordance rates for official contacts approached 67%.

Self-report validity has also been studied in research on drug use, which offers an opportunity to check criterion validity against an independent measure, chemical sample analysis. Akers et al. (1983) found a high level of concordance between self-reports of tobacco smoking and chemical analysis of nicotine in saliva samples among a sample of high school students, estimating concordance rates upwards of 95%. Similar techniques have been used to test the validity of self-reported illicit drug use. For example, comparisons of self-reports and urinalysis results for Arrestee Drug Abuse Monitoring (ADAM) program participants have revealed evidence of underreporting. Taylor and Bennett (1999), using ADAM data from five US cities, found that 7.8% of arrestees underreported drug use, while approximately 2% over-reported it. Similar data from the Drug Use Forecasting program show that underreporting increases as questions move from soft drugs to harder drugs (Thornberry and Krohn 2000). For example, in 1988, 43% of arrestees in Philadelphia underreported cocaine use, while just 13% underreported marijuana use.

Differences by Race in Underreporting and Item Non-response for Sensitive Questions

Using the Marlowe–Crowne Social Desirability Scale, Stocking (1979) found that non-Whites are more likely than Whites to attempt to please interviewers by giving socially acceptable answers to sensitive questions. Studies also consistently find that African Americans are more likely than Whites to respond to surveys (Groves and Couper 1998; Cohen and Carlson 1995; Brehm 1993; Jackson et al. 1982; O’Neil 1979; Hawkins 1975). Thus, although African Americans are more likely to complete interviews, the information they provide on sensitive topics may have somewhat lower validity than that provided by White respondents because of a relatively greater desire to respond in socially acceptable ways.

For our purposes, those who refuse to be interviewed matter less than whether the information given in the interview, especially in response to threatening questions, is accurate. Women, non-Whites, and those with lower levels of education are more likely to underreport unacceptable behavior or counter-normative attitudes (DeLamater 1982). Sudman and Bradburn (1974), summarizing previous research on responses to attitude questions, report that Blacks are more likely than Whites to exhibit response effects for questions that arouse concern. Witt et al. (1992), in a study of item non-response to questions about drug use, report that non-Whites are more likely than Whites to be item non-respondents. Cox et al. (1992) found that, compared with Whites, African Americans and Hispanics not only had higher non-response rates but also gave inconsistent answers to questions about drug use more often. Fu et al. (1998) report that induced abortions remained severely underreported in the 1995 National Survey of Family Growth: overall, only 59% of abortions were reported, 64% for Whites and 47% for Blacks.

Previous reverse record check studies have found further evidence that African Americans are more likely than Whites to underreport in response to sensitive or threatening questions. Czaja et al. (1994), for example, examined respondents’ strategies for recalling crime victimization incidents. They found that Whites reported 71% of known victimizations, compared with only 44% for African Americans. Hence, the odds of Whites reporting a victimization were 1.9 times those for African Americans (cf. Sparks 1981; Biderman and Lynch 1981; Dodge 1983 for similar findings on victimization; see Czaja and Blair 1990; Czaja et al. 1992 for studies on other types of questions). Udry et al. (1996), in a small (n = 104 American women) medical record linkage analysis of abortion underreporting, found that 19% of women failed to report one or more abortions; non-Whites were 3.3 times more likely than Whites to underreport. Magura et al. (1987), comparing self-reported drug use with urinalysis results among methadone treatment patients in New York City, found that African Americans were more likely than other groups to underreport drug use. There is also some much older evidence that African Americans are more likely than Whites to conform, or acquiesce, to questions with positive social desirability cues (Lenski and Leggett 1960; Hare 1960).

Delinquency studies have also sought to document the correlates, often in terms of group differences, of the criterion validity of self-reports (Thornberry and Krohn 2000). For example, Hindelang et al. (1981) reported that African American males were more likely than other race-by-gender groups to underreport delinquent involvement, and Huizinga and Elliott (1986) found a similar pattern of differential validity. On the other hand, Farrington et al.’s (1996) more recent study showed that African American males were no more or less likely than White males to self-report delinquent behavior. Further, concordance rates for concurrent validity were higher among White males for admitting a criminal offense but higher among African American males for admitting an arrest. While there is good reason to expect that African Americans may be more likely to underreport police stops, there is enough uncertainty to warrant further research. Indeed, Thornberry and Krohn (2000: 58) conclude that race differentials in self-report validity should be a high priority in future methodological research.

Record Check Surveys

The consequences of race differences in reporting police stops are potentially great. For example, higher rates of underreporting by African Americans than by Whites would mean that survey data are prone to Type II errors, failing to detect a true group difference in the likelihood of being stopped. Or, where group differences are large enough to detect, the survey approach would tend to underestimate the magnitude of the “Driving While Black” phenomenon. Conversely, if African Americans are more likely than Whites to report being stopped, surveys could produce an exaggerated indictment of law enforcement behavior. While the previous literature strongly suggests that African Americans are less likely than Whites to report sensitive behavior, the current politicization of the “Driving While Black” phenomenon may encourage African Americans to recall and report driving stops. Because media reports tend to place the blame for stops on the police rather than on African American drivers, social desirability effects on African Americans’ reports of police stops may be weakened in the current political climate. The political climate might also give some African Americans, primarily those not subject to social desirability pressures, a motive to report stops that occurred outside the survey recall period.

While it seems unlikely that any respondent would report a stop that never happened, race differences among those who do report a stop could appear as differences in forward telescoping. Forward telescoping occurs when a respondent reports an event that happened before the start of the survey recall period as though it fell within it. In this record check survey we asked respondents to report stops that happened during the last year, but respondents were selected into the sample based on stops between 7 months and 14 months earlier than the interview date. We can therefore test for race differences in telescoping in the analyses that follow. If African Americans are more likely than Whites to report earlier stops, there may be some relative over-reporting, perhaps driven by the politicization of racial profiling, that counterbalances the expected social desirability-based underreporting.
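Because respondents reported the month of each stop, checking a report for forward telescoping comes down to comparing that month with the 12-month recall window. The sketch below is a hypothetical illustration of that comparison, not the procedure used in this study; `classify_report` and `telescoped_share` are hypothetical helpers, and both the month resolution and the choice to count a stop in the same calendar month one year before the interview as inside the window are assumptions. Comparing `telescoped_share` across race groups would be a direct test of the over-reporting hypothesis.

```python
from datetime import date


def months_before(interview: date, event: date) -> int:
    """Whole calendar months from the event's month to the interview's month."""
    return (interview.year - event.year) * 12 + (interview.month - event.month)


def classify_report(interview: date, reported_stop: date, recall_months: int = 12) -> str:
    """Label a reported stop relative to a 'last year' recall window (month resolution).

    ASSUMPTION: the boundary month (exactly 12 months back) counts as inside the window.
    """
    gap = months_before(interview, reported_stop)
    if gap < 0:
        return "after interview (invalid)"
    if gap > recall_months:
        return "forward telescoped"
    return "within recall period"


def telescoped_share(reports: list[tuple[date, date]]) -> float:
    """Share of (interview date, reported stop date) pairs that predate the recall window."""
    labels = [classify_report(interview, stop) for interview, stop in reports]
    return labels.count("forward telescoped") / len(labels)


interview = date(2000, 11, 15)
for stop in (date(2000, 3, 1), date(1999, 11, 1), date(1999, 9, 1)):
    print(stop.strftime("%B %Y"), "->", classify_report(interview, stop))
```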

One method for identifying underreporting and inaccurate survey responses is a reverse record check survey. This methodological tool evaluates the validity and accuracy of respondents’ answers by asking them for information that the researcher has already obtained from official records; their answers can then be compared with the records to assess accuracy. The purpose of our research was to find out whether people who have been stopped by police are willing to report the stops during a telephone interview. The findings from a reverse record check survey may allow researchers without access to respondents’ known behaviors to statistically adjust for underreporting.

The National Crime Survey (NCS) included three record check surveys in which police reports were compared with survey answers for a sample of citizens with known police contacts (Yost and Dodge 1970; Dodge 1970; Turner 1972). Based on findings from these record check surveys, the NCS was redesigned to use survey items that produced less underreporting. While we built some question wording experiments into our survey to improve future surveys, our primary objective was to determine whether race differences in the underreporting of police stops affect estimates of the existence and magnitude of the “Driving While Black” phenomenon.

Research Methods

The North Carolina Reverse Record Check Survey (NCRRCS) was based on a sample of drivers who had received a speeding citation in the 6 months before the survey was fielded. We focus on stops associated with speeding citations because they are the most common form of vehicle stop and because the issuance of a citation is likely to make the event memorable, which minimizes underreporting due to simple forgetting. The North Carolina Driver Survey was a larger telephone survey of a disproportionate, race-stratified random sample of licensed North Carolina drivers. The two surveys were fielded simultaneously and overlapped considerably in content, so that telephone interviewers would not realize that the reverse record check survey was being administered only to people with recent police stops. We are primarily concerned here with the results of the record check survey. In the conclusion we use race-specific estimates of police stops from the larger North Carolina Driver Survey to illustrate how the results of our record check analysis might be used to adjust estimates of race disparity in police stops.

Interviews for the NCRRCS were conducted by telephone between July 2000 and February 2001. Footnote 3 The sampling frame, a list of drivers ticketed for speeding within the 6 months before the survey began, was obtained from the North Carolina Administrative Office of the Courts. From this list, we selected a disproportionate, stratified sample of 1564 names from which we expected to obtain 600 completed interviews: approximately 300 with African American respondents and 300 with White respondents.

From this sample, a total of 605 interviews were completed with an overall cooperation rate of 69.5%. Footnote 4 The cooperation rates for African Americans (69.1%) and Whites (70.0%) were nearly identical. As in previous research, Whites refused to participate at slightly higher rates and African Americans were slightly more difficult to locate. It was more difficult to find valid phone numbers for African Americans, but once contacted, they cooperated at higher rates.

The overall response rate was much lower (38.7%) because of the difficulty of locating telephone numbers for sampled individuals. Footnote 5 African Americans were more difficult to locate than Whites and had a correspondingly and significantly lower overall response rate (34.6% vs. 43.7%; chi-square = 13.5, p < 0.001). In addition, young drivers were significantly less likely to be located than older drivers: the mean age of respondents was 35.0 years and the mean age of non-respondents only 30.8 years (t = 6.5, p < 0.001). There were no significant gender differences between respondents and non-respondents for either race.
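The gap between the cooperation rate and the response rate comes from their denominators. The sketch below uses simplified definitions and hypothetical disposition counts: the cooperation rate counts only people who were actually reached, while the response rate also counts sampled drivers who could never be located. The study's own definitions are given in Footnotes 4 and 5 and may follow more detailed conventions, such as the AAPOR standard definitions; the `Dispositions` class is a hypothetical illustration.

```python
from dataclasses import dataclass


@dataclass
class Dispositions:
    """HYPOTHETICAL final call outcomes for sampled, eligible drivers."""
    completes: int
    refusals: int     # reached but declined
    not_located: int  # no working phone number found, or never reached

    def cooperation_rate(self) -> float:
        # Share of people actually reached who completed the interview.
        return self.completes / (self.completes + self.refusals)

    def response_rate(self) -> float:
        # Share of all eligible sampled people who completed the interview.
        return self.completes / (self.completes + self.refusals + self.not_located)


group = Dispositions(completes=100, refusals=40, not_located=110)
print(round(group.cooperation_rate(), 3), round(group.response_rate(), 3))
```

With these made-up counts the cooperation rate is about 71% while the response rate is 40%, mirroring how hard-to-locate drivers can pull a response rate well below the corresponding cooperation rate.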

We performed a series of analyses to see whether the response rates had any implications for the gender and age composition of the African American and White respondent groups. In general, the age and gender characteristics of the respondents are very similar to those of the total sample (i.e., respondents and non-respondents). The age bias is similar for both groups: African American respondents are 2 years older, on average, than the total sample, while White respondents are only a year older. African American females are slightly underrepresented (by 1.0%), while White females are slightly overrepresented (by 1%). African American males are underrepresented to a greater degree (4.8%), and White males are overrepresented by about 4.8%. We control for age and gender in the analyses that follow. Gender does not influence reporting, so this difference in sample selection is not consequential. In the analyses that follow, each additional year of age multiplies the odds of reporting a speeding stop by 1.02. Given that the age bias differs by only about 1 year between the White and African American samples, this difference is trivial and does not affect our estimate of race differences in underreporting. Footnote 6
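An odds ratio of 1.02 per year is easy to misread as a change in probability. The sketch below converts it, taking the 72% speeding-stop reporting rate among White respondents (28% did not report) as an illustrative baseline; `shift_probability` is a hypothetical helper.

```python
def shift_probability(p: float, odds_ratio: float, units: float = 1.0) -> float:
    """Apply a per-unit odds ratio `units` times to a baseline probability p."""
    odds = p / (1 - p)
    new_odds = odds * odds_ratio ** units
    return new_odds / (1 + new_odds)


baseline = 0.72  # illustrative: share of White respondents reporting a speeding stop
one_year_older = shift_probability(baseline, 1.02, units=1)
print(f"{baseline:.3f} -> {one_year_older:.3f}")
```

A one-year age difference moves the predicted reporting probability by about 0.4 of a percentage point, which is why a 1-year difference in age bias between the samples is treated as inconsequential.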

Record check surveys often have difficulty directly matching survey responses to the official records that generated the original sampling frame (Miller and Groves 1985). We designed our survey so that a direct match to the speeding citation that made respondents eligible was not required. Instead, we sampled people with citations in the previous 6 months and, over a survey period of an additional 6 months, asked them whether they had been stopped in the last year and whether any of those stops had been for speeding. This approach oversamples stops, in the sense that respondents with multiple stops have a greater chance of reporting some speeding stop, not necessarily the one that drew them into the sample. We felt that this was a reasonable approach because we consider race-linked differences in social desirability reporting to be the primary threat to the validity of self-reports in research on race disparity in stops. If African Americans are more likely than Whites to be stopped during the recall period, our method may lead us to underestimate the size of race differences in stop reporting. Footnote 7

After the survey was completed we discovered that we had interviewed 37 people who had the same name and the same phone number or address as the person in the sampling frame but who, based on age or sex, were in fact different people. These were family members in the same household who happened to share the target respondent’s first name and surname. These cases were deleted from the analyses that follow.

One week before the initial telephone contact attempt, advance letters were sent to each person in the sample. The letters explained that researchers from North Carolina State University were conducting a survey about the driving experiences of people in North Carolina, their observations of other drivers on North Carolina roads, and that the results would be used to increase traffic safety and make policy decisions. The telephone survey was conducted by the Public Opinion Laboratory at Northern Illinois University and completed interviews averaged 9 min in length.

Most of the interview consisted of general driving questions, intended to reduce the threat of questions about police stops and to increase the salience of stops. Before asking about police stops we asked 24 questions about respondents’ driving history, driving patterns, and law-breaking while driving. The law-breaking questions, in particular, were designed to reduce threat and to prime respondents to remember police stops. The key dependent variables were in the middle of the questionnaire and were measured by the following series of questions.

“Have you been pulled over by the police anywhere in North Carolina in the last year, that would be since (date) 1999?”

“How many times in the last year were you pulled over?”

“How many of these pull-overs were for speeding?”

The recall period referenced in the first question was adjusted by interviewers to cover the previous 12 months, regardless of when during the data collection period the interview was completed. Footnote 8 For each speeding incident, respondents were asked the make, model, and year of the vehicle they were driving; the month they were stopped; the type of street or highway on which they were stopped; the type of officer; the posted speed limit and the speed the officer said they were going; and the outcome of the stop (e.g., warning, ticket). Information on up to three stops was recorded.
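Interviewers adjusted the date read in the first question so that it always fell 12 months before the interview. A minimal sketch of how that date could be computed, assuming "the last year" means the same calendar day one year earlier; `recall_start` and `recall_phrase` are hypothetical helpers, and mapping February 29 to February 28 is an assumption (no interview in this field period fell on that date).

```python
from datetime import date


def recall_start(interview: date) -> date:
    """Same calendar day one year before the interview.

    ASSUMPTION: February 29 has no counterpart in a non-leap year, so it maps to February 28.
    """
    try:
        return interview.replace(year=interview.year - 1)
    except ValueError:  # raised only for an interview on February 29
        return interview.replace(year=interview.year - 1, day=28)


def recall_phrase(interview: date) -> str:
    """Fill for the '(date)' slot in the first police-stop question."""
    start = recall_start(interview)
    return f"that would be since {start:%B} {start.day}, {start.year}"


print(recall_phrase(date(2000, 7, 15)))
```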

We assume in the design and analysis of this record check survey that race differences in reporting are primarily a function of differences in social desirability. We focus on respondents who have speeding citations to ensure that low salience does not reduce recall, although some of the underreporting we find probably reflects simple forgetting. We assume that salience effects are equivalent for Whites and African Americans. If this assumption is false and the consequences of a stop tend to be greater for African Americans than for Whites, making stops more memorable for them, then social desirability differences will be partly masked. We have no way of knowing whether this is the case.

Results

Response Bias in Reports of Police Stops

Table 1 shows, by race, the proportion of respondents who reported being stopped by the police for any reason in the last year and the proportion who said they were stopped for speeding in the last year. Approximately 23% of White respondents failed to report any police stop, compared with 29% of African Americans, a difference of 6 percentage points. This difference is not statistically significant. Twenty-eight percent of White respondents and 38% of African American respondents did not report a speeding stop, a statistically significant gap of 10 percentage points. These rates of underreporting are at or below the 38% underreporting of speeding that Clark and Tifft (1966) found almost 40 years ago.

Table 1 Race-specific self-reports of no traffic stops and no speeding stops by police in the last year

Since the probability of speeding and being stopped is related to demographic characteristics that may be associated with race, we examined whether our basic findings held up after controlling for sex, age, education, and home ownership (as a proxy for social class). While there were no significant race differences in sex distribution, White respondents had significantly more education and a higher rate of home ownership, and were slightly older, on average, than African American respondents. We also included a measure of the number of weeks that had elapsed between the speeding citation that qualified the respondent for the sample and the day the survey was completed. This variable controls for forward and backward telescoping and allows us to examine whether there are race differences in response due to telescoping. African Americans were interviewed, on average, 1.9 weeks later than White respondents, reflecting the greater difficulty of locating these respondents.

Table 2 shows the results of logistic regressions of reports of any stop and of speeding stops on race and the demographic control variables. The basic finding is that the race difference in reported speeding stops remains significant when controlling for gender, age, education, home ownership, and time since citation. Including the control variables barely alters the race difference in reported speeding stops, suggesting that the race differences in age and in time since citation within the sample are not consequential. Model 2 for both types of stops also indicates that older respondents are more likely to report their stops. Not surprisingly, the more time that had passed since the citation, the less likely a respondent was to report either type of stop. We also examined interactions between race and age, gender, education, home ownership, and time since citation. None were statistically significant. The absence of a significant interaction between race and time since citation is particularly important because it is inconsistent with the speculation that African Americans might over-report stops from before the survey recall period because of the political salience of the “Driving While Black” phenomenon. This result is also inconsistent with attributing the race difference in reporting to race differences in memory error.
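For readers less familiar with this specification, the model with the full set of controls (presumably Model 2 in Table 2) can be sketched as follows. This is an assumed form reconstructed from the variables named in the text: R_i = 1 if respondent i reported the stop that the citation record shows occurred, and the codings (0/1 indicators for race, sex, and home ownership; age and education as measured in the survey; elapsed weeks as a count) are illustrative assumptions rather than the exact coding used in Table 2.

```latex
\begin{aligned}
\operatorname{logit} \Pr(R_i = 1)
  &= \beta_0 + \beta_1\,\mathrm{Black}_i + \beta_2\,\mathrm{Female}_i + \beta_3\,\mathrm{Age}_i
   + \beta_4\,\mathrm{Educ}_i + \beta_5\,\mathrm{Owner}_i + \beta_6\,\mathrm{Weeks}_i \\
\mathrm{OR}_{\mathrm{Black}} &= e^{\beta_1}
\end{aligned}
```

An odds ratio below 1 for race corresponds to African American respondents being less likely to report a known stop, holding the other variables constant. The interaction checks described above add terms such as β7 (Black × Weeks); a non-zero β7 would mean that the decline in reporting with elapsed time differs by race.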

Table 2 Logistic regression of self-reports of any stop and speeding stop upon race and demographic controls: log odds coefficient, odds ratio, (significance)

Response Bias in Reports of Other Driving Behaviors

Aside from stop data, self-report methods can measure driving behaviors, the presumed correlates of stops, which may provide a way to distinguish racial disparity from racial discrimination analytically. Self-report methodology is common in studies of the etiology of crime and delinquency precisely because it offers a way to measure the conceptually derived correlates of crime. The logic of applying self-report methodology to the study of racial profiling is the same. In this section we investigate whether the response bias identified in the reverse record check survey is associated with differences in reports of driving behavior, and whether this source of error differs by race. The question matters for the usefulness of self-reported driving behavior measures, because it reveals whether those measures also carry social desirability effects associated with race. We have no records with which to check the criterion validity of driving behavior reports directly. However, since we know which respondents were truthful about their stop, we reason that if social desirability bias affects these items, respondents who truthfully report police speeding stops should also report higher levels of other potentially embarrassing behaviors. Because the social threat associated with rolling through a stop sign or driving above the speed limit is relatively low, we would not be surprised to find little or no social desirability effect on self-reports of driving behaviors. We focus in Tables 3 and 4 on self-reports of risky driving behaviors and speeding. We also examine whether race interacts with admitting a stop, to ascertain whether social desirability effects on driving behavior responses vary by race.
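The comparison described above can be written as a linear regression with an interaction term. This is a sketch under assumptions: Y_i is a driving behavior report (the risky driving scale or a typical speed), S_i = 1 if respondent i admitted the speeding stop the record shows, and Black_i = 1 for African American respondents; whether Tables 3 and 4 include further controls is not stated here.

```latex
\begin{aligned}
Y_i &= \alpha + \gamma_1 S_i + \gamma_2\,\mathrm{Black}_i + \gamma_3\,(S_i \times \mathrm{Black}_i) + \varepsilon_i \\
\mathbb{E}[Y_i \mid S_i = 1, \mathrm{Black}_i = b] - \mathbb{E}[Y_i \mid S_i = 0, \mathrm{Black}_i = b] &= \gamma_1 + \gamma_3\, b, \qquad b \in \{0, 1\}
\end{aligned}
```

Under the reasoning above, γ1 > 0 would signal social desirability bias in the behavior report among Whites, γ1 + γ3 is the corresponding contrast for African Americans, and testing γ3 = 0 asks whether that bias differs by race.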

Table 3 Regressions of self-reported risky driving behavior on self-reports of stops, race, and their interaction; Metric coefficients (significance)
Table 4 Regressions of self-reported speeding behavior on self-reports of stops, race, and their interaction; Metric coefficients (significance)

The dependent variable in Table 3 is an additive scale that we refer to as “Risky Driving Behavior.” We asked respondents four questions about seemingly minor but risky driving behaviors that could bring a driver to the attention of a police officer. “Risky Driving Behavior” sums reports of rolling through stop signs, speeding up for yellow lights, failing to signal, and not always using seat belts. Although the coefficient for reporting a speeding stop is positive, it is not close to statistical significance. This suggests that there is no social desirability bias in self-reports of these minor risky driving behaviors. There are also no significant race differences in self-reports of risky driving behavior, nor does race interact with the admission of a speeding stop in the last year in its impact on self-reports of risky driving behavior. Footnote 9
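A minimal sketch of how an additive scale like “Risky Driving Behavior” can be built from the four items. The class and field names are hypothetical, and we assume each item is coded yes/no; if the survey used frequency categories, the scale would instead sum those ordinal codes.

```python
from dataclasses import dataclass


@dataclass
class RiskyDrivingItems:
    """Hypothetical yes/no coding of the four survey items (True = behavior reported)."""
    rolls_through_stop_signs: bool
    speeds_up_for_yellow_lights: bool
    fails_to_signal: bool
    not_always_wearing_seat_belt: bool


def risky_driving_scale(items: RiskyDrivingItems) -> int:
    """Additive scale: number of the four risky behaviors reported (0 to 4)."""
    # bool is a subclass of int, so summing the flags counts the True values.
    return sum(
        (
            items.rolls_through_stop_signs,
            items.speeds_up_for_yellow_lights,
            items.fails_to_signal,
            items.not_always_wearing_seat_belt,
        )
    )


if __name__ == "__main__":
    respondent = RiskyDrivingItems(
        rolls_through_stop_signs=True,
        speeds_up_for_yellow_lights=False,
        fails_to_signal=True,
        not_always_wearing_seat_belt=False,
    )
    print(risky_driving_scale(respondent))  # 2
```

Because every item counts equally, the scale treats, say, failing to signal and rolling through a stop sign as equally risky; that is the usual trade-off of a simple additive index.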

In Table 4 we examine reports of typical driving speed in 35 and 65 mile per hour speed limit zones. Respondents who reported speeding stops were more likely to admit to higher typical driving speeds in both the 35 and the 65 mile per hour zones. Thus, for driving speed there does seem to be social-desirability-linked underreporting. There are no significant race differences in reported speeding behavior in a 35 mile per hour zone. In a 65 mile per hour zone, African Americans, on average, reported driving more than one mile per hour slower than Whites. Footnote 10 In neither zone was there a significant interaction between race and accurately reporting the speeding stop. Footnote 11 Thus, the race difference in self-reported average speeds is unlikely to represent a social-desirability-induced reporting error.

These analyses lead to two conclusions. First, respondents who are truthful on the record check question are somewhat more likely to report higher rates of speeding, but not more of the other risky driving behaviors. We interpret this as a tendency toward social desirability response effects in self-reports of speeding but not of other driving behaviors. To test this conclusion, we ran an additional analysis of clearly non-threatening questions: self-reports of miles driven last week and last year. In neither case was there an association between self-reported miles driven and admitting to a police stop. Thus, when the question was non-threatening, the tendency to report a speeding stop accurately was not associated with any bias.

The second and more important conclusion is that the effect size of social-desirability-based underreporting of other illegal driving behavior (i.e., speeding) is similar for Whites and Blacks. The significance of this finding is that the race difference in response bias observed for reports of police stops does not carry over to self-reports of speeding behavior.

Conclusions

Consistent with past research on self-reports, we find that most survey participants are willing to admit a negative official contact with the police, in this case a traffic stop and citation for a speeding violation. Also consistent with previous research is the finding that African Americans are more likely than Whites to give socially desirable answers to threatening survey questions. This result, although expected from past research, is inconsistent with the expectation that African American citizens might be encouraged to over-report police contacts because of any cultural freedom associated with being a “victim” of the DWB phenomenon. African Americans do not over-report police speeding stops. To the contrary, African Americans under-report such stops at a slightly greater rate than White drivers. These tendencies indicate that surveys of drivers designed to estimate the magnitude of the “Driving While Black” phenomenon will tend to underestimate police stops for both African American and White drivers, and that the error may be larger for African Americans. Thus, survey reports of police stops will tend to underestimate both the actual volume of traffic stops and the degree of race disparity in police stops.

We also find that respondents who fail to report police stops may be more likely to give socially desirable responses to questions about their speeding behavior. We found, however, no evidence that African Americans who do or do not report stops are more or less likely than similar Whites to underreport risky driving behavior or speeding. The findings of this study also support the view that survey questions about risky driving behaviors are less threatening than questions about official contacts with police. The lack of a race by stop-admission interaction in the social desirability results for self-reported speeding suggests that there is no race-linked threat in those questions either. These findings are consistent with the review of the self-report method in crime research by Junger-Tas and Marshall (1999), who suggest that the embarrassment associated with a police encounter is likely to produce more underreporting for questions about official contacts with longer-term consequences than for questions about minor acts of wrongdoing.

Survey-based estimates of the magnitude of the “Driving While Black” phenomenon are likely to underestimate the true degree of race disparity in police stops. In this reverse record check survey, we found that 72.3% of Whites and 62.2% of African Americans who had been stopped for speeding in the last year actually reported such stops. This suggests that speeding stops will be underreported in self-reports by North Carolina Whites and African Americans by about 28% and 38%, respectively. Self-reports of police stops from survey data might be adjusted upward to reflect these biases. Similarly, multivariate statistical analyses of police stops might be weighted so that respondents who report stops represent their expected proportion of the population. It is difficult to say exactly what these weights should be. If a reverse record check is available that exactly matches the population of a general driver survey, then record-check-based nonreporting rates might be used to generate these weights. The results from this record check survey would be most appropriate for weighting survey-based estimates of police speeding stops in North Carolina around the year 2000. If additional reverse record check surveys were conducted in other areas of the US, comparisons among these studies could provide a range of nonreporting estimates useful for sensitivity analyses of survey-based estimates of race differences in self-reported police stops.
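One way such weights might be constructed, sketched under stated assumptions: within a group, respondents who report a stop are weighted up by the inverse of the group's record-check reporting rate, and non-reporters are weighted down so that the weights average 1. This assumes that reported stops are never false positives (consistent with our finding that African Americans do not over-report) and that the record-check reporting rate applies to the survey population. The function name is hypothetical, and the example uses the any-stop figures for Whites from the 2000 North Carolina Driver Survey (an observed share of 18.1% and a reporting rate of .767).

```python
def record_check_weights(observed_share: float, reporting_rate: float) -> tuple[float, float]:
    """Return (weight for stop reporters, weight for non-reporters) within one group.

    observed_share: share of the group reporting a stop in a general driver survey.
    reporting_rate: share of drivers known from records to have been stopped
        who reported the stop in a reverse record check.
    Assumes no false-positive stop reports.
    """
    adjusted_share = observed_share / reporting_rate  # expected true share stopped
    if not 0.0 < adjusted_share < 1.0:
        raise ValueError("adjusted share must lie strictly between 0 and 1")
    w_reporters = adjusted_share / observed_share  # equals 1 / reporting_rate
    # Chosen so that observed_share * w_reporters + (1 - observed_share) * w_nonreporters == 1.
    w_nonreporters = (1.0 - adjusted_share) / (1.0 - observed_share)
    return w_reporters, w_nonreporters


if __name__ == "__main__":
    observed, rate = 0.181, 0.767  # Whites, any stop, North Carolina 2000
    w_rep, w_non = record_check_weights(observed, rate)
    weighted_share = observed * w_rep / (observed * w_rep + (1 - observed) * w_non)
    print(f"reporter weight {w_rep:.3f}, non-reporter weight {w_non:.3f}")
    print(f"weighted share stopped {weighted_share:.3f}")
```

Keeping the mean weight at 1 preserves the group's sample size in weighted analyses while shifting the reported-stop share to its record-check-adjusted value.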

In the 2000 North Carolina Driver Survey, 18.1% of Whites reported any stop in the last year. The comparable figure for African Americans was 26.4%. The self-reports from that survey suggest that African Americans were 1.82 times more likely than Whites to be stopped in North Carolina in the year 2000. If we adjust the White figure upward to take into account our estimate of underreporting of any police stop (18.1/.767), the new estimate is that 23.6% of Whites were stopped in the year 2000. For African Americans, the record-check-adjusted estimate is much higher at 37.3% (26.4/.708). These adjusted estimates imply an odds ratio of 1.93: the odds of a police stop in North Carolina in the year 2000 were 1.93 times higher for African Americans than for Whites. This suggests that these self-reports of police stops in North Carolina underestimate the actual race difference in the odds of a police stop by about 12%.
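The adjustment in this paragraph amounts to dividing each observed share by the group's record-check reporting rate for any stop (.767 for Whites and .708 for African Americans, that is, one minus the any-stop nonreporting rates in Table 1) and then recomputing the odds ratio. Written out with the figures above, and assuming, as in our record check, that stops are under-reported but not over-reported:

```latex
\begin{aligned}
\hat p^{\,\mathrm{adj}}_W &= \frac{0.181}{0.767} \approx 0.236, \qquad
\hat p^{\,\mathrm{adj}}_B = \frac{0.264}{0.708} \approx 0.373 \\
\mathrm{OR}^{\mathrm{adj}} &= \frac{0.373 / (1 - 0.373)}{0.236 / (1 - 0.236)} \approx \frac{0.595}{0.309} \approx 1.93
\end{aligned}
```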

The analytical significance of revised estimates based on record check weights is perhaps greater in situations where race differences are small. For example, drawing once again on the 2000 North Carolina Driver Survey, 7.5% of Whites and 8.3% of African Americans reported a North Carolina State Highway Patrol stop. Comparing trooper stop likelihood, African Americans are 1.11 times as likely as Whites to report being stopped by a state trooper. Since we estimate that African Americans who were stopped are only 0.92 times as likely as Whites to report the stop (.708/.767), the real race difference in highway patrol stops might be closer to 1.21 (1.11/.92). In this example, where the race gap in stops is small, the adjustment nearly doubles the estimated gap, from 11% to 21% above parity.
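The division by .92 follows from a simple relation, stated under two assumptions: a stopped driver in group g reports the stop with probability r_g, and drivers who were not stopped do not report one. Writing p_g for the true share stopped and p̂_g for the reported share, and treating 1.11 as a ratio of reported proportions (8.3/7.5):

```latex
\begin{aligned}
\hat p_g &= r_g\, p_g, \qquad g \in \{W, B\} \\
\frac{\hat p_B}{\hat p_W} &= \frac{r_B}{r_W}\cdot\frac{p_B}{p_W}
\;\Longrightarrow\;
\frac{p_B}{p_W} = \frac{\hat p_B / \hat p_W}{r_B / r_W} \approx \frac{1.11}{0.92} \approx 1.21
\end{aligned}
```

For an odds ratio the same division is only an approximation, though a close one when stop rates are as low as they are here (the odds ratio for 8.3% versus 7.5% is about 1.12).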

We can repeat this exercise with national data. The self-report data on police stops from the Contacts between Police and the Public component of the 1999 National Crime Victimization Survey show 10.4% of White drivers and 12.3% of African American drivers reporting a stop in the last year (Langan et al. 2001). This works out to a Black-to-White odds ratio of a self-reported stop of 1.21. If we use our North Carolina race-specific estimates of nonreporting of police stops to weight the estimates from the 1999 National Crime Victimization Survey, then 13.6% of Whites and 17.4% of African Americans may have been stopped nationally. Thus, the odds of an African American being stopped by a police officer in the last year may actually be 1.34 times the odds of a White being stopped. If the race-specific underreporting of police stops in the national population in 1999 was similar to what we found in North Carolina in the year 2000, then race disparity in police stops might be underestimated by about half. Footnote 12
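The national calculation can be reproduced with a short script. It is a sketch that applies the North Carolina any-stop reporting rates (.767 and .708) to the published NCVS shares; whether those rates transfer to the national population is an assumption, not a finding, and the variable names are illustrative.

```python
# Race-specific reporting rates for any police stop, from the North Carolina
# reverse record check (1 minus the share of known stops not reported).
REPORTING_RATE = {"White": 0.767, "African American": 0.708}

# Self-reported shares stopped in the last year, 1999 NCVS Contacts between
# Police and the Public component (Langan et al. 2001), as cited in the text.
NCVS_1999 = {"White": 0.104, "African American": 0.123}


def odds(p: float) -> float:
    """Convert a proportion to odds."""
    return p / (1.0 - p)


def black_white_odds_ratio(shares: dict[str, float]) -> float:
    """Odds of a stop for African Americans divided by the odds for Whites."""
    return odds(shares["African American"]) / odds(shares["White"])


# Assumption: North Carolina reporting rates apply nationally, and stops are
# under-reported but never over-reported.
adjusted = {group: NCVS_1999[group] / REPORTING_RATE[group] for group in NCVS_1999}

observed_or = black_white_odds_ratio(NCVS_1999)
adjusted_or = black_white_odds_ratio(adjusted)

if __name__ == "__main__":
    print(f"observed odds ratio: {observed_or:.2f}")  # 1.21
    for group, share in adjusted.items():
        print(f"adjusted share stopped, {group}: {share:.1%}")  # 13.6%, then 17.4%
    print(f"adjusted odds ratio: {adjusted_or:.2f}")  # 1.34
```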

We suspect, however, that this would be an overestimate. Drivers report about twice the level of police stops in the 2000 North Carolina Driver Survey as in the Contacts between Police and the Public component of the 1999 National Crime Victimization Survey. The observed race disparity in the national data is less than half as large as the estimate from the North Carolina Driver Survey. These substantial differences may reflect real regional differences in police and driver behavior. If both police activity and race disparity in police stops are higher in North Carolina than nationally, race differences in social desirability effects may be larger there as well.

The contribution of this study is to provide a cautionary tale. Researchers relying on self-report data should be aware that race differences (and perhaps differences by other correlates of social desirability effects) in the validity of self-reports are likely to affect survey data on police contacts. Further, comparisons of the North Carolina Driver Survey and the Contacts between Police and the Public survey suggest that patterns of underreporting may also be contextual. We recommend that researchers not use our estimates of race differences in reporting to adjust estimates of race gaps in police stops in other locales. It may be sufficient to treat most survey-based estimates of the race gap in police stops as potentially conservative. For example, Lundman and Kaufman (2003), also using the 1999 National Crime Victimization Survey, find that African Americans are more likely to report being stopped by the police. Given our findings, Lundman and Kaufman’s conclusion that African Americans are more likely than Whites to be stopped is likely correct, but their statistical estimates probably understate the actual size of the race disparity. While the self-report method offers one of the most important approaches to the methodological problems of empirical studies of police stop practices, namely the inclusion of driver behavior measures and the development of adequate benchmarks, the differential validity of self-reports by race should be factored into any conclusions about the existence and size of observed racial disparities.

The self-report method is an important and underused part of the toolkit available to scientists, public officials, and the public for studying citizens’ experiences of, and attitudes toward, police encounters. Self-report surveys have become standard practice among policymakers in municipal and county government. Conducting surveys to gauge citizen behavior, experiences with police, police behavior, and perceptions of police and the law is a logical next step in developing empirically informed policy. For most communities, self-reports are likely to be a more feasible way of gathering data about citizen behavior than alternative methods such as video surveys. Improving our understanding of the strengths and limits of the self-report method contributes to the larger project of understanding the dynamics that underlie police stop practices.