Introduction

Across all levels of government, the United States spends $283.6 billion on criminal justice (Bureau of Justice Statistics 2018). Split across numerous agencies and crime prevention programs, the U.S. devotes more than 2.5% of its GDP to combating crime (Chalfin 2015; U.S. Census Bureau 2007). With $80 billion devoted annually to incarceration alone, it is vital that criminal justice expenditures focus upon “smarter, fairer, and more effective” programs while minimizing their costs to society more broadly (Obama 2017: 813). Legislators depend primarily upon documented need and cost effectiveness to justify the allocation of fiscal resources (Rubin 2005; Smith and Jensen 2017), making estimation of the costs of crime essential for maintaining government investment (see e.g., Department of Justice 2012). Consequently, beyond the effectiveness of criminal justice interventions, it is often their estimated monetary value that determines which programs receive support and which crime problems are prioritized.

How best to determine how much crime “costs” society has been debated for many years (Chalfin 2015). There are three main approaches: hedonic analysis, “bottom-up” costing, and “top-down” valuation (Dominguez and Raphael 2015). First, hedonic pricing focuses on revealed rather than stated preferences (Linden and Rockoff 2008; Chalfin 2015), for example, estimating how home prices vary with factors such as the neighborhood crime rate (see Thaler 1978). Second, the “bottom-up” approach attempts to itemize all individual costs involved with crime and then sum them; civil jury verdicts are often used (see Cohen 1988; Miller et al. 1996; McCollister et al. 2010). Third, the “top-down” approach attempts to calculate the entire cost of crime from one source (see Cohen et al. 2010). One common “top-down” method is a contingent valuation (CV) survey that asks respondents the maximum they would be willing to pay out of pocket to reduce crime (see Cohen and Piquero 2009; Cohen et al. 2010).Footnote 1 These estimated costs of crime, regardless of how they are generated, are often used in cost-benefit analyses that inform how governments invest their resources (see Schweinhart et al. 1992; Greenwood et al. 1994; Aos et al. 2011).

This use of cost of crime figures in competitive policy choices increases the need for scrutiny of their estimation methods. Because these estimates rely on personal preferences captured by surveys, it is vital to understand how the methods used to elicit cost estimates influence responses. To this end, this paper assesses whether respondent willingness to pay (WTP) is affected by crime type, crime reduction percentages, and program types and frames, and considers how sensitive cost of crime formulae are to the WTP values a survey elicits. This is one of the first studies in criminal justice to utilize open-ended data gathered from a survey using random assignment to a variety of crime type scenarios, which allows respondents to enter any value rather than being limited to researcher-determined values. Drawing upon data from a representative sample of the United States, this study demonstrates that there is wide variation in open-ended WTP responses, that the wording of the survey only sometimes influences the responses given, and that methodological decisions can drastically influence cost of crime estimates through flow-on effects in the calculation formula.

Challenges in Estimating Costs of Crime and Willingness to Pay

The inherent difficulty in placing dollar values on crime is that victimization is not a commodity (Dominguez and Raphael 2015). However, using contingent valuation (CV), researchers are able to place an implicit price on the cost of crime by asking how much individuals are willing to pay to reduce a crime by a certain percentage. This methodology can be described as “simply ask[ing] people how much they value a safe environment” (Dominguez and Raphael 2015: 611). Contingent valuation was initially created to tackle the “challenge of valuing public goods” (Mitchell and Carson 1989: 2), specifically in the environmental sphere after the Exxon Valdez oil spill in 1989 (see Carson et al. 1992; Hausman et al. 1995). The National Oceanic and Atmospheric Administration’s foundational report on CV provided some best practices, including that CV studies should use probability samples and that willingness to pay as opposed to willingness to accept should be assessed (Arrow et al. 1993). The idea behind CV was that because something like clean air is not a “good” sold on a market, economists could instead assess how people valued it by asking them how much they would pay for it. This is how CV earned its name – elicited WTP is contingent on a hypothetical market.

After selecting CV as the methodology to assess cost, there are further choices to make. Most CV studies use a close-ended “payment card” methodology, assigning respondents to varying monetary values (for example, $0, $50, $100, etc.) and asking them whether they are willing to pay that amount to reduce a certain crime by a given percentage (see Ludwig and Cook 2001). Researchers have broad discretion to shape these surveys, as they select both the values included in the payment card and the amount of crime reduction that respondents would hypothetically be paying for. Studies using this method continue presenting values until a respondent states that they would not be willing to pay that much, thereby identifying the highest amount each respondent reports being willing to pay.

As Chalfin (2015) notes, the ability to assess harms caused by crime still challenges researchers. It is also critical to note that crime and criminal justice expenditures must be balanced with everything else the government is responsible for, including law-making, public works, education, and health. As such, methodologies that produce over- or underestimates of the costs of a crime to society may result in a disproportionate amount of public funds being devoted to interventions. Each crime that occurs can cause significant damage to the victim and society as a whole. However, without economic justification, political ideologies can have a disproportionate influence on responses to crime (Phelps 2016), and programs that seek to prevent crimes through policing, treatment, or incarceration can cost millions of dollars (Piché 2015). As Chalfin (2015: 1) succinctly states, “in short, crime is costly but so is crime control,” and there can be further opportunity costs stemming from poor policy decision-making.

CV is a widely used method for assessing non-market goods for several reasons. Its main advantage for estimating the costs of crime is that it directly elicits a forward-looking measure of crime costs, whereas “bottom-up” approaches must rely on figures from crimes that have already occurred. As Soares (2015: 125) further notes, CV does not require a “decomposition of different types of costs.” In addition, CV’s history in other disciplines provides useful guidance for criminal justice researchers (Chalfin 2015). There are, however, limitations of contingent valuation (and other cost valuation or cost-benefit analysis methods). The use of CV is still considered somewhat controversial, and an entire issue of Criminology and Public Policy was recently devoted to the topic of assessing costs of crime (see Nagin 2015; Dominguez and Raphael 2015; Aos 2015; Black et al. 2015; Manski 2015; Tonry 2015; Welsh and Farrington 2015). One main concern is that, as with any survey, respondents may use a variety of strategies to manipulate it (Diamond and Hausman 1994). Some may report artificially high WTP (amounts they could not in reality afford) in order to garner support for a program they favor (Chalfin 2015). Contingent valuation (open or close ended) is also constrained by the fact that it is a survey with some pre-set parameters, which could potentially anchor WTP responses (Diamond and Hausman 1994). In particular, respondents are asked for their WTP for some ambiguous “marginal gain” with regard to a specific amount of crime reduction (Soares 2015: 127).

Contingent valuation is also considered a “stated preference” approach, as opposed to a “revealed preference” approach. This distinction means that researchers are asking for hypothetical preferences; in reality, respondents are not being asked to spend any money (Carson and Hanemann 2005; Soares 2015). Moreover, early critiques noted that WTP can be insensitive to the scope of an intervention: respondents may state a vague willingness to contribute to a broad goal without understanding the magnitude of its impacts (Dominguez and Raphael 2015; Diamond and Hausman 1994). Black et al. (2015) further argue that both hedonic and CV methods fail to consider the fact that some policies lead to the arrest, detention, and prosecution of innocent individuals. Similarly, Manski (2015) posits that current cost-benefit analysis (CBA) methods, including CV, do not adequately account for equity concerns across income levels and WTP responses. Despite these concerns, some still advocate for using CBA in policy-making, though often with caveats (Aos 2015; Dominguez and Raphael 2015; Cohen 2015; Welsh and Farrington 2015).

Another, less frequently used method is open-ended (OE) CV. OE CV can be used in an attempt to avoid the bias introduced by the predetermined values attached to payment cards. In an OE CV survey, respondents are asked an open-ended question such as: “what is the highest amount you are willing to pay to reduce the crime in question by 25 percent?” To our knowledge, OE CV has only recently been applied in a criminal justice context (see Galvin, Loughran, Simpson & Cohen 2018), though it has previously been applied elsewhere (see Yu and Abler 2010 for an example of an OE CV survey for clean air in Beijing). Thus, to our knowledge, there are no existing cost of crime estimates using OE CV. In OE CV surveys, respondents are provided information regarding the crime type, its frequency, and a proposed program to reduce it by a certain percentage. Respondents are then asked to indicate the most that they would be willing to pay annually on behalf of their household if that option were adopted in their state; there are no upper or lower limits on response values.

Open-ended CV methods have the advantage of allowing respondents to place their own values on crime control without the potential survey methodology problems such as anchoring, where a respondent bases their response partly on the value they are first presented with (see Diamond and Hausman 1994). However, open-ended elicitations may “produce substantial distortions in the response obtained” and may result in less precise and more rounded estimates (i.e. respondents may “round” a response to $10 instead of $12) (Carson and Hanemann 2005: 893). Galvin et al. (2018), in a recent paper focusing on preferences for victim compensation, used the same dataset as the current paper. Instead of using WTP as a raw figure, they “circumvent” the complex issue of using WTP to estimate costs of crime by instead using relative WTP to rank preferences (Galvin et al. 2018: 563). As such, the present study seeks to examine the potential for OE CV to address the aforementioned biases and provide the first assessment of its value in a criminal justice context.

Contingent Valuation in Criminology and Criminal Justice

Close-ended CV studies have been used frequently in criminology and criminal justice and, as we state above, there has not yet been an attempt to estimate costs of crime using the alternative open-ended approach. Below, we first discuss relevant willingness to pay studies and then review those which have extended WTP to estimate costs of crime; each of these used close-ended approaches. Zarkin et al.’s (2000) mall-intercept survey assessing WTP for drug treatment programs found that the scope of the benefit offered did not significantly impact respondents, as average WTP was the same whether 100 or 500 individuals would be treated by the proposed program. Similarly, Atkinson et al. (2005) used CV to study intangible costs of three different violent crimes in the United Kingdom. Not surprisingly, WTP was highest for serious wounding relative to other (less serious) wounding and common assault. In addition, results revealed significant numbers of protest responses (people willing to pay nothing), a few high outliers skewing the distribution, and that many individuals found it challenging to state their WTP.

In a seminal study using a random sample of households in Pennsylvania, Nagin et al. (2006) found that WTP was approximately 25% higher when juvenile programs were rehabilitative as opposed to punishment-based, supporting the argument that program type significantly impacts how much people are willing to pay. Piquero and Steinberg (2010) extended Nagin et al.’s (2006) findings to four states (Pennsylvania, Illinois, Louisiana, and Washington). Their results further established a general preference for rehabilitation, but also identified variation across states. Recently, Picasso and Cohen (2019) used CV discrete choice analysis to examine homicide and the value of a statistical life in Buenos Aires. They identify the advantages of CV over “revealed preferences,” namely that people can identify their preferences outside of their own personal risk and can evaluate programs that have yet to be implemented. Picasso and Cohen (2019) also note a potential pitfall of OE CV, namely that an upward bias is possible. Despite this concern, OE CV remains a potential method for assessing costs of intangible goods, including the costs of crime.

Cost of Crime Studies

Both open and close-ended WTP methods share the same formula for producing final cost of crime estimates, though as we stated above, WTP data elicited from OE CV has yet to be applied to generate costs of crime. After eliciting respondents’ maximum willingness-to-pay (WTP) using either method, “costs of crime” are calculated using the formula below (see Cohen 2015):

$$ \text{Cost of Crime} = \frac{\text{Stated WTP} \times \text{Number of Households}}{\text{Number of Crimes Avoided}} $$
(1)

With this formula, a “cost of crime” is generated using three inputs: 1) respondent stated WTP; 2) number of households in the United States; 3) number of crimes avoided. First, the WTP in this formula refers to a survey respondent’s maximum reported WTP. This value is typically the mean value for the entire sample. Second, the number of households is usually taken from the U.S. Census (see Piquero et al. 2011; Cohen 2015). Third, the number of crimes avoided is calculated by multiplying the percentage crime reduction presented to the survey respondents and the estimated incidence of the specific offense in one year (Cohen 2015). As one can see, this formula requires several inputs that may be difficult to estimate, and relies on official crime report statistics that may not be readily available or accurate, depending on crime type.
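As a sketch, Formula 1 can be implemented directly. In the snippet below the function name is ours; the 4.1 million incidence figure is the fraud estimate used elsewhere in this paper (Anderson 2013), while the mean WTP and household count are hypothetical round numbers used purely for illustration.

```python
def cost_of_crime(mean_wtp, num_households, annual_incidence, reduction_pct):
    """Formula 1: implicit cost per crime avoided.

    mean_wtp         -- sample mean of stated household WTP (dollars per year)
    num_households   -- number of U.S. households (e.g., from the Census)
    annual_incidence -- estimated number of offenses per year
    reduction_pct    -- crime reduction offered in the scenario (e.g., 0.25)
    """
    crimes_avoided = annual_incidence * reduction_pct
    return (mean_wtp * num_households) / crimes_avoided

# Hypothetical inputs: $60 mean WTP, 120 million households,
# 4.1 million frauds per year, a 25% promised reduction.
print(round(cost_of_crime(60, 120_000_000, 4_100_000, 0.25), 2))  # → 7024.39
```

Note that the estimate scales linearly with mean WTP and inversely with the promised reduction, which is why survey choices such as outlier handling and the scope shown to respondents can move final cost figures substantially.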

Estimated crime incidence is taken from different outside sources, depending on the crime type (see Piquero et al. 2011). For “street crimes” such as burglary, Uniform Crime Reports or National Crime Victimization Survey statistics are readily available and widely used in U.S. studies (see Piquero et al. 2011). However, these data sources have significant shortcomings for understanding the prevalence of white-collar crimes, as they focus on traditional street and violent crimes. The most recent estimates of consumer fraud by the Federal Trade Commission were published in 2013 (Anderson 2013). For financial fraud in the current survey, for example, the estimated incidence was 4.1 million, based on that report (Anderson 2013).Footnote 2

After generating cost estimates, researchers and policy-makers frequently perform benefit-cost analysis (BCA) to determine which crime prevention techniques and programs will get the most “bang for the buck” (Aos 2015; Welsh and Farrington 2015). Local, state, and federal governments perform these analyses as they determine where to invest their funds (Schweinhart et al. 1992; Aos et al. 2011; Institute of Medicine and National Research Council 2014). Recently, the Department of Justice used previous WTP estimates of the cost of rape as part of an assessment of the Prison Rape Elimination Act (Department of Justice 2012), and outside of the criminal justice context, WTP has also been used to estimate general public spending priorities (see e.g., Koford 2010). As policymakers are always concerned with utilizing money in a way that best serves the safety of the community while balancing other legislative priorities, it is imperative that the “costs” included in a BCA are accurate reflections of the true costs of crime.

CV methods have also been used to generate “costs of crime” for street, violent, and white-collar crimes. One of the most well-known examples is Ludwig and Cook’s (2001) use of CV to estimate the costs of gun crime in the United States. Using payment card methodology, they asked a representative sample how much they would be willing to pay to reduce gun violence in the United States by 30%. They found that as the cost of the hypothetical program increased, fewer people were willing to pay: 75.8% of the sample reported that they would vote “yes” on a program that would cost $50 a year in taxes to reduce gun violence by 30%, but only 63.6% reported “yes” for a program that would cost $200. Ludwig and Cook (2001) estimated a $1.2 million cost for each injury due to gun violence.

Cohen, Rust, Steen, and Tidd (2004) published an influential paper using CV to estimate costs of various violent crimes. Using a nationally representative payment card survey, they asked respondents how much of their household income they would be willing to devote to a program to reduce a specific crime by 10%. Using WTP, cost of crime estimates were generated, ranging from $25,000 for burglary to $237,000 for rape/sexual assault and $9.7 million for murder (Cohen et al. 2004). Similarly, Piquero et al. (2011) used a “top-down” approach to estimate WTP for identity theft; this was the first attempt at valuing a white-collar offense using WTP methods. Respondents in their survey were willing to pay $73–$87 based upon a 25%–75% reduction in identity theft. The researchers then used these figures to estimate a cost of approximately $1500–$3800 per identity theft in the United States.

In an extension of previous work, Cohen (2015) has also applied these methodologies to both violent and white-collar crimes beyond identity theft. In total, Cohen (2015) elicited WTP for seven crimes, including murder, rape, consumer fraud, financial fraud, and burglary. Each respondent was presented with the same crime percentage reduction of 10%. Average WTP was lowest for corporate crime ($51.14) and highest for rape ($85.73). Using the formula described above, costs were estimated to be the lowest for consumer fraud ($1200) and highest for murder ($6.5 million). Cohen (2015) also analyzed whether WTP differed based upon respondent characteristics and found some differences; for example, male respondents reported higher WTP than females. Cohen (2015) concluded that WTP estimates elicited from surveys are frequently much higher than actual monetary losses attributed to victims from these crimes, and that regulators should take heed of this to properly perform BCA of crime control programs.

Taken together, the extant literature demonstrates that CV methods can produce vastly different outcomes and that there is significant variation within the population in WTP for crime control programs. It is also unclear whether the potential crime reductions offered in each study influence responses (see Dominguez and Raphael 2015), and this crucial survey decision has the potential to inflate or deflate final cost of crime estimates. In light of these observations, the present study was designed to clarify these issues. Below, we discuss the sampling, survey, and outside data used for the present study. With this additional context on survey design, we then introduce our hypotheses and describe the analysis methods.

Sampling, Survey Instrument, and Additional Data Sources

Sampling

Data for this paper were acquired from a broader research project on white-collar crime costs and contingent valuation. Data were collected via an online survey administered to KnowledgePanel® members through an agreement with Growth from Knowledge (GfK), a professional research group. The KnowledgePanel® sample consists of households and individuals who are recruited and maintained by GfK for participation in a range of focus groups and surveys.Footnote 3 Our sample was designed to be statistically representative of the U.S. population, with oversampling of Hispanic respondents. Eligible respondents could complete the survey in English or Spanish, were aged 18 or older, and were residents of the United States. The final survey dataFootnote 4 were solicited and collected between May 28 and June 14, 2015. Individuals who did not respond to the initial survey invitation were reminded on the third, seventh, ninth, fourteenth, and sixteenth day to encourage maximum response. A total of 2050 respondents completed the survey, for a response rate of 49–56%.Footnote 5 This strategy yielded a final sample that is largely representative of the United States population. Data can be accessed through the National Archive of Criminal Justice Data (Simpson et al. 2015).

Instrument

There were three main components to this survey: a rating of crime seriousness, questions regarding actual and perceived criminal victimization, and the WTP portion, which this paper primarily utilizes. Four crime types were included in the WTP section: consumer fraud, financial fraud, identity theft, and burglary. These crime types were selected to provide insight into a range of different white-collar crimes and to provide a baseline for comparison with previous studies. Respondents were prompted that:

All of these programs require additional money to implement and would require either raising taxes or reducing other government services. We want you to think about the proposed programs and assume that these programs have been shown to work and will reduce crime. We also want you to answer each question as if you actually would have to pay the amount you enter in the survey.

Respondents viewed four WTP scenarios, one relating to each offense type - financial fraud, consumer fraud, identity theft, and burglary. The order in which these crimes were presented to respondents was randomly assigned.

Program Randomization

For each crime type, respondents were provided a brief description of the crime, as well as the annual incidence and average loss from victimization in the U.S. Three types of programs were presented, in a number of combinations: “victim repayment for losses” (restitution), “more police and longer prison sentences” (deterrence) and “teach[ing] potential victims” (education). A total of six programs were considered for each crime type. The six programs varied in their inclusion of these three program components (restitution, deterrence, and education) as well as in the level of crime reduction provided by the program. Figure 1 details these program types and describes the program randomization procedure.

Fig. 1 Program Randomization Description

All respondents provided estimates for programs A and B. However, half of the sample was randomly assigned to provide WTP for Programs C and D, and half provided WTP for Programs E and F. Thus, each respondent provided WTP estimates for four program options for each of the four crime types (16 in total). After each of the four program descriptions, respondents were asked to indicate the maximum that they would be willing to pay annually on behalf of their households for each program if that option was adopted.Footnote 6

Scope Randomization

Respondents were also randomized on two additional dimensions: the level of crime reduction (referred to as the “scope” of the question) and frames. Two different options were randomized for the scope dimension: 70% of the sample viewed Programs A and B as reducing crime by 50% and either C and D or E and F as reducing crime by 25%, while the remaining 30% of the sample viewed Programs A and B as reducing crime by 25% and either C and D or E and F as reducing crime by 10%.
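Taken together, the program and scope assignments can be sketched as follows (a minimal illustration with our own naming; the actual panel assignment was handled by GfK):

```python
import random

def assign_conditions(rng=random):
    """Sketch of the randomization described above.

    Every respondent rates Programs A and B; a random half also rates
    C and D, while the other half rates E and F. Independently, 70% of
    the sample sees the first pair at a 50% crime reduction and the
    second pair at 25%; the remaining 30% sees 25% and 10%.
    """
    second_pair = ["C", "D"] if rng.random() < 0.5 else ["E", "F"]
    scopes = (0.50, 0.25) if rng.random() < 0.7 else (0.25, 0.10)
    return ["A", "B"] + second_pair, scopes

programs, (first_scope, second_scope) = assign_conditions()
```

Repeating this draw once per respondent and crime type reproduces the 16 WTP elicitations each respondent completed.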

Frame Randomization

In the introduction to the survey, respondents were randomized into three categories: no frame, a “vulnerable victim” frame, and an “individual protection” frame. Approximately 33% (n = 664) of the sample viewed no frame. Approximately 35% of the sample (n = 718) was presented with the following frame (referred to as the “vulnerable victim” frame): “Certain vulnerable populations are at higher risk of becoming victimized. Depending upon the type of crime, vulnerable victims might include senior citizens and individuals who are in some type of financial distress.” The remaining approximately 33% of the sample (n = 668) was presented with the following “individual protection” frame: “There are certain steps that individuals might take to protect themselves against these crimes. For example, to reduce the risk of burglary, they might purchase burglar alarms or install better lighting. To reduce the risk of identity theft, they might frequently change their passwords or choose not to purchase goods online. All of these steps that people take to protect themselves involve spending time and money.”

Additional Data Sources

To estimate costs per crime, outside data were used. The number of households in the United States was gathered from the U.S. Census Bureau website. In line with prior research, varying sources were used for different crime types. Financial fraud and consumer fraud incidence were drawn from the Federal Trade Commission’s most recent report on consumer fraud (Anderson 2013); this report was published in 2013 and its estimates are for 2011. Identity theft prevalence was gathered from the Bureau of Justice Statistics, which drew on data from the National Crime Victimization Survey (Harrell 2015). Burglary incidence was also drawn from the NCVS and included only household burglary.Footnote 7

Hypotheses

The purpose of this paper is to assess, with new and nationally representative data, the validity of using open-ended willingness to pay survey responses to generate costs of crime. We have six hypotheses in this paper, several with relevant sub-hypotheses. The first five explore how WTP is impacted by outliers, crime type, crime reduction, program type, and framing. The final hypothesis ties these together by inputting mean WTP into the common cost of crime formula to examine how the underlying WTP distribution affects cost estimates.

First, we examine how various data transformations affect WTP. As mentioned above, OE CV presents a few challenges to researchers, including high outliers. One recommended approach is to “Winsorize” values.Footnote 8 We expect that performing this transformation at various appropriate alpha-levels will significantly change the distribution of WTP values and reduce the disproportionate influence of extreme responses.

Hypothesis 1a: Raw WTP will be significantly higher than WTP Winsorized at .01.

Hypothesis 1b: Raw WTP will be significantly higher than WTP Winsorized at .05.

Second, we explore how average WTP may vary across crime types. Expanding upon Cohen’s (2015) observation that WTP was lowest for consumer fraud, we hypothesize that the non-white-collar offense (burglary) will have the highest WTP. Further, we predict that WTP will vary among the three white-collar offenses included in the current study.

Hypothesis 2a: WTP will be higher for burglary than for white-collar crimes.

Hypothesis 2b: There will be differences in WTP across white-collar crime types.

Not all crime interventions yield the same reductions in crime and previous literature has found that survey respondents are not very responsive to described crime percentage reductions (Zarkin et al. 2000; see also Dominguez and Raphael 2015). Following these findings, we expect that respondents randomly assigned to higher promised reductions in crime will be willing to pay the same as individuals who viewed lower reductions.

Hypothesis 3: There will be no statistically significant differences in respondents’ WTP for larger reductions in crime.

Next, we assess whether certain program types elicit higher WTP. Previous work (Nagin et al. 2006; Piquero and Steinberg 2010) has found that respondents reported higher WTP for rehabilitation programs as opposed to deterrence-based programs. Further, Galvin et al. (2018) recently found that respondents supported restitution. Thus, we predict that respondents will be willing to pay more for a restitution or education component as opposed to deterrence. In addition, the current survey provides a unique avenue to explore whether it is not the characteristics of the components themselves, but rather the overall quantity of components, that drives WTP.

Hypothesis 4a: WTP will be higher for education or restitution programs than deterrence programs.

Hypothesis 4b: WTP will be higher for programs that have more program components.

We further seek to explore framing effects that can also influence WTP (Nagin et al. 2006; Ajzen et al. 2000). Building upon the recent examination by Galvin et al. (2018) that explored how framing can alter rank-ordered WTP preferences, we extend their research by assessing whether frames directly affect raw WTP.

Hypothesis 5a: Respondents will express higher WTP when they view the vulnerable frame compared to no frame.

Hypothesis 5b: Respondents will express lower WTP when they view the individual protection frame compared to no frame.

Lastly, we use Formula 1 (see above) to calculate costs of consumer fraud, financial fraud, identity theft, and burglary. Based upon the data and our predictions in Hypotheses 1a and 1b, we expect that Winsorizing WTP data will affect final cost of crime figures.

Hypothesis 6: The average “cost of crime” will be greater using raw WTP data compared to using Winsorized WTP data.

Data Analysis Methods

To test these six hypotheses, mean WTP is calculated using nonparametric methods. Respondents’ average WTP is calculated by summing all responses and dividing by the number of respondents in the sample. Due to the non-normal distribution of WTP (for example, 961 respondents reported zero WTP for the first scenario and there are a few very high outliers; see Fig. 2 below), these estimates are then Winsorized. Levy et al. (1995) describe a Winsorized mean as a statistic that “adjusts the values of the most extreme observations toward the center of the distribution” (Levy et al. 1995: 25). Essentially, all individuals who stated a WTP either below or above a selected threshold have their observed WTP set to that threshold value. This method is preferable for asymmetric distributions (as displayed in Fig. 2) and is less vulnerable to bias than merely trimming the top (and/or bottom) values of the distribution (Levy et al. 1995). As can be seen in Fig. 2, due to the number of zeroes in the distribution, the only responses that are altered are the highest values, which were recoded to 1000 for the 99% Winsorized values (11 responses recoded) and to 200 for the 95% Winsorized values (92 responses recoded). This transformation was performed using the Stata program “winsor.”Footnote 9 This command takes any non-missing values of a variable that are either lower or higher than a set cut point (h) and creates a new variable that is identical to the original except that the h highest and h lowest values are replaced by the closest value counting inward toward the mean.
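The recoding rule can be sketched as a small re-implementation (illustrative only; the toy distribution below mimics the zero-heavy, right-skewed shape of the WTP data, not the actual responses):

```python
import numpy as np

def winsorize(values, alpha=0.01):
    """Replace the top and bottom alpha share of observations with the
    nearest remaining value, mirroring Stata's `winsor` with cut point
    h = alpha * n."""
    x = np.sort(np.asarray(values, dtype=float))
    h = int(alpha * len(x))        # observations replaced per tail
    if h > 0:
        x[:h] = x[h]               # lowest h values -> next-highest value
        x[-h:] = x[-h - 1]         # highest h values -> next-lowest value
    return x

# Toy WTP distribution: many zeros plus one extreme outlier.
wtp = [0] * 90 + [10, 20, 25, 50, 50, 75, 100, 150, 200, 5000]
print(round(np.mean(wtp), 2))                 # → 56.8 (raw mean)
print(round(winsorize(wtp, 0.05).mean(), 2))  # → 4.05 (95% Winsorized mean)
```

Because the lower tail is already all zeros, only the highest values are altered, just as in the actual data; a single extreme response stops dominating the mean once it is recoded inward.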

Fig. 2

Distribution of values for Program A, Financial Fraud: raw data, 99% Winsorized data, and 95% Winsorized data

For the analysis, means will be presented as Winsorized means using both 1% and 5% thresholds. For the 99% Winsorized means, all values in the top 1% of the distribution were replaced by the next-lowest value, and all values in the bottom 1% of the distribution were replaced by the next-highest value (which in this case, is zero across all distributions). The same procedure was used for the 95% Winsorized means, where .05 was the cut point. This transformation was completed for all scenarios and all crime types. Winsorized means are sensitive to the number of observations that are Winsorized (Kerr et al. 2003). With this understanding, all analyses will be performed with both 99% and 95% Winsorized values.Footnote 10

To test Hypotheses 1–5, t-tests are utilized to determine whether there are significant differences among respondents who viewed a certain crime reduction, program, or frame. Finally, to generate the costs of crime and test Hypothesis 6, Formula 1 (above) is used.
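A minimal sketch of one such group comparison, using SciPy with made-up WTP vectors rather than the survey data:

```python
from scipy import stats

# Hypothetical WTP responses for two randomly assigned survey conditions
wtp_50pct_reduction = [0, 0, 10, 25, 50, 60, 75, 100, 150, 200]
wtp_25pct_reduction = [0, 0, 5, 20, 40, 50, 60, 90, 120, 180]

# Welch's t-test (equal_var=False) is a reasonable choice given the
# large, unequal variances typical of skewed WTP distributions
t_stat, p_value = stats.ttest_ind(
    wtp_50pct_reduction, wtp_25pct_reduction, equal_var=False
)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

With highly skewed data such as these, the large standard deviations inflate the standard error of the difference, which is one reason a sizeable gap in means can still fail to reach significance at α=0.05.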

Results

Sample Descriptive Statistics

As mentioned above, this sample is largely representative of the U.S. population. The average age was approximately 49 years, with a range of 18–94. Gender representation was fairly even, with males comprising 49% of the sample and females 51%. The majority of the sample (62%) identified as white, 24% as Hispanic, and 7% as non-Hispanic black. Fifteen percent reported less than a high school degree, 28.5% reported a high school diploma as their highest degree received, 29% reported some college, and 27.5% had earned a college degree. Income was reported in 19 categories. Approximately 52% reported a household income of less than $60,000 per year, while only 2.4% reported earning less than $5000, and 4.6% reported more than $175,000 per year. Further, 47% responded that they were working as paid employees and 21% were retired; 17% were not working due to disability or another reason. The majority of respondents were married (58%), with 19% reporting never having been married.

Willingness to Pay and Crime Type

The first set of results we present are the WTP figures, or the amounts that respondents were willing to pay for the crime control programs they viewed. Table 1 below presents mean WTP (raw, Winsorized at .01, and Winsorized at .05) for all 6 program types and 4 crime types.Footnote 11 As Table 1 illustrates, the minimum value is zero for all options and there are right-skewed outliers for each scenario. These outliers range from $1000 for Program F for financial fraud to as high as $99,999 for Program B for 3 of the 4 crime types. These outliers persist throughout the results, suggesting that Winsorized means are likely able to reduce bias stemming from individual extreme responses. Looking to the far-right column of Table 1 for the 99% and 95% Winsorized values, there is support for Hypotheses 1a and 1b, which predicted that raw WTP would be higher than Winsorized WTP. These hypotheses are supported through statistically significant differences in 17 out of 24 tests (70.83%) for the 95% Winsorized values and in 8 out of 24 tests (33.33%) for the 99% Winsorized values. It is important to note that although not every difference is statistically significant at α=0.05, this is likely due in part to the high skewness and large standard deviations for each mean. This pattern suggests that 95% Winsorized means are more effective than 99% Winsorized means at reducing the influence of extreme values in WTP estimates derived from open-ended survey instruments.

Table 1 Raw WTP compared to Winsorized WTP, all crime and program types

Next, a series of t-tests were performed to assess Hypotheses 2a and 2b: WTP will be higher for burglary than for white-collar crimes, and there will be differences in WTP across white-collar crime types.Footnote 12 A total of 18 t-tests were relevant for each hypothesis and each data transformation. For the 95% Winsorized means in the left-hand column of Table 2, Hypothesis 2a was supported in 7 of 18 comparisons (38.9%), providing partial support for our hypothesis. Looking to the 99% Winsorized WTP, however, there is more limited support; only 4 of 18 t-tests (22.2%) indicated that respondents were willing to pay significantly more for burglary than for other crime types. It should be noted, however, that the vast majority of comparisons were in the hypothesized direction, with WTP being numerically greater for burglary than for white-collar crimes in most cases.

Table 2 Crime Type Comparisons, by Program

We also find partial support for Hypothesis 2b, which predicted differences across white-collar crime types. For the 95% WTP values, 7 of 18 comparisons (38.9%) indicated significant differences; this number falls to 2 of 18 (11.1%) for the 99% WTP values. Looking in more detail, however, there are clear patterns in which crimes consistently yielded higher or lower WTP. Financial fraud WTP was the lower figure in every comparison in which it appeared. On the other hand, of the 9 significant differences across all of the Table 2 values, 5 are higher for identity theft and 4 are higher for consumer fraud. In sum, our predictions regarding crime type differences are partially supported, and the level of support depended upon how the outliers were treated.

Scope Tests

Next, we move to our third hypothesis, which predicted that there would be no significant differences in WTP across the randomly varied levels of crime reduction presented. Because the formula for calculating the cost of crime with CV WTP includes the percentage of crime reduction presented in the survey (see Formula 1 above), this is critically important. To test whether the scope of the question affected stated WTP, a series of t-tests were performedFootnote 13 comparing the mean WTP of individuals who were presented with a 50% reduction to that of those presented with a 25% reduction. For this analysis, results from Programs A and B are presented in Table 3 below.Footnote 14 As Table 3 shows, the percentage of crime reduction presented to respondents generally did not significantly affect their WTP in the expected direction. Indeed, though the difference was not statistically significant, using the 95% Winsorized values, respondents presented with a 25% reduction for Program A for consumer fraud were actually willing to pay $3 more per year than those who viewed a scenario offering a 50% reduction.

Table 3 Crime Reduction Comparisons, Programs A and B, by Crime Type

Using 95% Winsorized means, 2 of the 8 comparisons were significant in the expected direction, and only for burglary; however, these differences did not remain statistically significant when the more conservative Winsorizing cut point (.01) was utilized. For both Programs A and B for burglary, respondents reported higher WTP for a program that promised a 50% reduction in crime. As predicted in Hypothesis 3, the findings in Table 3 indicate that the majority of the comparisons failed the scope tests and that respondents in this survey were not very reactive to the amount of crime that would be reduced by a specific program.

Program Type Differences

Next, we move to our fourth set of hypotheses, which predicted that WTP would be higher for education or restitution than for deterrence and that more program components would be associated with higher WTP. As discussed above, each respondent was presented with four of the six program types. Every respondent viewed Programs A and B; half viewed C and D and the other half viewed E and F. With regard to Hypothesis 4a (WTP will be higher for education or restitution programs than for deterrence programs), we do not find support. There were a total of 16 relevant t-tests for this hypothesis; we compared Program C (Restitution, Deterrence) to Program D (Restitution, Education) and Program E (Deterrence only) to Program F (Education only). Looking to the bottom 2 sections of Table 4, none of these contrasts exhibited statistically significant differences.

Table 4 Program Type Comparisons

In contrast, we find strong support for Hypothesis 4b, which predicted that respondents would be willing to pay more for programs with more components (see Table 4, top section). For every comparison using both 95% and 99% Winsorized WTPs, average WTP for each crime type was higher for Program A (which included 3 components: Restitution, Deterrence, and Education) than for Program B (which included only 2 components: Deterrence and Education). When comparing programs that included the same number of components, the actual constituent parts did not predict WTP differences. These respondents instead seem to be swayed by the number of components in a potential crime reduction program.

Framing Effects

Our fifth set of hypotheses explored framing effects. In Hypothesis 5a, we predicted that viewing a “vulnerable victim” frame would be associated with higher WTP; in Hypothesis 5b, we predicted that viewing an “individual responsibility” frame would be associated with lower WTP. Looking to Table 5, one can see that neither frame affected respondent WTP in the expected direction. Looking to the right-hand column of Table 5 and the vulnerable victim frames, there were no significant differences in the expected direction in WTP for financial fraud, consumer fraud, identity theft, or burglary. We predicted that viewing this frame might elicit higher WTP because people might feel more empathetic toward more vulnerable populations. On the contrary, we observe that respondents reported lower WTP when the frame was present than when it was not. Similarly, for the individual responsibility frame, observed differences were in the opposite direction of predictions. Framed respondents were often willing to pay more for programs to reduce crime than individuals who viewed no frame. This is surprising, given that we expected that individuals who viewed this frame might be less willing to pay due to the reminder that individuals can take certain steps to protect themselves and potentially prevent becoming a victim of crime.

Table 5 Frame Testing, Programs A and B

Costs of Crime

The last step in the analysis was to estimate the “costs per crime” using the WTP numbers from survey respondents. We predicted in Hypothesis 6 that costs would be higher when calculated using raw WTP than when using Winsorized WTP. Researchers in this field have used CV methodologies to construct costs of crime, and the formula used in this paper is the same as that utilized in prior work (see Cohen 2015).

$$ Cost\ of\ Crime=\frac{Stated\ WTP\times Number\ of\ Households}{Number\ of\ Crimes\ Avoided} $$

The number of crimes avoided is calculated by multiplying the incidence of that particular crime in a given year by the percentage of crime reduction presented in the survey. For this portion of the analyses, we estimated costs using raw, 95%, and 99% Winsorized means in order to assess how data transformation affects cost estimates. We were particularly careful here because, in this well-accepted cost of crime formula, the crime reduction (entered into the formula as part of the “crimes avoided” denominator) is critical. Because of this, we calculated costs of crime for both 25% and 50% reductions and then created a weighted average of the two, based on the number of respondents who viewed each percent reduction. From the results in Fig. 3, one can see that this formula is particularly sensitive to its inputs and that the crime reduction influenced the final cost considerably.Footnote 15
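Formula 1 and its sensitivity to the reduction percentage can be sketched in a few lines; the household count, incidence, and mean WTP below are placeholder round numbers chosen for illustration, not figures from the paper:

```python
def cost_of_crime(mean_wtp, n_households, incidence, reduction):
    """Formula 1: aggregate stated WTP divided by the number of
    crimes avoided (annual incidence * promised reduction)."""
    crimes_avoided = incidence * reduction
    return (mean_wtp * n_households) / crimes_avoided

# Placeholder inputs (illustrative only)
HOUSEHOLDS = 120_000_000
INCIDENCE = 1_000_000   # hypothetical annual incidence of the crime

# The same mean WTP yields very different "costs" depending on the
# reduction percentage shown in the survey: halving the reduction
# halves the denominator and so doubles the estimated cost per crime.
print(cost_of_crime(50, HOUSEHOLDS, INCIDENCE, 0.50))  # 12000.0
print(cost_of_crime(50, HOUSEHOLDS, INCIDENCE, 0.25))  # 24000.0
```

This is why a scope-test failure is so consequential: if respondents state roughly the same WTP regardless of the promised reduction, the reduction percentage alone can move the final estimate by a factor of two.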

Fig. 3

Cost of Crime Estimates, Program A, for all 4 crime types

First, the cost of a financial fraud for Program A with a 50% reduction in crime, using raw WTP, 99% Winsorized WTP, and 95% Winsorized WTP, is (respectively) $4485.02, $3281.41, and $2007.17. These figures change drastically when using WTP from respondents presented with a 25% reduction, ballooning (in the same order) to $12,231.77, $6389.14, and $4167.88. These tremendous ranges follow from the findings for Hypothesis 3: respondents were not affected by the crime reduction presented in the survey. WTP did not vary significantly across crime reductions, but because the reduction percentage is a key input for the cost of crime (see Formula 1), final cost numbers are affected dramatically. The range for the cost of financial fraud based upon the calculations presented in Fig. 3 is approximately $10,000: from $2007.17 (50% reduction, 95% Winsorized WTP) to $12,231.77 (25% reduction, raw WTP).

A similar pattern emerges for other crime types. For consumer fraud, the cost of crime is at its lowest, $254.41, for a 50% reduction using 95% Winsorized WTP, and at its highest, $1229.36, for a 25% reduction with raw WTP. Identity theft “costs” more than consumer fraud overall, with a lower estimate of $639.20 for a 50% reduction with 95% Winsorized WTP and a higher estimate of $2603.09 using raw WTP for respondents viewing the 25% reduction. Lastly, the costs of crime for burglary are the highest of all crimes in this study and also have a sizeable range. The lowest estimate for the cost of one burglary is $3635.19, when applying the 95% Winsorized average WTP with a 50% reduction, and the highest is $18,278.54, when using the raw WTP at a 25% reduction.

One can see from each crime estimate in Fig. 3 that the final number depends greatly on the crime reduction percentage presented in the survey and on how the outliers in the data were treated. We therefore find strong support for Hypothesis 6, which predicted that the average “cost of crime” would be greater using raw WTP data than Winsorized WTP data. In our final section below, we summarize our results, discuss how they fit into the extant body of work, and provide suggestions for future research and policy.

Discussion and Conclusion

With this paper, we seek to add to the growing literature confronting the essential and challenging task of estimating how much crime “costs.” Using a nationally-representative survey with random vignette presentation, we assessed how WTP responses vary across data transformation techniques, crime type, crime reduction, program type, and framing. Using WTP, we then calculated costs of crime using an established formula and examined how these “costs” might be sensitive to bias or skewness in the underlying WTP data. Overall, results indicate that survey manipulations only sometimes affected respondent WTP for crime control programs and that using WTP to estimate costs of crime exposes the frailty of the formula itself.

First, when looking to the raw WTP, there are consistencies across distributions. For each of the four crime types and the six programs, there were significant outliers, sometimes as high as $99,999. While using OE CV methods allows respondents more flexibility and reduces the potential influence of researchers pre-selecting payment card values, it also opens the door to nonsensical responses (Diamond and Hausman 1994; Carson and Haneman 2005). Due to this, we chose to use Winsorized values, which replace a certain percentage of both the high and low ends of the distribution with the value at the “cut point” chosen by the researcher (see Levy et al. 1995). We found support for our first hypothesis, which predicted that Winsorized values would be significantly lower than raw WTP values (see Table 6 for a brief summary of findings). For example, looking to Program A for financial fraud, the choice of a 5% or 1% cut point also made a significant difference (see Table 1): with no Winsorizing, the average WTP was $86.87; this number shrank to $55.24 for the 99% Winsorized WTP and to $35.06 for the 95% Winsorized WTP. This trend was present and consistent for each of the crime types and program types, though not all differences were statistically significant at α=0.05 (perhaps partially due to the high skew, and thus high standard deviations, in the data). Next, we examined a variety of factors and survey manipulations to see which affected respondent WTP.

Table 6 Summary of Hypotheses and Support

We found partial support for our second set of hypotheses focusing on differences across crime types. In general, respondents were willing to pay more for burglary than for the white-collar offenses, with the exception of identity theft. Identity theft is a crime that has been receiving more media attention lately (e.g., Kristof 2018) and, like burglary, involves an invasion of the personal sphere, in this case personal information. This type of invasion might be particularly salient for survey respondents, leading to increased WTP. Third, based on previous literature suggesting that respondents are not overly affected by the scope of crime reduction (Zarkin et al. 2000), we hypothesized that respondents would be willing to pay the same for a 50% and a 25% promised reduction in crime. In general, this prediction was supported; it does not seem that respondents look to the crime reduction percentage very closely when determining how much of their own income they would be willing to pay. This is in line with Diamond and Hausman’s (1994) now 25-year-old supposition that respondents may be willing to pay for a nebulous program without truly understanding its impacts. An alternative hypothesis is that respondents may understand the impacts but are simply not affected by them.

Fourth, we examined whether program type affected WTP. Despite Nagin et al.’s (2006) finding that rehabilitative programs elicited higher WTP than deterrence programs and Galvin et al.’s (2018) recent study demonstrating support for restitution, we found little support for our hypothesis that people would be willing to pay more for education or restitution relative to deterrence. Conversely, we found strong evidence that people are willing to pay more for programs with more components. A potential explanation for this finding is that respondents are not looking to the mechanisms of the programs, but simply to the number of components; they may simply be thinking that “more is better.” This finding is particularly curious when contrasted with the findings (or lack thereof) for the scope test: though a higher crime reduction is not persuasive to respondents, a greater number of program components seems to be. Fifth, we found no support for our framing hypotheses, and some findings ran contrary to predictions. As Galvin et al. (2018) noted, it is possible that a “personal responsibility” frame makes victims seem more human rather than highlighting their ability to protect themselves. Similarly, WTP for identity theft was again in an unexpected direction; those who viewed the vulnerable victim frame were willing to pay less than those who did not.

Lastly, this paper explored the potential sensitivity of the popular “cost of crime” formula and found that the method generates fragile estimates. In support of Hypothesis 6, our findings indicated a wide range of final costs for each of the crimes, depending on what percentage of the values was Winsorized and the percentage of crime reduction presented. Looking to Fig. 3, the estimated cost of a financial fraud ranges from $2007.17 to $12,231.77, and similarly, the range for burglary is massive ($3635.19–$18,278.54). Our analyses point to several issues with this traditional cost of crime methodology. First, as mentioned above, we found significant outliers for each crime type and program type. We believe that the best way to deal with these outliers is by Winsorizing, but this raises a further question: what is the appropriate cut point? We performed all analyses with Winsorizing of both the top 1% and 5% of observations and found significant differences across the distributions. Second, with our previous hypotheses, we demonstrated that respondents are not affected by the crime reductions presented and instead may care more about the number of program components than the purported effectiveness of a program. It seems possible that respondents did not necessarily trust the survey’s claim that a certain program would actually be able to reduce these crimes by the stated percentage.

This raises the question of which number is the “right” cost of crime figure. Cohen’s (2015) most recent paper with a U.S. sample used a close-ended WTP payment card method and a 10% crime reduction, but the raw WTP numbers he found are fairly similar to those elicited from this OE survey. The average WTP for financial fraud, consumer fraud, and burglary, respectively, was $39.22, $34.66, and $59.87 (Cohen 2015). These can be compared to our respective findings of $35.06–86.87, $37.18–83.76, and $43.74–122.41 for Program A (each range spans the raw, 95%, and 99% Winsorized WTP; see Table 1). Piquero et al. (2011) used a payment card method and found an average WTP for identity theft of approximately $73 when respondents were presented with a 25% reduction; our numbers ranged from $39.58 to $92.82 for the same reduction in identity theft prevalence for Program A. However, because Cohen (2015) presented a much smaller crime reduction percentage in the survey, the costs calculated using the formula mentioned above are much higher in that study. This occurs because the denominator of the cost of crime function (crimes avoided) becomes significantly smaller with a lower crime reduction. The costs of crime reported in that paper are $12,000 for financial fraud, $1200 for consumer fraud, and $19,000 for burglary.Footnote 16 The fact that the resulting “cost of crime” number is so drastically affected by the crime reduction presented in the survey, when our results indicate that respondents are not terribly attuned to that reduction, is concerning.

Based upon these considerations, we agree with previous scholars who hesitate to recommend CV, and in particular OE CV, for estimating costs of crime. Survey respondents at times provided nonsensical outliers and were not responsive to crime reductions or frames as expected, but were willing to pay more if a crime control strategy promised more program components. We do not believe, then (as others have also argued; see, e.g., Dominguez and Raphael 2015), that true monetary value can be attached to these numbers. There may be value in ordering preferences, as demonstrated by Galvin et al. (2018), but WTP estimates are likely not stable and reliable enough to be inserted into a formula for generating a “cost of crime.” Our calculations demonstrate that the formula is very susceptible to change based upon its inputs, and we have also shown that the inputs cannot be viewed as true values. We instead suggest, as Aos (2015) has, that numbers such as these be used as part of the policy-making process in conjunction with other types of information. An example of this is the above-mentioned DOJ report assessing prison rape policies, wherein researchers used a variety of methods to assess costs, including WTP, previous analyses of civil jury verdicts for sexual assault, and the direct costs of mental health care (Department of Justice 2012).

As with all research, there are limitations to this study. First, there is little precedent in criminology for OE CV, so there is room to learn about how to best construct surveys to obtain reliable estimates. Second, as with all CV studies, the survey itself determined the crime reduction presented. Third, the present study does not take into account individual factors. This has been explored in other contexts (see Cohen 2015; Galvin et al. 2018) and there are some indications that individual differences such as gender and income might affect survey responses. However, the present sample was constructed to be representative of the overall U.S. population and the main goal of this paper was to examine the effects of survey manipulations and WTP on cost of crime figures. Moreover, policy-makers are likely looking only at the “bottom line” when making cost choices (see Rubin 2005; Obama 2017; Smith and Jensen 2017). Fourth, this study focused on three white-collar offenses and one “street” crime and results are not necessarily generalizable to other offenses such as sexual violence, child abuse, or homicide.

Moving forward, it is imperative for the field to think deeply about how to quantify crime costs. Though crime is a rare event, it affects millions of people in the United States and it is critical to understand its impacts, both financial and nonfinancial. No existing method of crime valuation is perfect and we are certainly not the only criminologists to point out issues with the CV method (see e.g., Diamond and Hausman 1995; Black et al. 2015; Manski 2015). While this is an important issue to study, our results strongly suggest that WTP numbers elicited from an OE CV survey should not be used on their own. Though these figures provide valuable context regarding taxpayer preferences, their value in determining true costs is limited due to existing challenges both in creation of the survey and in survey responses.