Introduction

In an era where the criminal justice system is largely defined by the practice of plea bargaining (see Lafler v. Cooper 2012), there has been increasing attention given to the factors that lead defendants to plead guilty. Central to these theories is the notion that plea takers receive substantial discounts in exchange for their guilty pleas. That is, a defendant who pleads guilty often receives a discounted jail or prison sentence compared to what they would have received if convicted at trial (see Alschuler 1981; Brereton and Casper 1981; Kipnis 1976). Empirical research has generally supported this claim (see Abrams 2011 for a noteworthy exception), finding that average plea sentences are significantly shorter than average trial sentences, even after controlling for a host of legal and extra-legal factors thought to be salient components of sentencing decisions (Bushway and Redlich 2012; LaFree 1985; Smith 1986; Ulmer and Bradley 2006; Ulmer et al. 2010; Yan and Bushway 2018). However, this research has also found that plea discounts tend to display significant variation across individuals, with some defendants receiving discounts approaching 100% and others receiving plea sentences that are longer than would be expected if convicted at trial (Bushway and Redlich 2012; Smith 1986; Yan and Bushway 2018). Despite the consistency of the ‘trial penalty’ effect, it is still relatively unclear how the magnitude of the plea discount is determined and what factors correspond to larger or smaller discounts.

A widely held belief is that plea discounts are proportional to the strength of the case, or in other words, the “strength of the evidence” (Brereton 1981, p. 50) against the defendant (see also LaFree 1985; Landes 1971; Smith 1986). Though logical in theory, research on the impact of evidentiary factors on plea discounts remains rare, and the research that does exist is largely inconclusive. While some studies find that inculpatory evidence either produces no effect or significantly increases the size of the plea discount (see Bushway and Redlich 2012), other studies find that evidentiary factors account for small but significant decreases in plea discounts (Redlich et al. 2016; Smith 1986). Complicating matters is the inherent difficulty involved in measuring and obtaining real-world information on case evidence, which is often missing from administrative data sets (Kutateladze et al. 2015). Accordingly, much remains unknown about the relationship between the strength of the evidence and plea discounts. If evidentiary factors strongly predict the size of the plea discount, then legal participants would be expected to make rational plea decisions based on their predicted trial outcomes (see Bibas 2004; Bushway and Redlich 2012). If not, the implication is that plea discounts may be influenced by other (possibly extra-legal) factors that may create bias against certain groups of defendants (see generally Steffensmeier et al. 1998; Ulmer et al. 2010). Indeed, there is a highly consistent body of research demonstrating that minority defendants are disadvantaged throughout the criminal justice system, including at the plea phase (see Kurlychek and Johnson 2019). Given the centrality of plea discounts to the practice of plea bargaining, understanding the factors that affect these discounts remains an important area of criminological research.

In this study, we seek to provide further insight into the relationship between evidence strength, extra-legal characteristics, and plea discounts. In doing so we build on prior efforts in several ways. First, we analyze novel data sets that provide recent estimates of plea and trial sentence lengths. Second, we use unique combinations of evidentiary variables systematically coded from observations of case summaries (prosecutorial proffers of evidence) presented during plea hearings. Third, we directly compare the effects of legal and extra-legal variables in an attempt to determine the primary drivers of plea discounts. In what follows, we first discuss the prior literature on plea discounts and the factors that may influence them. We then turn to our data, methods, results, and the implications of our study for plea bargaining research moving forward.

Plea Discounts

The sentencing differential observed between defendants convicted at trial and defendants who plead guilty has been referred to by several names, most notably the “plea discount” or “trial penalty” (Kim 2015, p. 1200; Ulmer et al. 2010; Yan and Bushway 2018). While meaningful differences exist between these two terms (see Grunwald 2020; Yan and Bushway 2018), the plea discount is most often defined as the difference between the sentence that a defendant would have received if convicted at trial and the sentence they received as part of their plea, relative to the sentence they would have received if convicted at trial (see Yan and Bushway 2018; see also Redlich et al. 2016). [Footnote 1] Prior estimates have suggested that approximately 80–90% of defendants receive a plea discount (Bushway and Redlich 2012; Redlich et al., in press). While it is not possible to directly observe the sentence that a plea bargainer would have received if convicted at trial, researchers have used several different approaches to estimate the average difference between plea and trial sentence lengths and the size of the resulting discount that plea bargainers may receive.
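
The definition above can be written compactly as follows. The symbols are shorthand introduced here for illustration, not notation taken from the cited studies: S_trial is the sentence the defendant would have received if convicted at trial, S_plea is the sentence received under the plea, and D is the plea discount. The numbers in the comment are made up.

```latex
\[
D = \frac{S_{\text{trial}} - S_{\text{plea}}}{S_{\text{trial}}}
\]
% Illustration with made-up numbers:
% S_trial = 60 months, S_plea = 36 months  =>  D = (60 - 36) / 60 = 0.40
% D is negative whenever the plea sentence exceeds the trial sentence.
```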

Criminologists have often studied the discrepancy between plea and trial sentences through the estimation of regression models that incorporate adjudication type as an independent variable (for a discussion of this approach see Bushway et al. 2014). These models seek to determine the average difference in sentence lengths comparing cases that end in trial conviction to those disposed of via guilty plea, while controlling for various legal and extra-legal factors. Generally, these efforts find significant but variable differences between plea and trial sentence lengths (Elder 1989; LaFree 1985; Ulmer and Bradley 2006; Ulmer et al. 2010). Using data on serious violent offenses in Pennsylvania, Ulmer and Bradley (2006) found that bench and jury trial convictions increased sentence lengths by 22% and 57%, respectively, over guilty pleas, controlling for defendant, case, and court-level characteristics. Similarly, Ulmer et al. (2010) found that trial convictions were associated with a 45% increase in sentence lengths for federal offenders, which, given a baseline average sentence of 62 months, equated to a plea discount of approximately 31%. [Footnote 2]
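
The step from a 45% trial penalty to a plea discount of roughly 31% follows from changing the baseline from the plea sentence to the trial sentence. The short sketch below shows that arithmetic; the function name is hypothetical and the calculation assumes the penalty is expressed as a proportional increase over the plea sentence.

```python
def plea_discount_from_trial_penalty(trial_penalty: float) -> float:
    """Convert a proportional trial penalty into the implied plea discount.

    trial_penalty is the proportional increase of the trial sentence over the
    plea sentence (0.45 means trial sentences are 45% longer).
    """
    if trial_penalty <= -1:
        raise ValueError("trial_penalty must be greater than -1")
    # trial = plea * (1 + penalty), so (trial - plea) / trial = penalty / (1 + penalty)
    return trial_penalty / (1.0 + trial_penalty)


discount = plea_discount_from_trial_penalty(0.45)
print(f"{discount:.1%}")  # 0.45 / 1.45, i.e. about 31%
```

Because the discount is measured against the larger trial sentence, it is always smaller in magnitude than the corresponding trial penalty.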

Others have attempted to estimate the size of plea discounts more directly, either through the use of regression models to predict counterfactual trial sentences for those who pled guilty (Bushway and Redlich 2012; Smith 1986; Yan and Bushway 2018; Yan 2020; see also Piehl and Bushway 2007) or by comparing observed plea sentences to the maximum statutory penalty that a defendant could have received if convicted at trial (Redlich et al., in press; Zottoli et al. 2016). In either case, researchers have consistently found support for the presence of a plea discount. For instance, both Smith (1986) and Bushway and Redlich (2012) estimated the probability of an incarceration sentence if convicted at trial for defendants who pled guilty. In both studies, the observed probability of incarceration for defendants who pled guilty was roughly 75% of the estimated probability of incarceration if convicted at trial, suggesting a discount of approximately 25% in the probability of incarceration for those who pled guilty. Using a similar approach to predict sentence lengths rather than incarceration probabilities, Yan and Bushway (2018) estimated plea discounts across 40 distinct crime types, finding an average 67% discount between predicted trial sentences and observed plea sentences (see also Yan 2020). Other studies have reported average plea discounts as high as 75–80% when considering the maximum penalty that a defendant could have received if convicted at trial (see Redlich et al. 2018; Zottoli et al. 2016), or even as high as 95% when considering active (rather than suspended) sentences alone (Redlich et al., in press). Thus, the distinction between the maximum potential trial sentence and the expected (or predicted) trial sentence has meaningful implications for the size of plea discount estimates, though the former is likely an over-estimate given that most defendants do not receive the statutory maximum penalty (see Reitz 1993; Tonry 1988).

An important codicil to the findings of these studies concerns the considerable variance in plea discount estimates that has been observed both within and across samples. While Bushway and Redlich (2012) found that the average ratio between the probability of incarceration following a guilty plea and the predicted probability of incarceration at trial was approximately 77%, this ratio ranged from a low of 2% to a high of over 200% across defendants. Yan and Bushway (2018) also found that plea discounts varied significantly across crime types, ranging from an average discount of less than 4% to an average discount of roughly 90%, with several crime types having predicted trial sentences that were less than their average plea sentences (for similar findings at the defendant-level see also Yan 2020). Likewise, Redlich et al. (2018) reported that 22% of defendants in their sample did not receive any plea discount, and that those who did received discounts ranging from 27.27% to 99.72% of their statutory maximum sentence.

What Drives the Plea Discount?

Understanding the variability in plea discount estimates is critical because the plea discount is one of the key mechanisms through which prosecutors are believed to incentivize or even “coerce” (Kipnis 1976, p. 97) defendants into pleading guilty. That is, the assurance that a guilty plea will result in a sentencing reduction when compared to the likely outcome if convicted at trial provides motivation for defendants to plead guilty. Likewise, the assurance that a plea bargain will secure a conviction provides prosecutors with an opportunity to avoid the costs and inherent risk of an acquittal at trial (see generally Alschuler 1981; Brereton and Casper 1981; Kipnis 1976).

A necessary component of this relationship, however, is that there must exist some perceived risk of a conviction or an acquittal if a case is taken to trial. If there is little to no risk of conviction, a defendant may be entirely unwilling to plead guilty, or a large discount may be required to convince them to do so. Similarly, if there is little to no risk of being acquitted at trial, a prosecutor may be either unwilling to offer a plea discount or only willing to offer a small discount relative to the expected trial sentence (see Alschuler 1968; Bibas 2004). From a strictly rational perspective, it then follows that the level of risk associated with taking a case to trial should dictate the level of incentive (i.e., the size of the discount) needed to reach a plea agreement. This level of risk has often been referred to as the “probability of conviction at trial” (Bushway and Redlich 2012, p. 438), and central to this concept is the strength of the evidence that the prosecution holds against the defendant (see Bibas 2004; Bushway et al. 2014; Landes 1971; Rhodes 1979).

Evidence Strength

Evidence strength, or “the quantity and quality of the evidence presented by the plaintiff/prosecution” (Devine et al. 2001, p. 684), is a frequently cited explanation for variations in the probability of conviction at trial and the size of the plea discount. One of the leading theories of plea bargaining, the “shadow of the trial model” (Bushway and Redlich 2012, p. 437), directly connects these concepts by positing that the strength of the evidence impacts the size of the plea discount through its effect on the perceived probability of conviction at trial (see Bibas 2004). More specifically, as evidence strength increases so too should the probability of conviction at trial, which in turn should produce plea discounts that become proportionally smaller relative to the expected trial sentence (see Bartlett and Zottoli 2021; Bushway et al. 2014; Landes 1971; Mnookin and Kornhauser 1979; Petersen et al. 2020). In support of this proposition, research has indicated that the probability of conviction at trial, as perceived by legal participants, is highly responsive to the strength of the evidence in a case (Bushway et al. 2014; Elder 1989). Thus, evidence strength has often been used as a proxy for probability of conviction in legal scholarship (see Kramer et al. 2007; McAllister and Bregman 1986).
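
The logic of this paragraph can be sketched in a stylized, risk-neutral form. This is a simplifying assumption made here for illustration and not a specification taken from the studies cited: p denotes the perceived probability of conviction at trial, S_trial the expected trial sentence, S_plea the plea sentence, and D the plea discount expressed as a proportion of the trial sentence.

```latex
\[
S_{\text{plea}} \approx p \cdot S_{\text{trial}}
\quad\Longrightarrow\quad
D = \frac{S_{\text{trial}} - S_{\text{plea}}}{S_{\text{trial}}} \approx 1 - p
\]
% Under this simplification, stronger evidence raises p and shrinks D:
% p = 0.60 implies D of about 0.40, while p = 0.90 implies D of about 0.10.
```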

Indeed, research suggests that evidence strength is an important predictor of numerous court outcomes. Cases with stronger evidence are significantly more likely to be prosecuted (Albonetti 1987; Frederick and Stemen 2012; McCoy et al. 2012; Schmidt and Steury 1989; Spohn and Holleran 2001) and result in guilty trial verdicts (Bushway and Redlich 2012; Eisenberg et al. 2005; Devine et al. 2009; Taylor and Hosch 2004). Prior studies also suggest that evidence strength influences the likelihood of a guilty plea/plea offer (see Elder 1989; Emmelman 1998; Henderson 2021; Kramer et al. 2007; Kutateladze et al. 2015; Luna and Redlich 2020), and even the severity of the sentence received as part of a plea offer (Elder 1989; LaFree 1985). However, limited attention has been given to the direct relationship between evidence strength and plea discounts, despite this relationship being a core component of prominent plea-bargaining theories such as the shadow of the trial.

Using data collected by Miller et al. (1980), Smith (1986) examined the impact of various pieces of evidence on plea discounts while accounting for jurisdictional, case, and defendant characteristics. Results indicated that, while the presence of physical evidence and the number of witnesses in a case significantly decreased plea discounts, the presence of eyewitness evidence led to significant increases in the size of plea discounts. Additionally, evidentiary factors appeared to be less important in predicting the size of the plea discount than factors such as jurisdiction, the defendant’s probation/parole status, and whether the defendant had a drug history or pending charges. Using a similar design, Bushway and Redlich (2012) failed to find any significant effects of physical or eyewitness evidence on plea discounts; however, they did find that the number of witnesses in a case and the presence of confession evidence significantly increased plea discounts by 1% and 12%, respectively. These results are notable in light of the finding that eyewitness, confession, and physical evidence all significantly increased the probability of conviction at trial in their models. [Footnote 3]

Other studies report findings that are more suggestive of a consistent (albeit limited) impact of evidence strength on plea discounts. Redlich et al. (2016; see also Bushway et al. 2014) experimentally manipulated combinations of confession, witness identification, and DNA evidence among a sample of legal actors given a hypothetical case scenario and asked to make plea recommendations. Findings indicated that all three pieces of evidence produced small but significant decreases in the size of the discounts that participants were willing to offer (often only between 4 and 6%). In addition, Redlich et al. (2018) coded district attorney case files for the presence/absence of confession evidence, finding that defendants who denied their allegations received significantly larger plea discounts than those who either confessed or partially confessed to the crime, though the effect sizes for these differences were considered small by conventional standards. Relatedly, Kutateladze et al. (2015) found that evidentiary factors such as the presence of video/audio recordings and whether pre-recorded buy money or currency was recovered during an investigation significantly increased the likelihood of a custodial plea sentence offer for drug offenses. While these analyses did not directly examine plea discounts, custodial plea sentence offers are, by definition, smaller discounts than non-custodial plea sentence offers. However, after controlling for court, defendant, and other case characteristics, only the effect of recovered currency remained a significant predictor of custodial plea offers.

Interpreting the results of these studies is further complicated by inconsistent measurements of evidence strength (see generally Devine et al. 2001). Factors such as “theoretical ambiguity” and the “lack of an accepted metric” have made defining this concept difficult (see Devine et al. 2001, p. 686), which is further compounded by the lack of evidentiary variables typically found in administrative data sets (Kutateladze et al. 2015). As a result, prior efforts to measure evidence strength using real-world data have often relied on several different approaches. One approach, common in the plea discount literature, involves the analysis of individual pieces of evidence in isolation. This contrasts with the cumulative approach often used in research on jury verdicts, where individual pieces of evidence are aggregated or Likert-type scales are used to represent evidence strength more holistically (see Eisenberg et al. 2005; Garvey et al. 2004; Taylor and Hosch 2004). For example, Taylor and Hosch (2004) considered the presence of physical evidence, confession, positive eyewitness identification, weapon, number of indictment charges, degree of injury to the victim, and number of witnesses. The end measure of evidence strength was a summary variable of these factors, which was significantly related to both jury verdicts and trial sentence lengths.

While these approaches may seem at odds, it is important to consider both the aggregate amount of evidence in each case along with the individual pieces of evidence in isolation. At the individual level, different forms of evidence are likely to have varying levels of influence, and this insight may be lost when using aggregate measures of evidence alone. Indeed, research has indicated that legal participants value certain kinds of evidence over others. For instance, direct evidence, or evidence that “proves a fact without an inference or presumption” (Heller 2006, p. 248), such as confessions and eyewitness testimony, is consistently overvalued by judges and jurors (Devine et al. 2001; Heller 2006; Kassin and Neumann 1997; Niedermeier et al. 1999; Wells 1992). In contrast, circumstantial evidence, or evidence “from which the fact-finder can infer whether the facts in dispute existed or did not exist” (Heller 2006, p. 250), such as forensic/medical evidence, is often undervalued (Kassin 2012; Koehler 2001; Schklar and Diamond 1999). Even within these broad categories, however, evidence in specific forms may carry different levels of importance. Interviews with New York District Attorneys conducted by Kutateladze et al. (2015) suggested that video/audio recordings and eyewitness testimony may be particularly salient types of direct evidence, while others have suggested that the presence of a confession alone may be powerful enough to render other forms of evidence unnecessary (see Kassin 2012; McCormick 1972).

Yet, there is both a quality and quantity component to evidence strength, and individual forms of evidence do not often exist in a vacuum (see Devine et al. 2001, 2009). [Footnote 4] As such, examining pieces of evidence in isolation may risk ignoring additive effects that can occur when multiple forms of evidence exist simultaneously. Prosecutors may become more confident in their chances of success at trial if they are not reliant on any single piece of evidence, knowing that they have additional evidence to fall back on. For instance, in analyzing the role of eyewitness evidence on the dispositions of 725 felony cases, Flowe et al. (2011) noted that eyewitness identification was rarely the only form of evidence involved in a case. They further suggested that the police may be reluctant to forward cases to the prosecution until multiple forms of evidence have been collected. Additionally, results from Bushway et al.’s (2014) experimental survey cited earlier suggested that combinations of evidentiary variables exerted stronger effects on probability of conviction estimates and acceptable plea sentences than any single piece of evidence alone. Thus, a deeper understanding of the effects of evidence on plea discounts may require both individual and aggregate-level assessments.

Extra-Legal Factors

Another explanation for the limited and inconsistent relationship between evidence strength and plea discounts seen in prior research is that plea discounts may simply be driven by other factors. The focal concerns theory of judicial decision-making identifies offender “blameworthiness”, “protection of the community”, and “practical constraints and consequences” (Steffensmeier et al. 1998, pp. 766–767) as the central factors considered by criminal justice actors during sentencing decisions. While determining a defendant’s blameworthiness and protecting the community involve the consideration of legal factors such as case characteristics, evidence, and prior criminal record, the focal concerns theory also asserts that these determinations require the prediction of future behavior, which is often based on inadequate or incomplete information. Given the uncertainty involved in these decisions, legal actors may rely on “perceptual shorthand” (Steffensmeier et al. 1998, p. 767; see also Albonetti 1987), or stereotypes associated with extra-legal factors such as race, sex, and age to determine an offender’s level of blameworthiness or risk of future offending.

Research has often supported the propositions of focal concerns theory as it applies to sentencing decisions. Steffensmeier et al. (1998) found the interaction of race, sex, and age to be particularly important in explaining sentencing disparity, with young Black males receiving significantly harsher sentences than other demographic sub-groups. Other studies have found that extra-legal factors impact decisions to plead guilty (Albonetti 1990; Elder 1989; Frenzel and Ball 2008; Omori and Petersen 2020) and the length and probability of carceral sentences for plea bargainers (Kutateladze et al. 2014; Johnson and Larroulet 2019; LaFree 1985; Piehl and Bushway 2007). More relevant to the present study, Ulmer and Bradley (2006) noted evidence of a significant relationship between racial composition and trial penalties, Piehl and Bushway (2007) found a correlation between race and plea value, and Kutateladze et al. (2015) reported that defendant age significantly impacted the likelihood of receiving a custodial plea sentence offer. Research has further suggested that defense attorneys may consider the demographic characteristics of their clients when making plea recommendations (Redlich et al. 2016), and that this may lead to significantly longer plea recommendations for minority clients relative to White clients (Edkins 2011).

Thus, it is possible that plea discounts are responsive to extra-legal characteristics through their effect on the way that defendants are perceived by legal actors. To date, however, we are not aware of any studies that directly compare the impact of evidence strength and extra-legal variables on the size of plea discount estimates. Such a comparison has both practical and theoretical importance. If evidence strength exerts stronger effects on the size of the plea discount than extra-legal variables, this would support rational models of plea decision-making. In other words, legal participants would appear to be making plea decisions in anticipation of their expected trial outcomes (see Bibas 2004; Bushway and Redlich 2012). If non-legal factors such as a defendant’s race, age, or sex exert stronger effects on plea discounts, however, then discounting practices may be subject to the same concerns and subjective biases found in other judicial processes. Alternatively, plea discounts may be responsive to both evidence strength and extra-legal factors, in which case the analysis of interest involves determining which factors produce the strongest effect.

The Current Study

In this study, we test the effects of both evidence strength and extra-legal factors on the size of plea discount estimates. We first use administrative circuit court data from the state of Virginia for all defendants convicted of at least one felony charge via bench or jury trial during 2017–2018 to estimate predicted trial sentence lengths. We then apply the coefficients from this model to a sample of plea bargainers from one mid-sized jurisdiction in Virginia whose plea hearings were systematically observed and coded for the presence of various evidentiary items (i.e., summaries of the evidence that the prosecution would have presented at trial were observed and coded). Taking the predicted trial sentences and the observed plea sentences for this target sample, we estimate plea discounts and examine the effects of multiple forms of evidence (eyewitness, confession, forensic/medical, video/photo/audio) and extra-legal factors (race, sex, age) on the size of these discount estimates. In doing so, we expand on the approach used by Smith (1986; see also Piehl and Bushway 2007; Bushway and Redlich 2012; Yan and Bushway 2018) by using ridge regression, a penalized estimator, to account for the effects of multicollinearity between terms and the potential for model overfitting.
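
The two-stage design described above can be sketched in a few lines of pure Python: fit a ridge model on trial convictions, score a plea case with the resulting coefficients, and compute the discount. This is a minimal illustration under stated assumptions, not the study's actual specification. The two features, the toy sentences, the penalty value alpha=1.0, and the absence of any outcome transformation or feature standardization are all assumptions made here.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ridge_fit(X, y, alpha):
    """Closed-form ridge: solve (X'X + alpha*I) b = X'y on centered data.

    Centering leaves the intercept unpenalized.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    A = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) + (alpha if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(Xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    coefs = solve(A, rhs)
    intercept = y_mean - sum(c * m for c, m in zip(coefs, x_mean))
    return intercept, coefs


def predict(intercept, coefs, row):
    """Predicted sentence length for one case."""
    return intercept + sum(c * v for c, v in zip(coefs, row))


# Toy trial sample (made up): [number of charges, prior felony convictions]
trial_X = [[1, 0], [2, 1], [3, 0], [4, 2], [5, 3]]
trial_months = [12.0, 30.0, 40.0, 66.0, 90.0]
intercept, coefs = ridge_fit(trial_X, trial_months, alpha=1.0)

# Toy plea case scored with the trial-based coefficients
plea_row, plea_months = [3, 1], 24.0
predicted_trial = predict(intercept, coefs, plea_row)
discount = (predicted_trial - plea_months) / predicted_trial
print(f"predicted trial: {predicted_trial:.1f} months, discount: {discount:.2f}")
```

The penalty term shrinks correlated coefficients toward zero, which is why ridge is a common choice when many overlapping charge indicators enter the model. In practice the penalty would typically be chosen by cross-validation rather than fixed in advance.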

Based on prior literature and theory, we hypothesized that evidence strength would be significantly and inversely related to plea discounts, such that stronger evidence would lead to smaller plea discounts. More specifically, we hypothesized that direct evidence (e.g., witness and confession evidence) would have a stronger effect on plea discounts than circumstantial evidence (e.g., forensic/medical evidence), and that increases in the number of different forms of evidence would reduce plea discounts by the largest amount. In line with focal concerns theory, we also hypothesized that defendants who are young, male, and/or belong to a minority group would receive significantly smaller discounts than defendants of other demographic backgrounds. However, we anticipated that evidentiary factors would produce the largest effects on plea discounts, given that evidence strength should directly relate to case convictability (Bibas 2004; Spohn 2018).

Methods

Data Sources and Collection

Two primary data sets were used in this study. The first data set included all hearings that took place in Virginia Circuit Courts during the years of 2017–2018. [Footnote 5] Circuit courts in Virginia have jurisdiction over all felony prosecutions, any misdemeanor charges that accompany felony prosecutions, and any direct indictments. Processing information for these courts is made publicly available through Virginia’s judicial system website. [Footnote 6] These data contain information on hearing dates, charge type, jurisdiction, disposition, sentencing, and certain defendant characteristics. To generate models that would explain variation in trial sentence lengths, we first isolated all cases concluded by bench or jury trials that contained at least one felony conviction (roughly 71% of all trials during this timeframe). This resulted in a valid sample size of 5,671 cases, which represented our population-level trial conviction data. [Footnote 7]

The second data set was our plea sample, which was obtained from systematic observations of plea hearings that took place between February 2017 and August 2018 in one mid-sized circuit court in Virginia (see Dezember et al. 2022). The data collection site was a suburban county of approximately 450,000 people that processes around 7,500 adult arrests yearly (Federal Bureau of Investigation 2016). A team of six researchers attended and coded plea hearings during this time, providing access to defendant, case, and sentencing information for this sample. Most importantly, the research team observed case summaries of the evidence that the prosecution proffered they would have presented at trial, allowing researchers to code for various pieces of direct and circumstantial evidence. Interrater reliability for these observations was assessed using a sample of 206 hearings observed by multiple coders. Krippendorff’s alpha for these hearings was 0.90, which indicates very strong agreement between raters (Hayes and Krippendorff 2007).

As a component of the plea observations, researchers also examined official case records, which provided information on the charges that each defendant was indicted on and those that they pled to. This is particularly important as the indicted charges allow us to better account for any potential charge bargaining that may have occurred during case processing (see Piehl and Bushway 2007; Yan and Bushway 2018). More specifically, in exchange for a plea of guilty, defendants often benefit from a reduction in the nature or severity of their charges that would not have otherwise occurred. As such, the charges to which a defendant pleads guilty may represent only a subset of those that they would have faced at trial. By basing predictions on the indicted charges rather than the plea charges, we are able to better capture the full range of charges that the defendant would likely have faced at trial. [Footnote 8]

Thus, the inclusion of evidentiary variables and information on the charges that each defendant faced at indictment make this data set uniquely suitable for the current study. In total, 611 plea hearings were fully observed and coded. However, here we use only those cases where the defendant was indicted on at least one felony charge, where the judge accepted the defendant’s plea, and where an incarceration sentence was imposed, resulting in a valid sample size of 535 cases (for more information on the data collection procedures see Dezember et al. 2022).

Dependent Variables

To estimate sentencing discounts for our plea-bargaining sample we first needed trial sentence predictions. Thus, our initial dependent variable was the carceral sentence length received by defendants who were convicted at trial. Prior studies have often used the top charge (i.e., the most severe charge) associated with each case as the unit of analysis (see Johnson and Larroulet 2019; Yan and Bushway 2018). Our unit of analysis, however, is the case, where a case represents all charges that a defendant faced in a single trial. As such, our sentence length measure is operationalized as the aggregate sentence received across all charges within a case, including both felonies and misdemeanors. We do this based on the possibility that individual charges nested within a case may not be independent, but rather judges may consider the characteristics of the entire case when making sentencing decisions. Additionally, it is not uncommon for felony charges to be accompanied by secondary felonies or misdemeanors that can add substantial variability to the total sentence received. Our aggregated sentence lengths were measured in months and were top coded at 600 (50 years) to avoid excessive skew. [Footnote 9] Finally, after estimating these trial sentences, our primary dependent variable became the plea discount, defined here as the proportional difference between the observed plea sentence and the predicted trial sentence (see section "Plea Discount Estimation" below).
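
The case-level sentence measure described above can be sketched as follows. The function name and the example sentences are hypothetical, and the sketch assumes that "aggregate" means a simple sum of the per-charge sentences in months.

```python
TOP_CODE_MONTHS = 600  # 50 years, the cap described in the text


def case_sentence_months(charge_sentences):
    """Sum the sentence (in months) over every charge in a case, then top-code."""
    total = sum(charge_sentences)
    return min(total, TOP_CODE_MONTHS)


print(case_sentence_months([120, 36, 6]))    # 162
print(case_sentence_months([480, 240, 12]))  # 600 (capped from 732)
```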

Independent Variables

A primary goal of this study was to generate accurate trial sentence predictions rather than make any direct inferences regarding the effects of individual factors on sentencing outcomes. As such, we used a wide array of independent variables related to case, defendant, and external characteristics to predict trial sentences. Case characteristics included the number of charges that the defendant was convicted on, the types of charges that the defendant was convicted on (e.g., murder, rape, robbery, etc.), the minimum and maximum potential sentences that the defendant could have received across all charges, the statutory class of the charges that the defendant was facing (e.g., class 1 felony, class 2 felony, misdemeanor, etc.), and whether any of the charges that the defendant faced included a mandatory minimum sentence. Given that each case can contain multiple crime types and charge classifications, these characteristics were added as dummy variables (0/1) to allow for multiple categories to exist simultaneously for a single case. Class 1 felonies were extremely rare in our trial conviction data (n = 6, 0.1%), and so these charges were merged with Class 2 felonies. Additionally, we added squared terms for both the minimum and maximum potential trial sentences that the defendant could have received across all charges to account for any potential curvilinear relationships.
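
A minimal sketch of this encoding is shown below. The crime types and class labels are an illustrative subset, and the field names (`crime_type`, `charge_class`, `mandatory_min`) are hypothetical rather than taken from the court data. The sketch shows the 0/1 indicators that can be switched on simultaneously for one case, the merging of Class 1 felonies into Class 2, and the squared sentence-range terms.

```python
CRIME_TYPES = ["murder", "rape", "robbery"]                 # illustrative subset
CHARGE_CLASSES = ["felony_1_2", "felony_3", "misdemeanor"]  # hypothetical labels


def merge_class(charge_class):
    """Fold the rare Class 1 felonies into the Class 2 category."""
    return "felony_1_2" if charge_class in ("felony_1", "felony_2") else charge_class


def encode_case(charges, min_months, max_months):
    """Build one case-level feature row from a list of charge dicts."""
    types = {c["crime_type"] for c in charges}
    classes = {merge_class(c["charge_class"]) for c in charges}
    row = {"n_charges": len(charges)}
    for t in CRIME_TYPES:
        row[f"type_{t}"] = int(t in types)      # several types can be 1 at once
    for k in CHARGE_CLASSES:
        row[f"class_{k}"] = int(k in classes)   # several classes can be 1 at once
    row["any_mandatory_min"] = int(any(c["mandatory_min"] for c in charges))
    row["min_months"] = min_months
    row["max_months"] = max_months
    row["min_months_sq"] = min_months ** 2      # allows a curvilinear relationship
    row["max_months_sq"] = max_months ** 2
    return row


case = encode_case(
    charges=[
        {"crime_type": "murder", "charge_class": "felony_1", "mandatory_min": False},
        {"crime_type": "robbery", "charge_class": "felony_3", "mandatory_min": True},
    ],
    min_months=60,
    max_months=240,
)
print(case)
```

Using separate indicators instead of a single categorical variable is what lets one case carry, for example, both a murder charge and a robbery charge.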

Defendant characteristics included multiple measures of criminal history. While we did not originally have this information, we used the same publicly available Virginia Circuit Court data for the years 2000–2016 to calculate the number of felony convictions, misdemeanor convictions, and prior active incarceration sentences for each defendant. We then merged this data into our trial conviction sample to match criminal history with each defendant.

Considering our interest in the relationship between plea discounts and extra-legal variables, we omit demographic information from our prediction models for both theoretical and practical reasons. First, our goal in this study was to generate the most valid counterfactual trial sentences possible without the inclusion of demographic and evidentiary variables. In theory, trial sentence lengths should not be based on the strength of evidence or demographic characteristics (see Bushway and Redlich 2012), meaning that cases and defendants who are legally similar should receive similar sentences. The degree to which the magnitude of the differences between our observed plea outcomes and our predicted trial outcomes relates to evidentiary and demographic variables is the substantive interest of our analyses. Thus, omitting these factors from our prediction model is what allows us to introduce them during our plea discount analysis. The inclusion of evidence and demographic information in the trial sentence predictions would inevitably include these factors in the plea discount measure, potentially creating artificial correlations between our variables of interest (Footnote 10). However, as a robustness check, we did estimate trial sentences with the inclusion of demographic variables, and their omission had little effect on our trial sentence predictions (see Results section).

Given potential differences in regional and cultural sentencing practices across the state (see Kramer and Ulmer 2009), we also included the jurisdiction/area where the trial took place. At the county or city level there were several jurisdictions with no trial convictions or very small numbers of trial convictions. To account for this, we combined jurisdictions into their natural regions of the state (North, Central, West, East, and Southwest) and included these regions as categorical independent variables in our prediction model, with the North region (the region from which our plea sample was taken) designated as the reference category. This addition is important because, despite our plea sample being drawn from only one region in the state, our trial sentence predictions are adjusted for variation in sentencing practices unique to this region. Lastly, while both of our data sets contained hearings that took place during 2017–2018, approximately 6% (n = 365) of eligible trial convictions carried over into 2019. As such, we included disposition year as a categorical independent variable given the potential for changing sentencing practices and statutes over time (see Porter 2020).

In explaining our estimated plea discounts, we first employed a variety of different evidentiary measures. These included witness, forensic/medical, video/photo/audio, and confession evidence. Witness evidence involved any situation in which a witness was prepared to testify to observing the alleged offense or some element material to the offense. Forensic/medical evidence included any type of medical or forensic information that would have been introduced, such as fingerprints, DNA, blood analyses, or accompanying expert testimony. Video/photo/audio evidence involved any form of media that tended to directly implicate the defendant, and confession evidence involved direct, incriminating admissions given by the defendant. Individual pieces of evidence were introduced as dummy variables (0/1). We also created a summary measure of evidence by calculating the number of different pieces of evidence contained in a case. Finally, we introduced extra-legal factors such as race/ethnicity, sex, and age (and interactions between these characteristics) to test their comparative effects with those of evidentiary factors.

Plea Discount Estimation

Plea discount estimation followed the general approach first proposed by Smith (1986). Specifically, we generated counterfactual trial sentences using a regression model that considered the case, defendant, and external characteristics described above. Here, however, we used a penalized ridge regression (see Hoerl and Kennard 1970) rather than a standard ordinary least squares regression (OLS). While the coefficients estimated by an OLS regression may be unbiased in the absence of multicollinearity, strong correlations between independent variables can lead to coefficients that are artificially extreme in either direction (Hoerl and Kennard 1970; Yoo et al. 2014). Examination of variance inflation factors (see Marquardt and Snee 1975; Miles 2005) following the estimation of an OLS regression for our data indicated multicollinearity across several independent variables; however, these variables still added significant predictive ability to our model (Footnote 11). One approach to deal with this issue would involve variable deletion (McDonald 2009; Tabachnick and Fidell 2007); however, this approach would reduce the amount of information contained in the model and would lead to less accurate predictions.

Ridge regression attempts to solve these issues by applying a penalty parameter to the regression coefficients, shrinking them toward 0 to mitigate the effects of multicollinearity while still allowing them to remain in the model (see Fox 2016; McDonald 2009; Footnote 12). This, in turn, introduces bias into the regression coefficients in exchange for a decrease in variance. That is, this shrinkage has the effect of smoothing the regression coefficients, which may reduce the variance that the model explains on the data from which it was generated but serves to increase the generalizability of the model to other data sets (see Agresti 2019; Melkumova and Shatskikh 2017). A key distinction is that, while the standard OLS regression seeks to minimize the sum of the squared errors between predictions and observations (see Tabachnick and Fidell 2007), the ridge regression attempts to minimize the sum of the squared errors plus an additional penalty term, as depicted in Eq. 1 (see Tibshirani 1996):

$$\sum_{i = 1}^{N} \left( y_{i} - \sum_{j} \beta_{j} x_{ij} \right)^{2} + \lambda \sum_{j} \beta_{j}^{2}$$
(1)

Here, \(y_{i}\) represents the observed trial sentence for a given defendant, \(\sum_{j} \beta_{j} x_{ij}\) represents the predicted trial sentence for that defendant, and \(\lambda \sum_{j} \beta_{j}^{2}\) represents the penalty term applied to the regression coefficients, with \(\lambda\) controlling its size. The predicted trial sentences themselves resemble a standard OLS formula as seen in Eq. 2:

$$\sum_{j} \beta_{j} x_{ij} = \beta_{0} + \beta_{1} {\text{Offense}}_{ij} + \beta_{2} {\text{Criminal History}}_{ij} + \beta_{3} {\text{External}}_{ij}$$
(2)

where \(\beta_{1}\) represents a vector of regression coefficients related to the characteristics of the offense, \({\beta }_{2}\) represents a vector of coefficients related to the defendant’s criminal history, and \({\beta }_{3}\) represents a vector of coefficients related to external characteristics (e.g., jurisdiction and year of the trial). All independent variables here are standardized prior to model estimation and are then converted back to their original scale prior to reporting the results (see Hastie et al. 2016). We chose not to report standardized coefficients as the majority of our independent variables are binary in nature. Additionally, we did not have a substantive interest in comparing the strength of the coefficients across differing scales, particularly given that our coefficients are penalized.
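To illustrate how the penalty in Eq. 1 shrinks a coefficient, and how a coefficient estimated on a standardized predictor can be returned to its original scale, the following Python sketch works through the one-predictor case, where the minimizer has a simple closed form. The toy data, the function names, and the choice of \(\lambda = 5\) are hypothetical. The authors estimated their models with the glmnet package in R, whose internal scaling of the objective may differ from Eq. 1 as written, so this is a sketch of the idea rather than a reproduction of their procedure.

```python
from statistics import mean, pstdev


def ridge_objective(y, x, beta, lam):
    """Eq. 1 for a single predictor with no intercept: squared error plus lam * beta**2."""
    sse = sum((yi - beta * xi) ** 2 for yi, xi in zip(y, x))
    return sse + lam * beta ** 2


def ridge_slope(y, x, lam):
    """Closed-form minimizer of ridge_objective: sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def standardize(values):
    """Center and scale to unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values], m, s


# Hypothetical toy data: maximum potential sentence vs. trial sentence, both in months.
x_raw = [12, 60, 120, 240, 480]
y_raw = [6, 30, 70, 110, 260]

x_std, x_mean, x_sd = standardize(x_raw)
y_centered = [v - mean(y_raw) for v in y_raw]

slope_ols_std = ridge_slope(y_centered, x_std, lam=0.0)    # lam = 0 recovers OLS
slope_ridge_std = ridge_slope(y_centered, x_std, lam=5.0)  # shrunk toward 0

# With five standardized observations, sum(x*x) is 5, so lam = 5 halves the OLS slope.

# Convert the standardized ridge slope back to the original scale of x.
slope_ridge_orig = slope_ridge_std / x_sd
intercept_ridge_orig = mean(y_raw) - slope_ridge_orig * x_mean
```

With many correlated predictors the same logic applies jointly to all coefficients, which is why the fitted model trades some in-sample fit for more stable predictions.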

A critical component of the ridge regression is the tuning parameter (\(\lambda\) in Eq. 1). This parameter determines how much shrinkage will be applied to the coefficients, with more shrinkage leading to increased bias and decreased variance (and vice versa; see Melkumova and Shatskikh 2017). To select an appropriate value for the \(\lambda\) parameter, we first randomly divided our trial conviction data into a training data set and a test data set using a 70%/30% split. Here, the training and test data sets were statistically independent of each other and neither data set contained cases from the target sample (i.e., plea sample). After dividing the trial conviction data in this way, we conducted a tenfold cross-validation on the training data set to identify the \(\lambda\) value that produced the minimum mean squared error (MSE) (Footnote 13). However, we used the “one standard error rule” (Ternès et al. 2016, p. 2564) to select the largest \(\lambda\) that still produced an MSE value that was within one standard error of the minimum MSE across all models (see also Hastie et al. 2016). This approach is intended to provide a compromise between the amount of penalization applied to the coefficients and the overall error of the estimated model (see Melkumova and Shatskikh 2017).
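The selection step can be sketched in a few lines of Python. The function below takes a grid of candidate \(\lambda\) values with their cross-validated mean MSE and standard errors and returns both the error-minimizing value and the one-standard-error choice. The function name and all numbers are hypothetical; in glmnet, the analogous quantities are reported by `cv.glmnet` as `lambda.min` and `lambda.1se`.

```python
def select_lambda_one_se(lambdas, cv_mean_mse, cv_se_mse):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimizes the mean cross-validated MSE.
    lambda_1se is the largest lambda whose mean MSE is within one standard
    error (taken at lambda_min) of that minimum.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean_mse[i])
    threshold = cv_mean_mse[best] + cv_se_mse[best]
    eligible = [lam for lam, mse in zip(lambdas, cv_mean_mse) if mse <= threshold]
    return lambdas[best], max(eligible)


# Hypothetical tenfold cross-validation summaries for four candidate values.
lambdas = [0.1, 1, 10, 100]
cv_mean_mse = [5200, 5100, 5150, 5600]
cv_se_mse = [90, 80, 85, 100]

lambda_min, lambda_1se = select_lambda_one_se(lambdas, cv_mean_mse, cv_se_mse)
```

In this toy grid the minimum MSE occurs at \(\lambda = 1\), but \(\lambda = 10\) is still within one standard error of it, so the rule prefers the more heavily penalized model.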

Using our selected \(\lambda\) value, we calculated our regression coefficients using the training data set. While some have recommended the use of a natural log transformation when modeling sentence lengths (see Freiburger et al. 2019), we found that our untransformed sentence lengths led to substantially higher predictive accuracy, and thus we report results from the prediction model using the untransformed dependent variable (Footnote 14). As recommended (Hastie et al. 2016; Tibshirani 1996), we then applied the coefficients from this model to the test data set to evaluate the accuracy of the predictions before ultimately applying the model to our plea sample. When applying these coefficients to the plea sample, we used the charges at indictment to account for any potential charge bargaining that may have occurred between indictment and plea. Finally, after generating predicted trial sentences for our plea sample, we calculated the plea discount using Eq. 3, where \(\widehat{TS}\) represents the predicted trial sentence and \(PS\) represents the observed plea sentence:

$${\text{Plea}}\;{\text{Discount}} = \frac{{\widehat{TS} - PS}}{{\widehat{TS}}}*100$$
(3)
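Eq. 3 translates directly into code. The short Python function below is an illustrative sketch (the function name and the example sentences are hypothetical) showing that a plea sentence shorter than the predicted trial sentence yields a positive discount, while a longer plea sentence yields a negative one.

```python
def plea_discount(predicted_trial_months, plea_months):
    """Eq. 3: percentage difference between predicted trial and observed plea sentence."""
    if predicted_trial_months <= 0:
        raise ValueError("predicted trial sentence must be positive")
    return (predicted_trial_months - plea_months) / predicted_trial_months * 100


# Hypothetical defendants, each with a predicted trial sentence of 100 months.
typical = plea_discount(100, 64)       # plea sentence shorter than predicted trial sentence
no_discount = plea_discount(100, 100)  # plea sentence equals predicted trial sentence
penalty = plea_discount(100, 265)      # plea sentence longer than predicted trial sentence
```

A negative value therefore corresponds to a plea sentence that exceeds the counterfactual trial sentence.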

It is important to note that, while each individual in our sample received a trial sentence prediction, our results and inferences are concerned with marginal effects rather than individual estimates. That is, our predictions are based on the average effect of group membership across our collection of independent variables. This approach is consistent with those of prior efforts to examine the impact of evidence on plea discounts and the estimated probability of conviction at trial (see Bushway and Redlich 2012; Smith 1986; Yan and Bushway 2018).

All analyses were conducted in R statistical software (R Core Team 2020). Data partitioning was conducted using the rsample package (Silge et al. 2021) while cross-validation and ridge regression models were estimated using the glmnet package (Friedman et al. 2010). The R code is available from the authors upon request.

Results

Descriptive statistics for the full trial conviction and plea samples can be seen in Table 1. On average, defendants convicted at trial received an additional 34 months of incarceration compared to defendants who pled guilty; however, there were significant differences in the characteristics of plea and trial cases. Trial conviction cases were more likely to include murder, rape, and assault charges than plea cases (20.7% vs. 13.1%, χ²(1) = 17.15, p < 0.001), while plea cases were more likely to include drug charges and property crime (78.7% vs. 58.5%, χ²(1) = 82.45, p < 0.001). Plea cases were also more likely to include misdemeanors and class 6 felonies than trial cases (69.4% vs. 54.2%, χ²(1) = 45.12, p < 0.001), and defendants convicted at trial had more extensive criminal records, with over 2.5 prior felony and misdemeanor convictions compared to just over 1 prior conviction for defendants who pled guilty (t(1281.1) = 11.77, p < 0.001). However, defendants who pled guilty faced more charges per case (t(6204) = 9.36, p < 0.001) and higher maximum potential sentences if convicted at trial (t(6204) = 6.93, p < 0.001). Demographic information is not displayed in Table 1 as it was not used as a component of our trial sentence predictions (demographic information for the plea sample can be seen in Table 3). However, the trial conviction sample was predominately Black (49.1%) and male (79.2%); age could not be calculated as date of birth was not provided in these data. Lastly, certain jurisdictions and trial years were not applicable to our plea sample. The coefficients from these empty factor levels simply dropped out of our trial sentence predictions.

Table 1 Descriptive statistics for trial sentence predictors

As previously discussed, we first estimated our trial prediction model on a training data set comprised of a random sample of 70% (n = 3,967) of all trial conviction cases. We then tested the accuracy of our predictions on a test data set, which was the remaining 30% (n = 1,704) of trial conviction cases. Results from these models can be seen in Table 2. Of note here, the addition of the penalty parameter in the ridge regression intentionally introduces bias, and in doing so, the standard errors of the regression coefficients lose meaning. As such, inferences regarding statistical significance are often not recommended for ridge coefficients (see Fox 2016; Goeman et al. 2018). Furthermore, statistical significance is not meaningful for our purposes given that we have no specific hypotheses or interest in theory-building related to the trial sentence predictions. Thus, we report only the coefficients themselves and the predictive accuracy of this model.

Table 2 Ridge regression coefficients for trial sentence predictions (n = 3,967)

The coefficients in Table 2 generated predictions that explained over 60% of the total variance in the trial sentence lengths for both the training and test data sets, indicating a predictive ability comparable to that of prior studies (see Yan and Bushway 2018; Yan 2020). Additionally, the consistency across both training and test data sets suggests that the model is generalizable across samples. In general, defendants facing a larger number of charges and more severe charges received longer trial sentences. Murder and rape charges increased trial sentence predictions by over 65 and 82 months, respectively, while class 1 and class 2 felonies increased trial sentences by over 42 months. Prior convictions and incarceration sentences also led to increases in sentence lengths, though criminal history appeared to have a smaller effect than case characteristics (e.g., number of charges per case). There was also notable variation across jurisdictions, with cases in the Western region of the state receiving an average 6-month reduction in sentencing, compared to cases in the Northern region. While the model reported in Table 2 does not account for demographic characteristics, we added race and sex as a robustness check. The R² values for both the training and test data sets increased by less than 0.2%, and the predicted values with and without these demographic factors were highly correlated on both the training and test data sets (r = 0.999 for both). Thus, it does not appear as though the omission of demographic characteristics significantly impacted the accuracy of our trial sentence predictions.
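For readers who want to see how such accuracy summaries are typically computed on a held-out test set, the following Python sketch calculates the proportion of variance explained and the root mean squared error from observed and predicted sentences. The data are hypothetical, and the variance-explained formula shown here (one minus the ratio of residual to total sum of squares) is one common definition; the text does not state which definition the authors used.

```python
from math import sqrt


def r_squared(observed, predicted):
    """1 - SS_residual / SS_total, using the mean of the observed values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared prediction error, in the units of the outcome (months)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical test-set sentences (months) and model predictions.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

test_r2 = r_squared(observed, predicted)
test_rmse = rmse(observed, predicted)
```

Comparing these values between the training and test sets is what supports a claim that a model generalizes beyond the data used to fit it.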

Taking the coefficients from Table 2, we then estimated the counterfactual trial sentences and resulting plea discounts for our target sample using the charges at indictment (see Fig. 1). For the full sample, the observed plea sentence was 64.2% of the predicted trial sentence, suggesting an average discount of approximately 36%. This mirrors the results at the individual level as well, where the mean discount across defendants was 36.5% (Table 3). However, like prior studies we find a large degree of variance in the size of the plea discount across defendants, with discounts ranging from a 99% reduction in the plea sentence relative to the expected trial sentence to a 165% increase in the plea sentence relative to the expected trial sentence. This suggests that some defendants are receiving considerably longer plea sentences than would be expected at trial. In fact, we find that nearly 18% of defendants received a plea sentence that was higher than their expected trial sentence (for a similar estimate see Bushway and Redlich 2012).

Fig. 1 Plea discount distribution

Table 3 Plea discount and predictor variables for plea sample (N = 535)

Given the presence of significant variation in these discount estimates (Fig. 1), our primary interest becomes exploring the sources of this variation. Table 3 reports the distribution of the evidentiary and extra-legal variables used as predictors of the plea discount. On average, these cases contained around two distinct forms of evidence, with the vast majority containing some form of witness evidence. Confession evidence was also common, occurring in nearly 50% of all cases, while both video/photo/audio evidence and forensic/medical evidence occurred in less than a quarter of all cases. Defendants in this sample were predominately male and in their early 30s. Black was the modal racial category, followed by White and Hispanic.

Table 4 reports the results from a series of OLS regressions examining the impact of evidence on the size of the estimated plea discount. In Model 1, we regress the discount on each individual form of evidence. The intercept for this model indicates that defendants who had none of the measured forms of evidence against them received nearly a 50% plea discount. However, we find that only the presence of direct video/photo/audio evidence significantly impacts this discount, leading to a 9.3% decrease in the size of the discount when controlling for other evidentiary factors. The presence of witness evidence also reduced our estimated plea discounts by approximately 13%; however, this effect was only present at a less stringent 0.10 significance threshold.

Table 4 OLS regression results for plea discount

In Model 2 we introduce a continuous measurement of the number of different forms of evidence contained in each case. This variable could not be added to Model 1 as it is simply the sum of the individual forms of evidence, and thus is perfectly collinear with these measures. However, we do introduce this variable alongside video/photo/audio evidence, given that this was the only individually significant form of evidence found in Model 1. As such, Model 2 tests whether the number of pieces of evidence impacts plea discounts above and beyond the effect of video/photo/audio evidence alone. Results indicate that each additional form of evidence leads to a statistically significant 7% decrease in the size of the plea discount. The presence of video/photo/audio evidence is no longer significant in this model, suggesting that the cumulative effect of evidence may be a more important determinant of the plea discount than any single piece of evidence in isolation (Footnote 15).
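The collinearity argument can be written out as a short derivation. The symbols below (\(W\), \(F\), \(V\), and \(C\) for the witness, forensic/medical, video/photo/audio, and confession dummies, and \(\alpha\), \(\gamma\), and \(\delta\) for coefficients) are notation introduced here for illustration and do not appear in the original tables. The derivation assumes that the count is exactly the sum of the four dummies, as the text describes.

$$\text{Count}_{i} = W_{i} + F_{i} + V_{i} + C_{i}$$

Because the count is an exact linear combination of the four dummies, a model containing all five terms has no unique least squares solution. A specification with only the count and the video/photo/audio dummy can be rearranged as:

$$\text{Discount}_{i} = \alpha + \gamma\,\text{Count}_{i} + \delta V_{i} + \varepsilon_{i} = \alpha + \gamma \left( W_{i} + F_{i} + C_{i} \right) + \left( \gamma + \delta \right) V_{i} + \varepsilon_{i}$$

Under this reading, \(\gamma\) is a common per-form effect and \(\delta\) captures how far video/photo/audio evidence departs from that common effect, so a nonsignificant \(\delta\) is consistent with video/photo/audio evidence counting no more than any other form.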

Model 3 introduces extra-legal demographic variables. The number of pieces of evidence in each case continues to demonstrate a significant relationship with plea discounts. However, even after controlling for this effect, Hispanic defendants and male defendants received significantly smaller plea discounts than other demographic groups. Specifically, Hispanic defendants’ average plea discount is 20% smaller than that of White defendants, and male defendants’ average plea discount is 12% smaller than that of female defendants.

Given these findings and the recommendations of focal concerns theorists to examine specific demographic subgroupings (see Steffensmeier et al. 1998), we added an interaction term for race/ethnicity and sex to Model 4. The addition of the interaction term necessitates that the main effects for race/ethnicity and sex be ignored in this model, and our results indicated a significant interaction between the two, with Hispanic males receiving an additional 37% decrease in the size of their plea discounts beyond the effect of being male or Hispanic alone. Using the intercept of Model 4 as a reference point, White males of average age and with an average amount of evidence against them received a mean plea discount of roughly 36.6%. In comparison, Black males received a mean discount of 32.8% and Hispanic males received a mean discount of only 11%. The effect of evidence also remained significant in Model 4, with a one-unit increase in the amount of evidence leading to an average 6% decrease in plea discounts, though no other main or interaction effects were found across our other independent variables.

Model 5 replicates Model 4, but controls for each individual piece of evidence rather than the total number of pieces of evidence in each case. Results are substantively similar, indicating a significant interaction between being Hispanic and male; however, no individual piece of evidence achieves statistical significance in this model. Lastly, we tested interaction terms between race/ethnicity and the number of pieces of evidence, as well as between race/ethnicity, sex, and age (not shown). Results for both tests were nonsignificant (Footnote 16).

The relationship between evidence, race, sex, and plea discount can be further seen in Fig. 2. For both males and females of all racial categories, increases in the number of pieces of evidence lead to decreasing plea discounts. However, plea discounts also appear to be smaller for males than for females across all evidentiary and racial categories. Additionally, there is a clear and significant difference between the average predicted discount for Hispanic males and females. While Hispanic males receive the smallest plea discount across all levels of evidence, Hispanic females appear to receive the largest plea discount across all levels of evidence (Footnote 17). Hispanic males with 3–4 pieces of evidence against them were predicted to receive plea discounts of approximately 0%. In comparison, no other combinations of race/ethnicity and sex were predicted to receive plea discounts less than approximately 10%, regardless of the number of pieces of evidence against them.

Fig. 2 Predicted discount by evidence, race, and sex

Thus, our findings suggest significant impacts of both evidence and extra-legal factors on plea discounts; however, this does not indicate which factors are producing the largest effect. To compare the magnitude of these effects, we calculated standardized effect sizes for the number of pieces of evidence, the main effects of race/ethnicity and sex, and the interaction between race/ethnicity and sex. Given that both the number of pieces of evidence and the plea discount are continuous measures, correlation coefficients were used as measures of effect size to allow for comparison of evidence with race/ethnicity and sex (see Lipsey and Wilson 2001). Results indicated that being Hispanic and male produced the largest effect (r = −0.271, 95% CI [−0.400, −0.143]), followed by the main effect of being Hispanic alone (r = −0.228, 95% CI [−0.344, −0.113]), the number of pieces of evidence (r = −0.130, 95% CI [−0.218, −0.042]), and finally the main effect of being male alone (r = −0.127, 95% CI [−0.215, −0.040]) (Footnote 18).

Sensitivity Analyses

We conducted several sensitivity analyses on our main findings (see Supplementary Materials). First, given the small sample sizes available for our interaction effects (e.g., 12 Hispanic female defendants), we re-estimated these models using a log transformation for the dependent variable. This required us to flip the distribution such that all values were positive, and to do so we simply divided the plea sentence by the expected trial sentence, which produced a distribution that was perfectly and negatively correlated with our discount measure. When this measure was log transformed, results were substantively similar to those of Model 4, indicating a statistically significant interaction effect between race/ethnicity and sex (b = 0.66, p = 0.02). The coefficient for this model is positive because the outcome distribution has been flipped, but the results still correspond to a significant decrease in the average plea discount for Hispanic male defendants.
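The link between the flipped outcome and the discount in Eq. 3 can be made explicit. Writing \(R\) for the ratio of the plea sentence to the predicted trial sentence (a symbol introduced here for illustration), dividing Eq. 3 through by 100 and rearranging gives:

$$R = \frac{PS}{\widehat{TS}} = 1 - \frac{{\text{Plea}}\;{\text{Discount}}}{100}$$

Because \(R\) is a decreasing linear function of the discount, the two measures are perfectly negatively correlated, and \(R\) is positive whenever the plea sentence is positive, so \(\log R\) is defined. For example, a 36.5% discount corresponds to \(R = 0.635\), while a plea sentence 165% longer than the predicted trial sentence (a discount of −165%) corresponds to \(R = 2.65\). A positive coefficient in the model for \(\log R\) therefore indicates a larger plea-to-trial ratio, which is a smaller discount.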

Second, we re-introduced variables related to criminal history and charge severity into our main plea discount models to determine whether these factors may have confounded our observed effects. Our results remained unchanged, with significant main effects for the number of pieces of evidence, race/ethnicity, sex, and the interaction between race/ethnicity and sex. We urge caution in interpreting these results, however, as the variables related to criminal history and charge severity in these models were inevitably included in both the dependent and independent variables. Nonetheless, these results provide some indication that our findings may be robust to the inclusion of multiple theoretically salient variables.

Finally, because modeling all charges contained within each case produces numerous combinations of charges which may be nonequally constructed across plea and trial cases, we repeated our analyses using only a top charge per case. Here, we defined the top charge as the charge with the longest minimum potential sentence. We chose the minimum potential sentence as there is no clear hierarchy of severity by crime type or charge class in Virginia (e.g., assault charges can range from misdemeanors to class 2 felonies, class U felonies can be both more and less severe than other classes of felonies, etc.). Limiting our models to a single charge resulted in a loss of explanatory power for both training (R² = 0.49, RMSE = 73.5) and test (R² = 0.49, RMSE = 70.00) data sets, and the average plea discount decreased to approximately 14% (down from 36.5%). However, under this specification there remained a significant interaction between race/ethnicity and sex (b = −0.42, p = 0.048). The effect of evidence, while similar in magnitude (b = −0.046), was no longer statistically significant. There was also a notable outlier in this distribution, with one defendant receiving a plea sentence nearly five times larger than their predicted trial sentence. When omitting this outlier, the number of pieces of evidence per case was a significant predictor of plea discount estimates (b = −0.07, p = 0.03), as was being Hispanic (b = −0.19, p = 0.01) and male (b = −0.17, p = 0.01), but the interaction between race/ethnicity and sex was only marginally significant (b = −0.35, p = 0.08). Thus, this alternative specification produced substantively similar results to those of our main approach, leading to similar overall conclusions.
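The top-charge rule used in this sensitivity analysis can be sketched as follows. The Python snippet is illustrative only: the field names and the example case are hypothetical, and the way ties are broken (the first charge listed wins) is an assumption, since the text does not describe tie handling.

```python
def top_charge(charges):
    """Return the charge with the longest minimum potential sentence.

    charges: list of dicts with a 'min_months' key.
    Ties are broken by keeping the first such charge (an assumption).
    """
    return max(charges, key=lambda charge: charge["min_months"])


# Hypothetical case with three charges.
case = [
    {"label": "assault", "min_months": 12, "max_months": 60},
    {"label": "robbery", "min_months": 60, "max_months": 480},
    {"label": "petit larceny", "min_months": 0, "max_months": 12},
]

selected = top_charge(case)
```

Only the selected charge would then feed the prediction model, which is why this specification discards the information carried by secondary charges.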

Discussion

Rational theories of plea decision-making, such as the shadow of the trial model, posit that evidence strength and the resulting probability of conviction at trial should directly determine the size of the discount necessary to induce a guilty plea (Bibas 2004; Bushway and Redlich 2012). However, research on this effect has been limited and inconsistent (Bushway and Redlich 2012; Smith 1986), calling into question the validity of this theory. If evidence strength is not a primary driver of the plea discount, then it remains possible that, like other sentencing outcomes, plea discounts are influenced by focal concerns related to factors such as race, sex, and age (see Ulmer and Bradley 2010; see also Steffensmeier et al. 1998). The current study explored these issues by directly testing the impact of both evidence strength and extra-legal factors on plea discount estimates. Our results suggest that both factors impact the size of plea discounts, but that demographic characteristics, and particularly the interaction between these characteristics, may produce the largest effects. These findings have important theoretical and policy implications and raise questions about rational theories of plea decision-making.

Compared to prior efforts of this kind (see Bushway and Redlich 2012; Yan and Bushway 2018), our results are more suggestive of a relationship between evidence strength and plea discount magnitude. We find that increases in the amount of evidence associated with a case are associated with significant decreases in the size of the estimated plea discount. However, we also find that this effect appears to be more about evidence quantity than any specific form of evidence in isolation. Prior research has suggested that direct evidence such as confessions or eyewitness testimony is often overvalued by legal decision-makers (Devine et al. 2001; Heller 2006; Kassin and Neumann 1997; Niedermeier et al. 1999; Wells 1992). Given these findings, we hypothesized that our measures of direct evidence (confession, witness, and video/photo/audio) would significantly reduce plea discount magnitude. This hypothesis was only partially supported, however, as the presence of video/photo/audio evidence was the only individual form of evidence to significantly impact plea discounts. Additionally, when controlling for the number of pieces of evidence associated with each case, the influence of video/photo/audio evidence was no longer significant. Thus, whereas direct evidence, such as confession and eyewitness testimony, has been found to strongly impact trial verdicts (e.g., Devine et al. 2009; Kassin 2012) and to a lesser degree, plea discounts in a hypothetical case (Bushway et al. 2014; Redlich et al. 2016), we did not find such relationships here.

We also hypothesized that the total amount of evidence in each case would exert a larger impact on plea discounts than any individual piece of evidence alone. That is, the totality of the evidence facing a defendant may be a better proxy for evidence strength than any single form of evidence alone. Moreover, multiple forms of evidence may increase legal actors’ confidence in a case, providing additional room for error should any individual piece of evidence fall through. Our results supported this hypothesis, as the number of different forms of evidence contained in each case remained the only evidentiary variable to demonstrate significant effects across all model specifications. However, we still expected to see significant variation in the impact of specific forms of evidence, and it is unclear why this was not the case. It is possible, perhaps, that salient individual pieces of evidence (i.e., direct evidence) are more important considerations at trial than they are during plea negotiations. If so, this would stand against the notion that plea decisions are made in the shadow of the trial.

Perhaps our most important finding concerns the discrepancy in the size of the observed effects for evidentiary variables and demographic characteristics. Bushway and Redlich (2012) suggested that “if plea arrangements are not conducted in the “shadow of a trial”, we would expect to see factors other than those related to evidence to be influential” (p. 443). Specifically, focal concerns theory suggests that extra-legal factors such as race, sex, and age may impact plea discounts through their effect on the way that defendants are perceived (i.e., young minority males may be considered more dangerous and blameworthy, and thus less deserving of a plea discount; Ulmer et al. 2010; see also Kutateladze et al. 2015; Steffensmeier et al. 1998). It is possible that legal actors in our sample resorted to “perceptual shorthand” (Steffensmeier et al. 1998, p. 767) when presented with incomplete information about a defendant’s level of dangerousness or blameworthiness, and that this may have led to improper considerations of race/ethnicity and sex during plea and sentencing decisions. Our findings provide preliminary support for these claims, indicating that Hispanics, males, and particularly the interaction between the two, were associated with significantly smaller plea discounts (i.e., less lenient plea offers) than White and/or female defendants. Moreover, this effect remained significant even after controlling for the amount of evidence in each case.

It is not clear why this effect was so concentrated among Hispanic defendants as opposed to other minorities. While all minority categories were associated with smaller plea discount estimates than White defendants, only the effect of being Hispanic was statistically significant. It is worth noting that these data come from a jurisdiction where about one-quarter of the population is Hispanic, and it remains possible that there is a unique locational element responsible for this discrepancy. However, as of 2018 the Hispanic population across Virginia jurisdictions ranged from 0.7% to 40.1% (University of Virginia 2018). Thus, while our target jurisdiction has a larger Hispanic population than many others, it is not an anomaly within the larger context of the jurisdictions included in our prediction model.

Recent work has also found strong relationships between disparate court outcomes and ethnic distinctions (Omori and Petersen 2020; Smith et al. 2021). In Miami-Dade County (FL), Omori and Petersen (2020) reported that Black Latinos were disproportionately given pre-trial detention, convicted, and sentenced to incarceration, and that this disparity was greatest when comparing White non-Latinos to Black Latinos. Omori and Petersen argue for the importance of examining both racial and ethnic distinctions in research on court outcomes, and while we are not able to separate our demographic characteristics in this way, it is possible that our results reflect a combination of both racial and ethnic effects. An alternative explanation is that Hispanic defendants are more likely to lack U.S. citizenship, and that citizenship status exerts a strong impact on sentencing disparities (see Light 2014; Light et al. 2014). Indeed, a 2010 Supreme Court decision, Padilla v. Kentucky, mandated that non-U.S. citizens be made aware of the risk of deportation when pleading guilty. This would be a plausible explanation for our findings, but one that we could not directly test.

The larger question to be asked concerns the implications of our findings for plea bargaining theory and criminal justice inequality. On a cursory level, our study may indicate that there is merit to both rational theories of plea decision-making (i.e., the “shadow of the trial” model) and theories that consider subjectivity and potential bias in these processes (i.e., focal concerns). In either case, our results suggest that plea discounts are not solely (or even primarily) related to the convictability of a case. Instead, our findings paint a more complicated picture, stressing the importance of the way that defendants are perceived by legal actors. If these perceptions are based on characteristics that should have no legal bearing, then plea discounts are not simply the product of legal considerations.

However, the concept of convictability may also involve more than just evidence strength or legal considerations. For instance, in an ethnographic study conducted from within a district attorney’s office, Frohmann (1997) found that prosecutors often considered non-legal characteristics such as race and sex when determining the convictability of a case. That is, prosecutors envisioned the way that each case might be perceived by a hypothetical jury when deciding whether to pursue charges, and these projections accounted for extra-legal characteristics of the victim and defendant. Thus, whereas extra-legal factors in theory should not influence convictions (either by trial or plea), they nonetheless do, a fact of which most experienced prosecutors are likely aware. In this way, it may be that evidence, race, and sex all become components of convictability assessments and the perceived strength of the case. Prosecutors in our sample may have been more confident that they could secure a trial conviction against defendants who were Hispanic and male, net of any significant differences in evidence strength. If so, these decisions would undoubtedly be based on stereotypes, and could result in systematic biases against certain groups of defendants, but the proposition that plea discounts relate to case convictability might still hold true. It may simply be that the strength of the case, as perceived by legal actors, involves much more than objective evidence alone.

Limitations and Conclusions

Several important limitations of this study deserve mention. First, as with any effort to generate predictions for an outcome that did not occur in reality, our approach inevitably assumes that a valid trial sentence prediction can be derived for defendants who plead guilty (for similar discussions see Bushway and Redlich 2012; Bushway et al. 2014; Yan and Bushway 2018). While we took significant measures to control for an array of defendant, case, and external characteristics, our results did suggest significant differences in case characteristics between defendants who went to trial and those who pled guilty. Additionally, our prediction model did not include several variables that may be important determinants of trial sentence length, such as the type of defense counsel, the relationship between the offender and victim, and whether the defendant was detained prior to trial (see LaFree 1985; Smith 1986). While these limitations may be unavoidable using a counterfactual approach, we note that our model still performed very well across both training and test data sets. Thus, to the degree that this assumption is valid, our model appeared well equipped to generate true-to-life trial sentence predictions.

Our trial sentence predictions were also generated using statewide data, but then applied to only one jurisdiction within the state. This raises the concern that specific factors related to sentencing decisions in our target jurisdiction were improperly accounted for. While we added fixed effects to our model to adjust for sentencing differences between geographic regions of the state, we were unable to add terms specific to each individual jurisdiction. Additionally, this approach assumes that the coefficients for our independent variables are constant across geographic regions (i.e., that there are no interactions between geographic region/jurisdiction and other predictor variables). Nonetheless, the use of statewide data was necessary to provide an adequate number of cases to accurately generate trial sentence predictions and cross-validate those predictions. The use of fixed effects to account for variance attributable to macro-level geographic units is also similar to prior efforts of this kind (Bushway and Redlich 2012; Smith 1986), and our geographic regions largely fall along political and cultural lines that may be particularly salient for sentencing severity (see Ulmer et al. 2008).

Second, our measures of evidence were limited to those that were proffered by prosecutors during plea hearings. While this approach is unique, it is possible that we were lacking important evidentiary variables, or that there was additional evidence that was not proffered. On a related note, we did not have several evidentiary measures that have been common in prior studies, such as the number of witnesses in each case (Bushway and Redlich 2012; Smith 1986), or whether a weapon was used or recovered (LaFree 1985; Kutateladze et al. 2015). Measuring the number of pieces of evidence in an additive fashion may also be overly simplistic. No differential weights were given to specific pieces of evidence, and we were unable to account for evidence quality. Nonetheless, coding evidence as proffered during plea hearings allowed us to create novel evidentiary variables, such as video/audio/photo evidence and forensic or medical evidence.

Another important limitation is the potential for unobserved characteristics to confound the effects of race and sex on plea discount magnitude (see Sensitivity analyses section). Given that elements related to a defendant’s criminal history and current case characteristics were used to generate trial sentence predictions, they could not be confidently added as predictors of plea discount estimates. This creates the possibility that Hispanic defendants and/or males were also associated with longer criminal histories, more severe charges, or other characteristics that could have impacted discount estimates. While we explored this potential in our sensitivity analyses and did not find any evidence to suggest that criminal history or charge severity confounded our findings, these models should be interpreted with caution given that the same variables contribute to both the dependent variable and the independent variables. Additionally, the factors included in our sensitivity analyses are only a small selection of those that could potentially impact plea discounts, and there is an ever-present possibility of unmeasured characteristics confounding our observed relationships.

Lastly, the size of our plea-bargaining sample was sufficiently large for aggregate tests, but there were small sample sizes for certain combinations of demographic characteristics. As such, it remains possible that our observed interaction effects were at least partly influenced by this small number of observations. Once again, sensitivity analyses using the natural log of the plea discount estimates returned substantively similar results, suggesting that these findings were robust. However, limited samples of demographic subgroups may still be nonrepresentative of the population of these subgroups more broadly. Relatedly, our plea-bargaining sample was drawn from a single mid-sized jurisdiction in Virginia. It is not clear whether these results would generalize to other jurisdictions in Virginia or to other states.

Despite these limitations, this study provides an important contribution to the literature on guilty pleas. Our results suggest that plea discounts are influenced by case evidence to a limited degree, but that the influence of non-legal characteristics is both large and robust. In other words, evidence strength does not appear to be the primary driver of the plea discount. It is not clear what mechanisms lead to these outcomes, but theories such as the “shadow of the trial” and focal concerns provide important insight. It could be that legal participants act on both rational considerations, such as the likelihood of conviction at trial based on the strength of the evidence, and biased considerations, such as the perceived dangerousness of a defendant based on their race/ethnicity and sex. Alternatively, it could be that each of these considerations becomes a component of the perceived convictability of a case (see generally Albonetti 1987; Frohmann 1997). More research is certainly needed, but given how fundamental the plea discount is to current criminal justice practice, it is important to start questioning just who this discount is really for.