1 Introduction

One of the most fundamental principles relating to the punishment of criminals is the idea that punishments should be proportional to crimes. In primitive societies this generally meant retaliation according to the principle of lex talionis, which is enshrined in various ancient legal and religious traditions, including the Hammurabi Code, the Old Testament (“eye for eye, tooth for tooth”), and Roman Law.Footnote 1 In addition, the great American jurist and legal scholar Oliver Wendell Holmes argued that many modern-day legal principles had their origins in primitive notions of vengeance, though they have been transmuted into more “civilized” concepts such as the desire to impose punishments that “fit” the crime. (According to Holmes (1881 [1963], p. 39), “fitness” of punishments to crimes is just “vengeance in disguise”.) For example, the prominent English legal scholar H.L.A. Hart notes that modern penal practices embody the idea that “at the sentence stage, the punishment must bear some sort of relationship to the act: it must in some sense ‘fit’ it or be ‘proportionate’ to it” (Hart 1968, p. 160).

In terms of how these ideas enter actual sentencing practices, Dawson (1969, p. 201) notes that “there is judicial resistance to imposition of mandatory maximum sentences that seem unduly long in relation to the circumstances of the case,” while Hart (1968, p. 164) observes that it is through the discretion of judges that “the ideas of fitness and proportionality have had their fullest play.” Mermin (1982, p. 54) similarly observes “that when penalties get too severe, obstacles may be created by prosecutors, juries, and judges.” To this end, prosecutors can use their discretion to reduce charges or selectively choose not to prosecute at all, judges can exercise sentencing discretion (restricted somewhat by sentencing guidelines in the United States),Footnote 2 and juries can exercise nullification.Footnote 3

From a theoretical perspective it is interesting to ask whether the idea of proportionality, either in ancient or modern times, is meant to be a norm in and of itself, or is a policy in service of some other norm, such as crime prevention. The idea of vengeance, or its modern counterpart of retribution or fitness, suggests the former,Footnote 4 but it may also be true that under certain conditions proportionality is justifiable on efficiency grounds. (For example, in a world of certain enforcement, setting punishments equal to harms also achieves optimal deterrence.) This question turns out to be important for thinking about the relationship between the economic theory of crime as originally formulated by Becker (1968) and three early writers on crime and punishment whose theories are often pointed to as precursors of that theory: Montesquieu (1748 [1977]), Beccaria (1764 [1986]), and Bentham (1780 [1970]). All of these writers emphasized the desirability of maintaining proportionality between crimes and punishments. For example, Montesquieu (1748 [1977]) wrote that

It is a great abuse amongst us to subject to the same punishment a person that only robs on the high-way, and another that robs and murders. Obvious it is that for the public security some difference should be made in the punishment (p. 161).

Likewise, Beccaria (1764 [1986]) argued that

the obstacles that restrain men from committing crimes should be stronger according to the degree that such misdeeds are contrary to the public good and according to the motives which lead people to crimes. Thus, there must be a proportion between crimes and punishments (p. 14).

Finally, Bentham (1780 [1970]) observed that “The greater the mischief of the offense, the greater is the expense, which it may be worthwhile to be at, in the way of punishment” (p. 168).

All three authors, however, elaborated on this principle by arguing that it can be justified, at least in part, as being necessary to discourage offenders from committing more serious crimes—that is, to achieve marginal deterrence (Stigler 1970). Thus, in Montesquieu we read that

It is essential that there should be a certain proportion in punishments, because it is essential that a great crime should be avoided rather than a lesser one, and that which is more pernicious to society rather than that which is less (p. 161).

Likewise, Beccaria noted that

If an equal punishment is meted out to two crimes that offend society unequally, then men find no stronger obstacle standing in the way of committing the more serious crime if it holds a greater advantage (p. 16).

Bentham similarly observed that “When two offenses come in competition, the punishment for the greater offense must be sufficient to induce a man to prefer the less” (p. 168).

These arguments suggest a recognition on the part of these authors that proportionality is not merely an end in itself but is also instrumental in reducing crime.

The purpose of this essay is to examine more specifically the question of how well the idea of proportionality of punishment as espoused by these philosophers conforms to the policy prescriptions of the economic model. The answer, it turns out, depends both on how proportionality is defined, and on the particular assumptions one makes regarding the structure of the law enforcement regime. The remainder of the essay explores these issues. Section 2 begins by examining the compatibility of proportionality with the economic model, first by offering different definitions of proportionality as applied to punishment, and then by comparing those definitions with prescriptions that emerge from the economic model under different enforcement scenarios. Section 3 then explicitly takes up the question of whether proportionality is necessary to achieve marginal deterrence. Finally, Sect. 4 offers concluding remarks.

2 Proportionality of punishments

It is acknowledged by Becker and other contributors to the modern economic theory of crime that the above-quoted authors anticipated (or inspired) the economic model in important ways. Although the theories of these writers were mostly informal, they clearly thought carefully about how criminal law could affect the decisions of rational criminals, and what this view implied about optimal punishment policies. This utilitarian, or instrumental, approach to the subject is obviously compatible with the economic approach. Given this concordance, the question we are interested in answering is whether the formal policy prescriptions that have come out of the economic model since Becker correspond in any way to the policies espoused by the above authors. I will focus particularly on proportionality between punishments and crimes—what I will call the “proportionality norm.”Footnote 5 Whether or not there is a correspondence between the proportionality norm and the prescriptions of the economic model turns out to depend on the specific way that we conceptualize “proportionality.” There would seem to be at least four possibilities.

Proportionality concept 1: Punishments should equal harms

This interpretation supposes that there should be a direct correspondence between individual punishments and harms. As Adelstein (1981, p. 7) notes in referring to the Anglo-American criminal process, “Over the years, ours has been a legal order of retributive punishment tempered by the norm of proportionality, one which seeks to exact an eye, but only that, for an eye.” In this respect, Hart (1968, p. 161) notes that “In its crudest form [proportionality] is the notion that what the criminal has done should be done to him.” However, he quickly qualifies this literal interpretation as often leading to impractical, absurd, or barbaric punishments that modern society has eschewed. Instead, the idea of “equating” punishments to crimes is reflected more generally in the aversion that most people have to what they perceive as “excessive” punishments in relation to the perceived harmfulness of a particular crime. Such thinking, for example, is undoubtedly the motivation behind recent proposals to roll back “three-strikes” laws, which had subjected serial offenders to very harsh punishments for relatively minor offenses, as well as efforts in many states to decriminalize use of certain drugs not deemed to be particularly harmful.

Formally, this first conception of proportionality implies that \( s_i = h_i \) for all i, where \( s_i \) is the sanction for offense i and \( h_i \) is the harm that it imposes on society. The problem of measuring sanctions and harms in commensurable units is overcome in the economic theory by denominating everything in dollar-equivalent terms. Thus, the dollar cost to the offender of sanction \( s_i \) (whether a dollar fine or the opportunity cost of prison) is equated to the dollar value of the harm that his act imposes on society.

Proportionality concept 2: Punishments should increase with harms

This interpretation of proportionality looks not at punishments for individual crimes but at the relationship between punishment and crimes across the range of offenses and envisions a monotonic relationship. Again, as Hart (1968, p. 162) puts it,

what is required is not some ideally appropriate relationship between a single crime and its punishment, but that on a scale or tariff of punishments and offences, punishments for different crimes should be “proportionate” to the relative wickedness or seriousness of the crime. For though we cannot say how wicked any given crime is, perhaps we can say that one is more wicked than another and we should express this ordinal relation in a corresponding scale of penalties.

Thus, we define a punishment “schedule” \( s = F(h) \), where \( F' > 0 \). When the functional relationship is linear (if that idea has any meaning in the current context), this conception of proportionality corresponds to the strict mathematical definition; that is, \( s = \theta h \) for some constant \( \theta > 0 \). Notice that concept 1 is therefore the special case of concept 2 in which \( \theta = 1 \).

Proportionality concepts 3 and 4: Expected punishments should equal (increase with) harms

These two concepts mirror concepts 1 and 2, except that they explicitly account for the uncertainty of punishment, recognizing that what matters for deterrence is not the actual punishment but the expected punishment (or the “effective punishment,” as Friedman and Sjostrom (1993) refer to it). Proportionality now requires that the expected punishment, \( p_i s_i \), either be set equal to the harm (concept 3) or be increasing in the harm (concept 4), where \( p_i < 1 \) is the probability of detection for offense i. Note that under these concepts there is no necessary relationship between the actual punishment, s, and the harm; it depends on whether (or how) p itself varies with h.
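For reference, the four concepts can be stated compactly in the notation used above, with \( s_i \) the sanction, \( h_i \) the harm, and \( p_i \) the probability of detection for offense i. Expressing "increasing in the harm" as a condition on every pair of offenses i and j is a restatement for compactness rather than a formulation taken from the cited authors:

$$
\begin{aligned}
&\text{Concept 1:} && s_i = h_i \ \text{ for all } i \\
&\text{Concept 2:} && h_i > h_j \implies s_i > s_j \ \text{ for all } i, j \\
&\text{Concept 3:} && p_i s_i = h_i \ \text{ for all } i \\
&\text{Concept 4:} && h_i > h_j \implies p_i s_i > p_j s_j \ \text{ for all } i, j
\end{aligned}
$$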

The next question is how well the prescriptions of the economic theory of crime reflect these different concepts of proportionality. In answering this question, I will follow the approach in Polinsky and Shavell (2007), which represents a state-of-the-art refinement of Becker’s original model. In this “BPS model,” as I will call it, an enforcement authority chooses the magnitude and type of punishment (fine and/or prison) and the probability of apprehension to maximize social welfare, defined as the offenders’ gains less the harm their acts impose on society, minus the cost of enforcement.Footnote 6

How closely the prescriptions of the BPS model correspond to the above concepts of proportionality turns out to depend on the specific enforcement scenario being examined.Footnote 7 I consider two scenarios: in the first, the probability of apprehension is treated as fixed, and in the second, it is a choice variable. I also focus on the case of punishment by fines alone, though I will comment below on the implications of including prison as a possible sanction.

Scenario 1: p is fixed

In this scenario, the enforcement authority chooses the fine, f, to maximize the following welfare function:

$$ \int_{pf}^{\bar{g}} (g - h)\, z(g)\, dg - c(p) \tag{1} $$

taking the probability of apprehension as given. In this expression, z(g) is the density of offender gains, where \( g \in [0, \bar{g}] \), and c(p) is the cost of enforcement, which is fixed in the current scenario. The lower limit of integration reflects the fact that an individual commits the offense only if his gain exceeds the expected fine, \( g > pf \). It turns out that the optimal fine in this case is given byFootnote 8

$$ f^{*} = h/p \tag{2} $$

This formula achieves optimal deterrence because it forces would-be criminals to internalize the external harm caused by their acts, appropriately scaled to reflect imperfect enforcement, where the scaling factor is the inverse of the probability of apprehension. In this sense, the expected criminal fine, pf, serves exactly the same function as a Pigovian tax or strict tort liability.
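As a numerical check on (2), the sketch below maximizes expression (1) over f by grid search with p held fixed. The uniform distribution of gains, the constant enforcement cost, and the parameter values are hypothetical choices made only so that the integral has a closed form; they are not part of the BPS model itself.

```python
# Sketch: maximize welfare (1) over the fine f, holding p fixed (Scenario 1).
# Hypothetical assumptions: gains uniform on [0, g_bar], so z(g) = 1/g_bar,
# and c(p) is a constant that does not affect the choice of f.

def welfare(f: float, p: float, h: float, g_bar: float, cost: float = 0.0) -> float:
    """Expression (1): integral of (g - h) z(g) over offenders with g > p*f, minus c(p)."""
    t = min(p * f, g_bar)  # deterrence threshold: only gains above p*f lead to offending
    # Closed form of the integral of (g - h) / g_bar from t to g_bar
    integral = ((g_bar**2 - t**2) / 2 - h * (g_bar - t)) / g_bar
    return integral - cost

def best_fine(p: float, h: float, g_bar: float, f_max: float, steps: int = 100_000) -> float:
    """Grid search over fines in [0, f_max] for the welfare-maximizing fine."""
    grid = (f_max * i / steps for i in range(steps + 1))
    return max(grid, key=lambda f: welfare(f, p, h, g_bar))

p, h, g_bar = 0.25, 40.0, 100.0
f_star = best_fine(p, h, g_bar, f_max=400.0)
print(f_star, h / p)  # 160.0 160.0
```

Under these assumptions welfare depends on f only through the threshold pf, and it peaks where pf = h. That is why the search recovers h/p.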

With regard to proportionality, note that this formula violates concept 1. Whenever p < 1, the fine h/p exceeds the harm, so punishments will generally be seen as “excessive.” Only in the special case where p = 1 does the fine equal the harm. From a deterrence perspective, the need for scaling up of the punishment is clear, but concept 1 is based on an ex post perspective that asks whether the punishment fits the crime after the fact, not whether it optimally deters other criminals. The problem with deterrence in this context is that it punishes offenders not only for their own crimes, but also for the crimes of others (those who were not caught), and this strikes many as unfair.Footnote 9 For example, Adelstein (1981, p. 19) observes that fines “made clearly disproportionate by probability scaling unfairly make specific individuals instruments for the achievement of larger social ends[.]”

The optimal fine in (2) is, however, consistent with concepts 2, 3 and 4. Specifically, the expected fine equals the harm (concept 3), and, given a fixed p, both the actual fine and the expected fine are increasing in the harm (concepts 2 and 4). The principal incompatibility between the economic model as currently specified and the proportionality norm therefore seems to center on the need for probability scaling when p < 1.
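The classification in the preceding paragraph can be checked mechanically. The sketch below encodes concepts 1 to 4 as tests on a schedule of harms, detection probabilities, and sanctions, and applies them to the fine f* = h/p with a common fixed p. The helper names and the sample harms are hypothetical.

```python
# Sketch: test a sanction schedule against proportionality concepts 1-4.
# The helper names and the sample harms are hypothetical; the conditions follow the text.

def increasing(xs, ys):
    """True if ys is strictly higher whenever xs is strictly higher."""
    pairs = list(zip(xs, ys))
    return all(y1 < y2 for x1, y1 in pairs for x2, y2 in pairs if x1 < x2)

def concepts_satisfied(harms, probs, sanctions, tol=1e-9):
    expected = [p * s for p, s in zip(probs, sanctions)]
    return {
        1: all(abs(s - h) <= tol for s, h in zip(sanctions, harms)),  # s_i = h_i
        2: increasing(harms, sanctions),                              # s rises with h
        3: all(abs(e - h) <= tol for e, h in zip(expected, harms)),   # p_i s_i = h_i
        4: increasing(harms, expected),                               # p_i s_i rises with h
    }

harms = [10.0, 50.0, 200.0]
p = 0.25                         # fixed probability of apprehension (Scenario 1)
fines = [h / p for h in harms]   # optimal fine (2): f* = h/p
print(concepts_satisfied(harms, [p] * len(harms), fines))
# {1: False, 2: True, 3: True, 4: True}
```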

Interestingly, Bentham recognized the need for probability scaling when enforcement is imperfect: “To enable the value of punishment to outweigh that of the profit of the offense, it must be increased, in point of magnitude, in proportion as it falls, in point of certainty” (Bentham 1780 [1970], p. 170). Notice, however, that this statement links punishment not to the harm caused by the offense, but to the profit of the offender. Elsewhere, Bentham also says that “the quantum of punishment must rise with the profit of the offence: ceteris paribus, it must therefore rise with the strength of the temptation” (p. 167). The above quote from Beccaria regarding proportionality likewise noted that punishments should increase both with harm to the public from an offender’s misdeeds, “and according to the motives which lead people to crimes” (1764 [1986], p. 14). These arguments suggest a notion of proportionality that ties punishment to criminal gains rather than harms. In keeping with that idea, suppose that we replace the expression in (2) with

$$ f^{*} = g/p \tag{3} $$

where g is the dollar gain (profit) from the crime. This rule bears no necessary relation to any of the above concepts of proportionality, because it ties the fine to the offender’s gain rather than to the harm. It is also inconsistent with the economic model of crime, which aims to deter only those acts whose gains fall short of their harms, except in the case of crimes that should be deterred entirely. Formula (3) achieves complete deterrence because it forces criminals to disgorge their profits, appropriately scaled, thereby eliminating the prospect of any gain from the act (Hylton 2005).Footnote 10
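The contrast between rules (2) and (3) can be illustrated with a handful of would-be offenders. This sketch assumes, hypothetically, risk-neutral individuals who offend only if g > pf and who abstain when exactly indifferent. The values of h, p, and the gains are invented for illustration.

```python
# Sketch contrasting the harm-based rule (2), f = h/p, with the gain-based rule (3), f = g/p.
# Hypothetical assumptions: risk-neutral individuals offend only if g > p*f
# (an indifferent individual abstains); h, p, and the gains are illustrative numbers.

def offends(g: float, p: float, f: float) -> bool:
    return g > p * f

h, p = 50.0, 0.5
gains = [20.0, 45.0, 60.0, 90.0]

harm_based = [g for g in gains if offends(g, p, h / p)]  # rule (2): same fine for everyone
gain_based = [g for g in gains if offends(g, p, g / p)]  # rule (3): fine tailored to each gain

print(harm_based)  # [60.0, 90.0]: only acts whose gain exceeds the harm remain
print(gain_based)  # []: every act is deterred, whether or not its gain exceeds the harm
```

Under rule (2), the acts that survive are exactly those with g > h. Under rule (3), deterrence is complete regardless of how the gain compares with the harm.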

Both Beccaria and Bentham proceeded from the premise that any offense deserving punishment should be completely deterred, so far as that is possible: “Not merely is it in the common interest that crimes not be committed, but that they be more infrequent in proportion to the harm they cause society” (Beccaria 1764 [1986], p. 14); and “The value of punishment must not be less in any case than what is sufficient to outweigh that of the profit of the offense” (Bentham 1780 [1970], p. 166). This idea is also consistent with popular perceptions of the function of criminal law (specifically, that it is aimed at preventing crime), but it is contrary to one of the key insights gained from the economic approach to crime, namely the concept of an “efficient crime.” An efficient crime is an act whose gain to the offender exceeds the harm it causes, and which therefore should not be deterred. None of the philosophers seems to have contemplated this idea; indeed, the notion seemed troublesome even to Stigler (1970, p. 527), who, in his critique of Becker’s model, observed that “society has branded the utility derived from such activities as illicit.” In other words, crimes are those actions that society deems undesirable.

Yet, in modern times there are many actions that technically come under the jurisdiction of criminal law but that society does not want to completely deter, only to “regulate,” like driving or the disposal of toxic waste. The problem this creates for formulating an economic theory of crime, however, is that if we begin to pick and choose what acts are definitely undesirable (e.g., murder), and what acts are only undesirable if engaged in “excessively” (e.g., speeding), then we run the risk of assuming our conclusions, and the theory loses all predictive power (Friedman 2000, p. 230). For that reason, the economic model of crime as it is commonly employed does not generally make presuppositions about the overall desirability of those acts that are subject to criminal prosecution,Footnote 11 but instead focuses on how to best internalize their harms.

Scenario 2: p is endogenous

Things change dramatically in the BPS model when the probability of apprehension is treated as a choice variable along with the fine. In that case, the enforcer chooses both f and p to maximize (1), where c′ > 0 and c″ ≥ 0. The principal prescription now is that the optimal fine should be “maximal.” That is, it should be set at the highest level that the offender could feasibly pay, which usually means equating it to the offender’s wealth, w. The probability of apprehension is then chosen to maximize (1) subject to f* = w. The intuition for this policy is straightforward. Because offenders are assumed to respond only to the expected sanction, pf, deterrence is unaffected by how a given level of pf is split between p and f.Footnote 12 Further, because it is costless to increase f but costly to raise p (for example, more police officers need to be hired), the fine should be raised as high as possible, and the probability of apprehension correspondingly lowered, so as to reduce the cost of achieving the desired level of deterrence. It follows that fines should be individual-specific, given that offenders vary in their wealth levels.Footnote 13
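The maximal-fine result can be reproduced with a joint grid search over p and f, where the fine is capped at wealth w. The uniform distribution of gains, the quadratic cost function c(p) = kp²/2 (which satisfies c′ > 0 and c″ ≥ 0), and all parameter values are hypothetical assumptions made for this sketch.

```python
# Sketch: choose p and f jointly to maximize welfare (1) when the fine cannot exceed wealth w.
# Hypothetical assumptions: gains uniform on [0, g_bar] and enforcement cost c(p) = k * p**2 / 2.

def welfare(p: float, f: float, h: float, g_bar: float, k: float) -> float:
    t = min(p * f, g_bar)  # only offenders with g > p*f commit the act
    # Closed form of the integral of (g - h) / g_bar from t to g_bar
    integral = ((g_bar**2 - t**2) / 2 - h * (g_bar - t)) / g_bar
    return integral - k * p**2 / 2

h, g_bar, k, w = 40.0, 100.0, 50.0, 200.0
n = 400  # grid resolution for both p in [0, 1] and f in [0, w]
grid = [(i / n, w * j / n) for i in range(n + 1) for j in range(n + 1)]
p_star, f_star = max(grid, key=lambda pf: welfare(pf[0], pf[1], h, g_bar, k))
print(p_star, f_star)  # 0.1775 200.0
```

The search selects the largest feasible fine, f = w, and economizes on p instead. The same run also gives an expected fine of p*f* = 35.5, which is below h = 40. This anticipates the under-deterrence result discussed next.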

It should be clear that this policy violates both concepts 1 and 2, because fines are neither equal nor proportional to harms. Indeed, there is no necessary relationship whatsoever between fines and harms. The optimal policy also violates concept 3, because at the optimum the expected fine should be less than the harm, or p*f* < h.Footnote 14 That is, there is some under-deterrence, even for crimes that are definitely undesirable. The intuitive explanation for this result is that when enforcement is costly, resources should be devoted to apprehension only up to the point where the marginal benefit from crime reduction equals the marginal cost of enforcement. It is true, however, that the optimal probability of apprehension should be increasing in the harm.Footnote 15 Given a maximal fine, this implies that the expected fine is an increasing function of the harm, though the exact relationship will not generally be linear. Proportionality of punishments is therefore only satisfied in the sense of concept 4, which seems fairly far removed from any common sense notion of proportionality.
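A closed-form special case illustrates the last two points. Assume, purely for illustration, the following: offender gains are uniform on \( [0, \bar{g}] \), so that \( z(g) = 1/\bar{g} \); enforcement costs are \( c(p) = kp^2/2 \) with k > 0; the fine is maximal at \( f^{*} = w \); and the optimum is interior (\( p^{*} w < \bar{g} \) and \( p^{*} < 1 \)). Writing W(p) for expression (1) under these assumptions, the first-order condition gives:

$$
\begin{aligned}
W(p) &= \frac{1}{\bar{g}} \int_{pw}^{\bar{g}} (g - h)\, dg - \frac{k p^{2}}{2} \\
W'(p) &= \frac{w}{\bar{g}}\,(h - pw) - kp = 0 \\
p^{*} &= \frac{w h}{w^{2} + k \bar{g}}, \qquad p^{*} w = \frac{w^{2}}{w^{2} + k \bar{g}}\, h < h
\end{aligned}
$$

Since \( W''(p) = -w^{2}/\bar{g} - k < 0 \), this is a maximum. Both \( p^{*} \) and the expected fine \( p^{*} w \) rise with h, while the expected fine stays strictly below h. In this special case the expected fine happens to be proportional to h. With a linear cost \( c(p) = kp \), the same steps give \( p^{*} w = h - k\bar{g}/w \), which rises with h but is not proportional to it.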

Finally, consider the situation where sanctions are costly to impose, as in the case of prison. The principal conclusion of the BPS model is that prison should never be used until fines have been employed to the maximum extent possible, and only then if further deterrence is cost-justified. The rationale for this policy is the same as that which explains the optimality of a maximal fine when p is endogenous; namely, that the costless policy variable should be used to its maximum extent before using a costly one. Note that this result implies that the rich and poor should be treated very differently by the criminal justice system. In particular, it effectively allows the rich to “buy their way out of prison” (Lott 1987), which strikes many as grossly unfair, if not unconstitutional under the Equal Protection Clause of the Fourteenth Amendment. Clearly, the policy has no connection to proportionality with respect to the optimal prison term, given that whether one is imprisoned at all, and for how long, depends on one’s wealth and not only on the crime one committed.
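The "fines first, then prison" ordering can be sketched for two offenders who commit the same crime but differ in wealth. This is a deliberately simplified illustration, not the BPS optimum. It holds p fixed, targets a nominal sanction of h/p, and assumes that prison is cost-justified for any shortfall, which the model does not guarantee. The parameter d, a dollar-equivalent disutility per prison year, is a hypothetical device.

```python
# Sketch of the "fines first, then prison" ordering with p held fixed.
# Hypothetical simplifications: the nominal sanction target is h/p, prison is assumed
# cost-justified for any shortfall, and one prison year costs the offender d dollars' worth.

def sentence(h: float, p: float, wealth: float, d: float):
    """Return (fine, prison_years) for one offender."""
    target = h / p                 # nominal sanction such that p * sanction = h
    fine = min(target, wealth)     # exhaust the costless sanction first
    shortfall = target - fine      # part of the target the fine cannot cover
    prison_years = shortfall / d   # zero unless wealth falls short of the target
    return fine, prison_years

h, p, d = 20_000.0, 0.5, 10_000.0
print(sentence(h, p, wealth=50_000.0, d=d))  # (40000.0, 0.0)
print(sentence(h, p, wealth=10_000.0, d=d))  # (10000.0, 3.0)
```

The two offenders commit the same crime, but only the poorer one is imprisoned. The length of that term is driven by his lack of wealth.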

3 Proportionality and marginal deterrence

As discussed above, all three of the philosophers commented on the necessity of proportional punishments for the purpose of achieving marginal deterrence—that is, of preventing offenders who have the opportunity to commit multiple harmful acts from choosing the more harmful ones. Perhaps consideration of this question will reveal a kind of proportionality lurking within the economic model, despite its focus on deterrence.

Since Stigler (1970) first used the phrase “marginal deterrence” in his commentary on Becker’s original article, several authors have explicitly examined the issue using the BPS framework.Footnote 16 Briefly, the idea of these models is to ask what the optimal enforcement policy should look like when offenders can commit multiple harmful acts. In this context, absolute penalties continue to determine an offender’s decision of whether or not to commit individual acts at all, but relative penalties now also affect his choice among acts. As in the single-act model, however, the optimal policy turns out to depend rather subtly on the assumptions one makes about the enforcement structure and the nature of offender gains. The following is a brief review of the optimal policies under different enforcement scenarios.
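The distinction between absolute and relative penalties can be illustrated with a small sketch of the offender's choice problem. The sketch is hypothetical and does not follow any specific model cited here. It assumes a risk-neutral offender who commits at most one act, picks the act with the highest net payoff g − pf, and abstains if no payoff is positive. The acts echo Montesquieu's highway-robbery example, but the numbers are invented.

```python
from typing import Dict, Optional, Tuple

# Sketch of an offender's choice among acts in a multi-act setting.
# Hypothetical assumptions: risk neutrality, at most one act, ties resolved
# in favor of abstaining. Each act maps to (gain g, probability p, fine f).
Act = Tuple[float, float, float]

def choose_act(acts: Dict[str, Act]) -> Optional[str]:
    """Return the act with the highest positive net payoff g - p*f, or None."""
    best, best_payoff = None, 0.0
    for name, (g, p, f) in acts.items():
        payoff = g - p * f
        if payoff > best_payoff:
            best, best_payoff = name, payoff
    return best

# Same fine for both acts: the more harmful act is chosen because its gain is larger.
flat = {"robbery": (100.0, 0.5, 160.0), "robbery_and_murder": (120.0, 0.5, 160.0)}
# A higher fine for the more harmful act restores marginal deterrence.
graded = {"robbery": (100.0, 0.5, 160.0), "robbery_and_murder": (120.0, 0.5, 400.0)}
print(choose_act(flat), choose_act(graded))  # robbery_and_murder robbery
```

Raising both fines high enough would deter the offender from acting at all (absolute penalties). Changing only their relative size shifts which act he chooses (relative penalties).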

Shavell (1992) formally examined the issue of marginal deterrence in the context of the BPS model for the case where an offender can choose between two criminal acts that differ in their harmfulness to society, and where punishment is by fines only. He also distinguished between specific enforcement, in which the probability of apprehension can be crime-specific, and general enforcement, in which the probability of apprehension is constrained to be the same for all acts (though still chosen optimally). In the case of specific enforcement, he derived the following results. First, the optimal sanctions for both acts are maximal (that is, equal to the offender’s wealth).Footnote 17 Second, the optimal probabilities for the two acts generally differ, reflecting consideration of marginal deterrence, but it is not necessarily the case that the optimal probability is higher for the more harmful act; it depends on the distribution of offender gains. Finally, the expected sanction is less than the harm for all acts (i.e., there is some under-deterrence), reflecting the same intuition as described above for the single-act model.

These results show that, although the idea of marginal deterrence is in fact reflected in the optimal enforcement structure emerging from the BPS model, the need for proportionality of actual punishments as the way to accomplish this goal is not borne out in the model because of the strong tendency toward high-fine, low-probability schemes. Marginal deterrence is instead achieved by adjusting the optimal probabilities of apprehension. Even that strategy, however, does not guarantee a higher probability of apprehension for the more harmful crime—it depends on the distribution of offender benefits across those crimes. In general, therefore, none of the proportionality concepts seem necessary to achieve marginal deterrence in the case of specific enforcement and punishment by fines.

Now consider general enforcement, where the same probability of apprehension is applied to all acts. In that case, Shavell showed that the expected sanction should be set equal to the harm for all acts, that is, \( p^{*} f_i = h_i \) for all i, provided that the wealth constraint is not binding (i.e., that \( h_i/p^{*} \le w \) for all offenders, given p*). Further, since p* is the same for all crimes, it follows that the actual fine, \( f_i = h_i/p^{*} \), is proportional to harm. Thus, proportionality concepts 2, 3 and 4 are all satisfied in this case. The reason is that the probability of enforcement cannot be tailored to crimes, and so differences in the sanctions have to do all of the work. This result therefore seems to be in accord with the prescriptions of the philosophers. However, when the wealth of offenders prevents implementation of this efficient punishment scheme, the fine for the more harmful act is maximal (equal to the offender’s wealth), and the expected sanction for the less harmful act is set below the harm in order to maintain marginal deterrence.
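The unconstrained general-enforcement rule is simple enough to state as code. In this sketch, p is taken as given (the optimal choice of p* itself is not modeled), the harms and wealth level are illustrative, and the function name is hypothetical. The sketch only detects the wealth-constrained case and does not attempt to reproduce Shavell's solution for it.

```python
# Sketch of the general-enforcement result when the wealth constraint does not bind.
# Hypothetical assumptions: p is taken as given (the choice of p* itself is not modeled),
# and the harms and wealth level below are illustrative numbers.

def general_enforcement_fines(harms, p, w):
    """Fines with p * f_i = h_i for every act, i.e. f_i = h_i / p."""
    fines = [h / p for h in harms]
    if any(f > w for f in fines):
        # The simple rule is infeasible here; this sketch does not solve the constrained case.
        raise ValueError("wealth constraint binds: some h_i / p exceeds w")
    return fines

harms = [5.0, 10.0, 30.0]
fines = general_enforcement_fines(harms, p=0.25, w=200.0)
print(fines)                                   # [20.0, 40.0, 120.0]
print([f / h for f, h in zip(fines, harms)])   # [4.0, 4.0, 4.0], i.e. 1/p
```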

Wilde (1992) examined the issue of marginal deterrence for the case of costly sanctions. In general, he showed that when enforcement is specific, both sanctions should be maximal regardless of the harmfulness of acts, whereas when enforcement is general, at least one sanction should be maximal but the other may not be.Footnote 18 However, the less-than-maximal sanction again need not be that intended for the less harmful act. These policies therefore do not seem to bear any clear relation to the various proportionality concepts.

4 Conclusion

It has become customary in the economics-of-crime literature to offer a “tip of the hat” to the eighteenth century philosophers Montesquieu, Beccaria, and Bentham, whose early writings on crime and punishment represent important precursors to the modern economic theory. This reflects a completely justifiable acknowledgement of the inspiration that these writers provided, but it also raises the question of how closely their prescriptions conform to those that have emerged from the economic model since it was formalized, two centuries later, by Becker. This essay has undertaken such an evaluation with specific reference to the goal of maintaining proportionality between punishments and crimes, a value that was espoused by all of the writers and continues to be important in understanding modern criminal justice policy. The question I specifically asked was how well different concepts of proportionality are reflected in the optimal punishment schemes that have emerged from the economic model.

The answer turned out to be “not very well,” primarily as a result of the endogeneity of the probability of apprehension. When punishment is certain, deterrence and proportionality are perfectly compatible, but when punishment becomes uncertain, that compatibility breaks down, resulting in a complex relationship between punishments and harms that depends on the particular assumptions one makes about the enforcement regime. And although the early philosophers (especially Beccaria and Bentham) recognized the importance of uncertain detection, they (understandably) failed to realize the full implications of allowing an optimal choice of that variable, along with the severity of punishment, for optimal enforcement policies. As a result, their proposals do not generally match the optimal policies arising from the pure deterrence model.

Modern-day criminal justice policies also depart, in some cases rather dramatically, from the prescriptions of the economic model, and probably for the same reasons. The idea of proportionality (the ex post fitness of punishments to crimes) apparently has a strong popular appeal that trumps concerns about optimal deterrence when the two values are in conflict. In practice, proportionality is most clearly manifested, at least in the United States, at the charging stage in the form of prosecutorial discretion, and at the sentencing stage in the form of judicial discretion. The justification commonly relies on constitutional principles, particularly the Eighth Amendment’s prohibition on “excessive fines” and “cruel and unusual punishments.”Footnote 19 Legislative efforts to curtail such discretion in the name of crime control, such as the enactment of sentencing guidelines in 1987 and three-strikes laws in the 1990s, have met with considerable resistance and ultimately have been weakened or repealed.

It would appear that when it comes to criminal punishment, people are more concerned about doing justice in the sense of appropriately punishing offenders after the fact than about maintaining optimal deterrence of future crimes. In other words, proportionality is a critical stand-alone value in determining criminal justice policy. This likely reflects the fact that punishment is perceived as imposing actual suffering on real people, whereas deterrence is a purely hypothetical concept whose benefits (and costs) are harder to grasp.