1 Introduction

The theory of incentives shows that providing incentives to agents with different characteristics calls for highly complex contracts. However, empirical and casual observations (see, for instance, Baker et al. 1994; Bhattacharyya and Lafontaine 1995) show that, in many circumstances, contracts are simple in the sense that they are not tailored to agents’ characteristics. For instance, in markets that use franchising (for details see Lafontaine and Slade 1997, 2001), instead of offering contracts tailored to the characteristics of each franchisee, most of the franchisors employ a limited set of contracts, often just two: a business-format franchising and an integrated contract.Footnote 1 Within the first category, different franchisors choose different contract terms, as well as different royalty rates and franchise fees, but any given franchisor offers the same terms to all potential franchisees at a given point in time. Paarsch and Shearer (1999, 2000) study the British Columbia tree-planting industry and find that piece-rate contracts are identical within each tract and, through a structural model that controls for the endogeneity of the piece rate, that the elasticity of worker effort with respect to changes in the piece rate across tracts is positive and large. Using data on young drivers, Chiappori and Salanié (2002) cannot reject the hypothesis that coverage and accident frequency are statistically independent. Finkelstein and Poterba (2004) test for adverse selection in annuity markets in the UK, and do not find adverse selection in coverage, but they do find adverse selection in other dimensions. Finkelstein and McGarry (2006) find little or no evidence of positive correlation between risk types and policy choices. In fact, Salanié (2005, p 474) concludes: “The recent literature provides very strong evidence that contractual forms have large effects on behavior. 
As the notion that “incentive matters” is one of the central tenets of economists of every persuasion, this should be comforting to the community. On the other hand, it raises an old puzzle: if contractual form matters so much, why do we observe such a prevalence of fairly simple contracts?” Thus, the simplicity of real-life contracts in many situations remains a challenge to incentive theory, despite the important progress made in the last decade or so based on the idea that contracts/mechanisms must be robust (see, for instance, Lopomo 2001; Bergemann and Morris 2005; Chassang 2013; Carroll 2015).

Under a set of fairly natural assumptions, we show that we can rationalize simple contracts such that payment schemes are not tailored to agents’ characteristics. In order to study our notion of simplicity, this paper proposes a standard principal-agent model where the agent must choose an action that determines, together with his type (e.g. ability), the probability distribution of a contractible output that has more than two realizations. The principal and the agent are risk neutral; the agent privately learns his type before signing the contract and the principal observes neither the agent’s action nor his type. The agent is subject to limited liability, which limits the principal’s ability to punish the agent for bad performance. Limited liability is prevalent in financial and labor market contracts; an employer is not free to punish poor performance with negative wages, and an entrepreneur cannot be asked to re-pay more than the returns on his venture. Following Innes (1990), we restrict the contract space to monotone contracts; that is, the payment to the agent is non-decreasing with realized output, and the return to the principal is also non-decreasing with it.Footnote 2

A full characterization of the optimal mechanism is difficult, yet we provide sufficient conditions under which it consists of a single contract, so that the optimal mechanism exhibits what we call the one-size-fits-all property. This result requires that the action and the agent’s type improve the output distribution in the sense of first-order stochastic dominance (FOSD), and that the agent’s payoff function satisfies strict complementarity between the action and the agent’s type (SC), as well as the weak-separability property (WSEP). WSEP was first introduced in the literature by Faynzilberg and Kumar (1997, 2000). Furthermore, if the output distribution satisfies the average (over types) monotone likelihood ratio property (EMLRP), the optimal contract is a call-option contract; that is, the contract pays the limited liability for low output realizations, and it pays the output minus a positive face value for high output realizations. This is the same contract as the one derived by Innes (1990). Thus, under EMLRP, the contract is simple not only in the sense defined above, but also in how payments vary with realized output.

WSEP imposes that the marginal rate of substitution between the action and the agent’s type is the same for each possible output realization. In the case of pure moral hazard, Grossman and Hart (1983) argue that a similar condition, known as the linear distribution function, is sufficient for the validity of the first-order approach. Faynzilberg and Kumar (1997, 2000) extend this to the case of moral hazard and adverse selection, and partially characterize the optimal contract with moral hazard, adverse selection and risk averse agents. WSEP allows agents to order different contracts according to the power of incentives in such a way that all types agree on which contracts provide more powerful incentives. Thus, WSEP allows an ordering on the contract space.

To better understand the intuition, assume for the time being that there are only two outcomes, and that the contract offers a fixed payment and a bonus upon success. In order to provide incentives for effort, the principal must provide high-powered incentives. To induce information revelation, higher-power incentives are offered to higher types, since, ceteris paribus, they have an incentive to understate their ability level in order to convince the principal that success is unlikely. Because effort is unobservable, each type benefits from high-powered incentives, hence the fixed payment must decrease with ability. But, under limited liability, the fixed payment has a lower bound on how much it could be decreased. Because the principal wants to minimize the informational and limited-liability rent without sacrificing efficiency too much, she has incentives to lower the fixed payment as much as possible, but then the lower bound forces the same fixed payment to everyone. This makes the contract with highest incentive power more attractive to everyone. Thus, the principal has to choose between a pooling contract that leaves no rent to the least able type, and a menu that pays a fixed payment decreasing in ability and a bonus increasing in it, which leaves positive rents to everyone and sacrifices efficiency. Because effort increases with ability, and a dollar of fixed payment costs the principal more than a dollar in a higher bonus, ultimately the principal prefers pooling types, since more able types choose higher efforts. The presence of multiple outcomes expands the scope for profitable screening; yet, under WSEP, each type evaluates each contract as consisting of a fixed payment plus a bonus that aggregates all possible outcomes and is independent of the agent’s action.

When there are multiple outcomes, the intuition for one-size-fits-all is based on the fact that, for any incentive-compatible menu of contracts, there exists a degenerate contract that is preferred by the principal and chosen by each possible agent’s type. To see this, consider an incentive-compatible menu of contracts and a degenerate menu that has one contract, namely the one with the most powerful incentives among the incentive-compatible contracts, denoted by \(w^{*}\). Because of WSEP, i.e. the cumulative distribution function being such that the marginal rate of substitution between the action and the agent’s type is the same for each possible output realization, everyone agrees on which contract provides the most powerful incentives. Incentive compatibility implies that, conditional on the incentive-compatible action, all agents are weakly better-off under the incentive-compatible contract than under \(w^{*}\). Because the action chosen is assumed to be constant and agents are risk neutral, this means that, for each ability type, expected compensation is higher under the incentive-compatible contract. However, when an agent chooses contract \(w^{*}\), he chooses an action that is at least as large as the incentive-compatible action, since actions and types are complements (condition SC), and \(w^{*}\) provides more powerful incentives. Because higher types and higher actions improve the output distribution in the sense of FOSD, and the contract is such that the return to the principal is non-decreasing with realized output, the fact that agents choose a (weakly) higher action under \(w^{*}\) implies that profits are higher under the degenerate mechanism than under the incentive-compatible mechanism. Furthermore, we show that the optimal action is identical to the one that maximizes the virtual surplus, which entails a downward distortion for each possible type but the highest. 
The limited-liability constraint impairs the principal’s ability to reduce the informational rent any further. In its absence, the principal could lower the payments in bad states so as to make contracts with different power of incentives more appealing to agents with higher types. This would leave room for separation to be optimal for at least a subset of types. Hence, under limited liability, SC and WSEP, offering a degenerate mechanism achieves the dual role of reducing informational rents and increasing efficiency; that is, each type chooses the action that maximizes the virtual surplus.

When we further impose EMLRP, the optimal mechanism offers a single call-option contract to all types; that is, a contract that makes the principal the full residual claimant when the output is low, by paying the agent the lowest possible value allowed by the limited-liability constraint, and that makes the agent the full residual claimant when the output is high, by paying him the realized output minus a positive face value. The face value is chosen so that the participation constraint of the lowest type binds and therefore all other types get a positive informational rent. To reduce the informational rent, the principal would like to pay the limited-liability level in every state other than the one with the highest average likelihood ratio and, in order to stop agents from overstating their ability type, the principal is forced to offer the same wage profile to each ability type. However, the monotonicity constraint imposing that the principal’s payoff does not decrease with realized output makes this solution infeasible. Hence, the principal is forced to offer the steepest wage profile satisfying the monotonicity constraints. Because of EMLRP and SC, this contract induces higher types to choose higher actions and, therefore, minimizes informational rents and maximizes efficiency.

These results highlight the fact that, under WSEP, the moral hazard problem is more important than the adverse selection problem, in the sense that the optimal mechanism is such that contract terms are not customized to the agent’s type, yet they are designed to provide incentives for effort. However, the action chosen is different from that under pure moral hazard, since incentive compatibility with respect to ability types imposes constraints on the contract through both the first- and second-order conditions of the agent’s revelation of information problem.

Related Literature. A long-standing literature studies what optimal incentive contracts should look like in different situations. The literature on optimal contracting under pure moral hazard is vast, and that under pure adverse selection is also extensive. For the sake of brevity, we will focus on the most closely related strands: the one dealing with contract simplicity and robustness, the one dealing with moral hazard under limited liability with and without monotonicity constraints, and the one dealing with moral hazard and adverse selection.

The literature regarding contract simplicity and robustness has made considerable progress in the last decade or so, and it has mainly focused on the optimality of linear contracts and ex-post implementation. Carroll (2015) shows that the optimal contract under risk neutrality and limited liability is linear when the principal’s objective is to maximize the worst-case expected payoff in a setting where the principal knows only a subset of the actions available to the agent. Under uncertainty about the agent’s technology, the principal knows a lower bound on the agent’s expected payoff, based on the fact that she knows a subset of the actions available to the agent. The way she translates the agent’s lower bound into a lower bound for her own payoff is by a linear sharing relationship, since this is tight in each instance. Holmstrom and Milgrom (1987) demonstrate that linear contracts are robust to aggregation over time: the optimal contract is a linear function of the endpoint. Carroll and Meng (2016) show that if the principal has knowledge of a lower bound, but not of an upper bound on the shocks, linear contracts are reliable because they give the same incentives for effort at every point along the contract. Balmaceda et al. (2016) also study robustness by quantifying welfare loss as the ratio between the first-best social welfare and that arising from the principal’s optimal pay-for-performance contract. They look at the performance of linear contracts in terms of this metric and show that linear contracts attain the lower bound, which is consistent with Carroll’s (2015) result. Chassang (2013) also studies a principal-agent model with limited liability, moral hazard and adverse selection in a dynamic context. 
He calibrates the dynamic contract to a simple benchmark that satisfies three properties: the principal is guaranteed a positive expected payoff, the contract is renegotiation-proof, and it satisfies an efficiency bound which is sufficiently tight to imply the max-min optimality of linear contracts. However, this benchmark is not feasible because it does not satisfy limited liability. He then provides a class of dynamic limited-liability contracts that satisfy efficiency properties for a very broad class of stochastic environments and are free from details. By contrast, in our setting, limited liability is crucial for simplicity, since it is the lower bound on payments imposed by limited liability that makes pooling optimal under either a two-outcome setting or the assumptions adopted in this paper. In a sense, this robustness literature deals with a different kind of simplicity, related to linearity and not to type-independent contracts. Yet, it is related to our work in that we also find that the optimal type-independent contract is simple when EMLRP is assumed: a single call-option contract is optimal.

Lopomo (1998, 2001) asks why it is that the English auction is often used to determine the terms of trade when the owner of a single object faces a number of potential buyers. He argues in favor of it due to its simplicity and robustness. Lopomo (1998) shows that the perfect Bayesian equilibrium with undominated strategies of the English auction maximizes the seller’s expected revenue among all open bidding procedures in which the buyers at any stage during the auction have only three options: they can purchase the object at the current asking price, remain in the auction by declaring their willingness to pay the current bid, or drop out irrevocably. Lopomo (2001) argues that English auctions are optimal when considering all posterior-implementable outcome functions among all equilibrium outcomes of selling procedures, which satisfy a no-regret condition: each buyer has no incentive to revise his decision after observing his opponents’ behavior. Bergemann and Morris (2005) study robust mechanisms in the sense that ex-post implementation is equivalent to interim (or Bayesian) implementation for all possible types. The robustness here regards the fact that Bayesian implementation requires too much common knowledge, while ex-post implementation does not. They show that this kind of robustness requires quasi-linear preferences without restrictions on transfers, and that the principal implements a function and not a correspondence. Because our model considers limited liability, their result does not apply to our setting. By contrast, our simplicity is derived under Bayesian implementation and common knowledge, yet in order to do so, our problem needs more structure than theirs, and as such it is less general. In addition, the notion of simplicity in our setting is different and it relates to type-independence more than to implementation issues.

With regard to the optimal contracting under moral hazard, adverse selection and risk neutrality, the most closely related papers are: Demougin (1989), Guesnerie et al. (1989), Caillaud et al. (1992), Guesnerie and Laffont (1984), McAfee and McMillan (1986, 1987), Laffont and Tirole (1986) and Melumad and Reichelstein (1989). They study the value of information in agency when there is unlimited liability. These papers study under what conditions an incentive-compatible and individually-rational allocation can be implemented by a contract that is type-independent and there is no efficiency loss from adding noise to the production technology. The first three papers consider the moving support case, while the fourth studies the case of fixed support. The main assumption in these papers is that the noise in the production technology is independent of the agent’s type; they study noisy hidden information models. The main result of this literature is that, in most such models, the principal can reach the same utility as in the absence of noise. Guesnerie and Laffont (1984) show that the optimal mechanism is to offer the same contract to everyone when the first-best allocation is decreasing in the agent’s type, and incentive compatibility requires the opposite. This is known as non-responsiveness. However, the reason why the optimal mechanism exhibits the one-size-fits-all property here has nothing to do with non-responsiveness. Laffont and Tirole (1986) also show that bunching may occur when contracts are not restricted and the standard increasing hazard rate assumption is violated.

Escobar and Pulgar (2016) show, in the absence of limited liability and when the outcome space is binary, that given certain conditions, the optimal contract exhibits the one-size-fits-all property for a subset of the agent’s type set. This stems from the fact that their outcome space is binary and, therefore, all types agree that a contract which pays a higher wage when the high outcome is realized and a lower wage when the low outcome is observed provides stronger incentives for effort and truth-telling. This need not be the case when the outcome space has more than two outcomes. Gottlieb and Moreira (2013) study a principal-agent model with moral hazard and adverse selection. They focus on the case in which types are multidimensional, and show that a positive mass of types with low conditional probabilities of success gets a constant payment and zero rents and there is distortion everywhere.Footnote 3 They also find, under certain assumptions, that the optimal mechanism offers only finitely many contracts. They, as Balmaceda (2011) does, find that when the agent is risk neutral and has limited liability, all agents are offered a single contract.Footnote 4 The main difference stems from the fact that they, like Escobar and Pulgar (2016), focus on the case in which there are only two outcomes. Ollier and Thomas (2013) also study moral hazard and adverse selection with ex-post constraints and, as in Escobar and Pulgar (2016), the outcome space is binary. They, as we do, show that the optimal contract pools types, since there are countervailing forces due to the ex-post constraints. However, considering several outcomes increases the scope for profitable screening, which makes our result more surprising. The countervailing forces in our setting arise from limited liability.

Lewis and Sappington (2000, 2001) study a principal-agent model with adverse selection, moral hazard and limited liability. They use this setting to study how wealth constraints affect the optimal mechanism for selling a project to different bidders. Lewis and Sappington (2000) do so when wealth is publicly observed and Lewis and Sappington (2001) consider the case of private wealth. In Lewis and Sappington’s (2001) model, which is closest to our setting, the outcome is dichotomous and they allow the agent to post a bond before output is realized. Hence, the principal has two instruments: the payment after success (failure leads to zero outcome in their model) and the bond posted by the agent that results from the agent’s announced initial wealth. This payment cannot exceed the agent’s wealth. When ability and wealth are private information, they show that an agent requires both higher ability and greater wealth to secure a more powerful compensation scheme: more of either of them does not suffice. When wealth is observable, they show that, for low-ability agents, the payment after success rises with the ability level, while for high-ability agents, it is independent of the ability level.

Faynzilberg and Kumar (1997, 2000) show that, when the agent is risk averse and there are neither monotonicity nor limited-liability constraints, WSEP is needed to ensure the existence of an optimal mechanism. Furthermore, Faynzilberg and Kumar (1997) show that WSEP and MLRP together result in a monotone mechanism, while Faynzilberg and Kumar (2000) propose a duality method in the spirit of Grossman and Hart (1983) for finding the optimal mechanism and argue that this is type-independent if the technology satisfies strong separability; that is, the marginal impact of the action and the agent’s type are independent of each other, and these are independent of the randomness in the production function. This stems from the fact that the action chosen is type-independent and therefore all types agree on which contract provides stronger incentives for effort. Hence they need a much stronger condition so that agents can order the contract space.

Regarding the pure moral hazard literature, Innes (1990) and Matthews (2001) show the optimality of debt contracts for the principal (call-option contract for the agent) under the monotone likelihood ratio property (MLRP) and contract monotonicity constraints. Poblete and Spulber (2012) show the optimality of the call-option contract under a less stringent condition called monotonicity with respect to the state of the critical ratio, defined as the product of the hazard rate of the state and the ratio of the marginal product of effort to the marginal product of the state. There is another strand that imposes limited liability but not monotonicity constraints. Kim (1997) and Demougin and Fluet (1998) show that the optimal contract entails a bonus only when the outcome is higher than a given threshold, and a fixed wage equal to the limited liability for all outcomes lower than the threshold. Thus, in both cases, the principal leaves a rent to the agent, known as the limited-liability rent. The effort required is downward distorted with respect to the first-best effort in order to reduce the limited-liability rent. We add to this literature the adverse selection dimension, and focus on monotone contracts as defined by Innes (1990).Footnote 5 Under the assumptions made here, we obtain Innes’s (1990) result in the case when both moral hazard and adverse selection are taken into account.

The rest of the paper is organized as follows. Section 2 presents the model and the main assumptions. Section 3 derives the optimal mechanism with adverse selection and moral hazard. Section 4 discusses the robustness with regard to some of the assumptions. Section 5 concludes with some remarks. All proofs can be found in the Appendix.

2 The model

2.1 Setup

Consider a relationship between a risk-neutral principal and a risk-neutral agent. The agent chooses an action \(a\in A\equiv [0,{\bar{a}}]\) and is characterized by a privately known type (e.g., ability level) indexed by \(\theta \in \Theta \equiv [{\underline{\theta }},{\overline{\theta }}]\). Type \(\theta \) has a cumulative distribution function \(F(\theta )\), which is common knowledge and twice continuously differentiable, with associated density function \(f(\theta )\) and full support. The inverse of the hazard function \(H(\theta )\equiv \frac{1-F(\theta )}{f(\theta )}\) is assumed to be monotonically decreasing, as is standard in the literature.
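
As a simple illustration of the monotonicity requirement on \(H(\theta )\) (the uniform case is our own example, not an assumption of the model), suppose that \(\theta \) is uniformly distributed on \(\Theta \). Then

$$\begin{aligned} F(\theta )=\frac{\theta -{\underline{\theta }}}{{\overline{\theta }}-{\underline{\theta }}},\qquad f(\theta )=\frac{1}{{\overline{\theta }}-{\underline{\theta }}},\qquad H(\theta )=\frac{1-F(\theta )}{f(\theta )}={\overline{\theta }}-\theta , \end{aligned}$$

which is strictly decreasing in \(\theta \), so the requirement holds in this case.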

The agent’s action and type jointly determine the output distribution, which can take \(n+1\) possible realizations, \(0\le y^{0}<y^{1}<\cdots <y^{n}\). Given action a and type \(\theta \), output \(y^{i}\), \(i\in I\equiv \{0,\ldots ,n\}\), occurs with probability \(p^{i}(\theta ,a)\in [0,1]\), with \(\sum \nolimits _{i=0}^{n} p^{i}(\theta ,a)=1\).

The agent’s cost of action \(a\in A\) is c(a), with: (i) \(c(a')\ge c(a)\) for all \(a'\ge a\); (ii) \(c(\cdot )\) is a twice continuously differentiable and strictly convex function for all \(a\in A\); and (iii) \(c_a(0)=c(0)=0\).Footnote 6

A contract is given by the payment scheme: \(w\equiv ( w^{0},\ldots ,w^{n})\), where \(w^{i}\) is the payment when \(y^{i}\) is realized. Payments are restricted by a limited-liability constraint that prevents the principal from paying the agent a wage lower than \(L\ge 0\) in any state, with \(L\le y^{0}\). This constraint can be thought of as the case in which the agent owns no assets at the time of contracting. Hence,

$$\begin{aligned} w^{i}\ge L\quad \text {for all}\; i\in I. \end{aligned}$$
(LL)

In addition, contracts are restricted to satisfy Innes’s (1990) monotonicity constraints, as follows:

$$\begin{aligned} w^{i}\ge w^{i-1}\quad \text {for all}\; i\in I\setminus \{0\}, \end{aligned}$$
(MONA)

and

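$$\begin{aligned} y^{i}-w^{i}\ge y^{i-1}-w^{i-1}\quad \text {for all}\; i\in I\setminus \{0\}. \end{aligned}$$
(MONP)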

Restriction MONA must be satisfied if the agent can costlessly reduce the output or if the principal can secretly borrow money from an outsider to inflate output. MONP must be satisfied if the principal can costlessly reduce profits. In the case with two possible outcomes MONP can be discarded since optimality requires this to be the case. Otherwise, the principal will offer a flat contract that induces action zero. Finally, contracts are restricted to be piecewise continuously differentiable in \(\theta \).

Define the contract space as:

$$\begin{aligned} \mathcal {C}=\{w\in \mathfrak {R}^{n+1}|w\;\mathrm { satisfies }\;\mathbf{LL }, \mathbf{MONA }\;\mathrm { and }\;\mathbf{MONP }\}. \end{aligned}$$
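
To illustrate, consider a call-option contract of the kind discussed in the Introduction, with a face value that we denote by \(D>0\) (the symbol D is introduced here only for this illustration):

$$\begin{aligned} w^{i}=\max \{L,\;y^{i}-D\},\qquad y^{i}-w^{i}=\min \{y^{i}-L,\;D\},\qquad i\in I. \end{aligned}$$

Every payment is at least L, and both \(w^{i}\) and \(y^{i}-w^{i}\) are non-decreasing in i because \(y^{i}\) increases with i. Hence this contract satisfies limited liability and both monotonicity constraints, and so it belongs to \(\mathcal {C}\).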

Finally, the agent has a reservation utility equal to zero.

2.2 Main assumptions

[FOSD]:

\(\sum \nolimits _{i>i'}p_{a}^{i}(\theta ,a)\ge 0\) and \(\sum \nolimits _{i>i'}p_{\theta }^{i}(\theta ,a)\ge 0\) for all \(i'\in I\) and all \((a,\theta )\in A\times \Theta \).

This implies that higher actions as well as higher types increase, ceteris paribus, the expected output, the agent’s expected compensation when the mechanism satisfies MONA, and the principal’s expected profits when the mechanism satisfies MONP.
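
The reasoning behind this implication can be sketched with a summation-by-parts identity. For any vector \(x\equiv (x^{0},\ldots ,x^{n})\) (a generic placeholder of ours that may stand for y, w or \(y-w\)), the fact that probabilities sum to one gives

$$\begin{aligned} \sum \nolimits _{i=0}^{n}p^{i}(\theta ,a)x^{i}=x^{0}+\sum \nolimits _{j=0}^{n-1}(x^{j+1}-x^{j})\sum \nolimits _{i>j}p^{i}(\theta ,a). \end{aligned}$$

Differentiating with respect to a yields

$$\begin{aligned} \frac{\partial }{\partial a}\sum \nolimits _{i=0}^{n}p^{i}(\theta ,a)x^{i}=\sum \nolimits _{j=0}^{n-1}(x^{j+1}-x^{j})\sum \nolimits _{i>j}p_{a}^{i}(\theta ,a)\ge 0 \end{aligned}$$

whenever \(x^{i}\) is non-decreasing in i, because FOSD makes every inner sum non-negative. The same argument applies to the derivative with respect to \(\theta \).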

[CP]:

(i) The upper cumulative probability distributions \(\sum \nolimits _{i>i'}p^{i}(\theta ,a)\) are strictly concave in \((a,\theta )\in A\times \Theta ,\;\forall i'\in I\setminus \{n\}\); (ii) \(p^{i}(\theta ,a)\) is thrice continuously differentiable for all \((a,\theta )\in A\times \Theta \), and \(p^{i}(\theta ,a)>0,\;\forall (a,\theta )\in A\times \Theta \) and \(\forall i\in I\); and (iii) \(\lim _{a\rightarrow 0}\sum \nolimits _{i}p^{i}_{a}(\theta ,a)y^{i}-c_{a}(a)>0\) and \(\lim _{a\rightarrow {\bar{a}}}\sum \nolimits _{i}p^{i}_{a}(\theta ,a)y^{i}-c_{a}(a)<0\), \(\forall \theta \in \Theta \).

This is meant to ensure that the first-best action is unique and belongs to the interior. This, together with the fact that contracts satisfy MONA, ensures that the locally optimal action is unique and that this is globally optimal for all \((\theta , w)\in \Theta \times \mathcal {C}\).

[SC]:

\(\sum \nolimits _{i>i'}p_{\theta a}^{i}(\theta ,a)\ge 0\) for all \(i'\in I\) and all \((a,\theta )\in A\times \Theta \).Footnote 7

This assumption implies that, when a increases, the marginal decrease in the cumulative probability distribution increases with type. This assumption ensures that the optimal action increases with ability; therefore, even when each type is offered the same contract, the more able types choose a higher action.

Following Faynzilberg and Kumar (1997), we will also consider the possibility that the technology satisfies weak separability between \((\theta ,a)\) and the random component of the output, as follows:

[WSEP]:

For any \(a,a'\in A\) with \(a\ne a'\) and \(\theta ,\theta '\in \Theta \) with \(\theta \ne \theta '\), the following holds:

$$\begin{aligned} \frac{p^{i}(\theta ,a')-p^{i}(\theta ,a)}{p^{i}(\theta ',a)-p^{i}(\theta ,a)} \;\;\text {is independent of}\; i. \end{aligned}$$

Faynzilberg and Kumar (1997) show that this implies that the probability that outcome \(y^i\) occurs can be written as follows:

$$\begin{aligned} p^{i}(\theta ,a)=(q^{i}-r^{i})g(\theta ,a)+r^{i} \end{aligned}$$

where \(q^{i}\in [0,1]\), \(r^{i}\in [0,1]\), \(\sum \nolimits _{i=0}^{n}r^{i}=1\) and \(\sum \nolimits _{i=0}^{n}q^{i}=1\). This assumption, together with CP, SC and FOSD, will be sufficient for the optimal mechanism to be type-independent, that is, to satisfy the one-size-fits-all property.
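
Two consequences of this representation may be worth spelling out. The following is only a sketch, assuming \(q^{i}\ne r^{i}\) and \(g(\theta ',a)\ne g(\theta ,a)\) so that the ratio is well defined. First, the ratio in WSEP becomes

$$\begin{aligned} \frac{p^{i}(\theta ,a')-p^{i}(\theta ,a)}{p^{i}(\theta ',a)-p^{i}(\theta ,a)}=\frac{(q^{i}-r^{i})[g(\theta ,a')-g(\theta ,a)]}{(q^{i}-r^{i})[g(\theta ',a)-g(\theta ,a)]}=\frac{g(\theta ,a')-g(\theta ,a)}{g(\theta ',a)-g(\theta ,a)}, \end{aligned}$$

which does not depend on i. Second, the expected payment under any contract \(w\equiv (w^{0},\ldots ,w^{n})\) is

$$\begin{aligned} \sum \nolimits _{i=0}^{n}p^{i}(\theta ,a)w^{i}=\sum \nolimits _{i=0}^{n}r^{i}w^{i}+g(\theta ,a)\sum \nolimits _{i=0}^{n}(q^{i}-r^{i})w^{i}, \end{aligned}$$

so every type evaluates w through a fixed component \(\sum \nolimits _{i}r^{i}w^{i}\) and a bonus-like component \(\sum \nolimits _{i}(q^{i}-r^{i})w^{i}\), neither of which depends on \((\theta ,a)\).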

[EMLRP]:

For any \(a'>a\), \(\frac{\int _{\theta \in \Theta }p^{i}(\theta ,a')dF(\theta )}{\int _{\theta \in \Theta }p^{i}(\theta ,a)dF(\theta )}\) increases with i.

This assumption implies that any principal who has an aggregated-over-types payoff function that is increasing in the project’s outcome prefers the stochastic distribution of returns induced by higher actions. This ensures that the optimal type-independent mechanism is option-like, as is the case in the pure moral hazard contract under standard MLRP (see Appendix  B for details).Footnote 8

2.3 The First Best

In this sub-section we study the case in which action a is contractible and the agent’s type \(\theta \) is common knowledge.

Let \(y\equiv (y^{0},\ldots ,y^{n})\) and \(p(\theta ,a)\equiv (p^{0}(\theta ,a),\ldots ,p^{n}(\theta ,a))\). The total surplus from a type-\(\theta \) agent who chooses action a, with \(p(\theta ,a)y\) denoting the inner product of the two vectors, is given by: \(S(\theta ,a) \equiv p(\theta ,a)y-c(a)\). Then, the first-best action, denoted by \(a^{**}(\theta )\), is the solution to the following problem:

$$\begin{aligned} \max _{a'\in A} S(\theta ,a'), \end{aligned}$$

which entails the following first-order condition:

$$\begin{aligned} \sum \nolimits _{i}p^{i}_{a}(\theta ,a')y^{i}-c_{a}(a')=0. \end{aligned}$$

Because of assumptions FOSD and CP, the first-order condition is necessary and sufficient and the first-best action profile belongs to the interior. Thus, the first-best surplus is given by: \(S(\theta ,a^{**}(\theta ))\). Because \(y\ge 0\) and \(c(0)=0\), the surplus at action zero is non-negative, and hence \(S(\theta ,a^{**}(\theta ))\ge 0\).

Assumption SC implies that \(S(\theta ,a)\) is supermodular in \((\theta ,a)\) and therefore it readily follows from theorem 3 in Edlin and Shannon (1998) that for any \(\theta '>\theta \), \(a^{**}(\theta ')> a^{**}(\theta )\) and \(S(\theta ',a^{**}(\theta '))> S(\theta ,a^{**}(\theta ))\).Footnote 9 Thus, the first-best action is increasing with the agent’s type and total surplus rises with the agent’s type when FOSD holds.
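
The supermodularity claim can be checked with a short calculation, sketched here using summation by parts and the fact that probabilities sum to one. Expected output can be written as

$$\begin{aligned} p(\theta ,a)y=y^{0}+\sum \nolimits _{j=0}^{n-1}(y^{j+1}-y^{j})\sum \nolimits _{i>j}p^{i}(\theta ,a), \end{aligned}$$

and, since \(c(a)\) does not depend on \(\theta \), the cross-partial derivative of the surplus is

$$\begin{aligned} S_{\theta a}(\theta ,a)=\sum \nolimits _{j=0}^{n-1}(y^{j+1}-y^{j})\sum \nolimits _{i>j}p_{\theta a}^{i}(\theta ,a)\ge 0, \end{aligned}$$

where the inequality follows from SC and \(y^{j+1}>y^{j}\).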

One contract that implements this action promises to pay the agent \(c(a^{**}(\theta ))+L\) if he delivers \(a^{**}(\theta )\), and it pays L otherwise. The agent weakly prefers to deliver \(a^{**}(\theta )\) in this setting, and so the principal’s ideal outcome ensues. Because the reservation utility is zero and \(L\ge 0\), this ensures that the agent is willing to participate.

This contract is not incentive compatible: every agent will choose to deliver zero effort, since this still ensures a payment equal to L, and because compensation is not contingent on realized output, the agent has no incentive either to exert effort or to reveal his type truthfully.

3 Moral hazard and adverse selection

The principal’s goal here is twofold: on the one hand, she has to provide the agent with incentives to choose the desired action, and on the other, she has to induce the agent to truthfully reveal his type. In short, any offer \((w(\theta ),a(\theta ))\) must be incentive compatible; i.e. a \(\theta \)-type agent prefers contract \(w(\theta )\) to any other contract, and is obedient in the sense that he chooses the action prescribed by the principal \(a(\theta )\).

An agent of type \(\theta \in \Theta \) who faces contract \(w\in \mathcal {C}\) and chooses action \(a\in A\) has an expected utility given by:

$$\begin{aligned} U( \theta ,w,a) \equiv p(\theta ,a)w-c(a), \end{aligned}$$
(1)

and the principal has an expected utility given by:

$$\begin{aligned} V(\theta ,w,a)\equiv p(\theta ,a)(y-w). \end{aligned}$$

By the revelation principle, a direct mechanism \((w(\theta ),a(\theta )):\Theta \rightarrow \mathcal {C}\times A\) is incentive-compatible if and only if, for all \(\theta \in \Theta \),

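$$\begin{aligned} U(\theta ,w(\theta ),a(\theta ))\ge U(\theta ,w(\theta '),a'),\quad \forall \theta '\in \Theta ,\; \forall a'\in A. \end{aligned}$$
(IC)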

The mechanism satisfies individual rationality if and only if, for all \(\theta \in \Theta \),

$$\begin{aligned} U(\theta ,w(\theta ),a(\theta ))\ge 0. \end{aligned}$$
(IR)

Constraint (IC) is the incentive-compatibility constraint. This states that a type-\(\theta \) agent is better off announcing his true type, receiving the allocation \(w(\theta )\) and choosing action \(a(\theta )\), than announcing a different type \(\theta '\), receiving the allocation \(w(\theta ')\) and choosing any other action profile \(a'\in A\). Because the action and the type are not observable, the agent can deviate by choosing any action/type he wishes and not be detected. Constraint (IR) is the standard individual rationality constraint establishing that, on the equilibrium path, each ability type prefers participating to staying out.
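The two constraints can be checked by brute force on a finite grid. The sketch below is only illustrative: the function names, the two-outcome technology \(p(\theta ,a)=(1-\theta a,\theta a)\), the cost \(c(a)=a^{2}/2\), and the two menus are all hypothetical choices, and the reservation utility is taken to be zero as in Section 2.3.

```python
def expected_utility(theta, w, a, prob, cost):
    """U(theta, w, a) = p(theta, a) . w - c(a)."""
    p = prob(theta, a)
    return sum(pi * wi for pi, wi in zip(p, w)) - cost(a)


def is_incentive_compatible(menu, actions, prob, cost, tol=1e-9):
    """menu maps each type to (contract, action); check (IC) and (IR) by enumeration."""
    for theta, (w, a) in menu.items():
        u_truth = expected_utility(theta, w, a, prob, cost)
        if u_truth < -tol:  # (IR) with zero reservation utility
            return False
        # (IC): no joint deviation in the reported type and the action is profitable.
        for w_other, _ in menu.values():
            for a_dev in actions:
                if expected_utility(theta, w_other, a_dev, prob, cost) > u_truth + tol:
                    return False
    return True


# Hypothetical two-outcome technology and cost (assumptions for illustration only).
def prob(theta, a):
    return (1.0 - theta * a, theta * a)


def cost(a):
    return 0.5 * a ** 2


actions = [k / 100 for k in range(101)]
# Same contract for both types, each type choosing its own best action.
pooling = {0.5: ((0.0, 1.0), 0.5), 1.0: ((0.0, 1.0), 1.0)}
# A menu with a smaller bonus for the low type: the low type prefers the other contract.
separating = {0.5: ((0.0, 0.5), 0.25), 1.0: ((0.0, 1.0), 1.0)}

print(is_incentive_compatible(pooling, actions, prob, cost))
print(is_incentive_compatible(separating, actions, prob, cost))
```

Because the inner loop also runs over the agent's own contract, the same check covers obedience as well as truthful reporting.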

We say that action a is implementable if there exists a contract \(w\in \mathcal {C}\), such that \((w,a)\) is incentive-compatible. If w is such that \(w(\theta )=w(\underline{\theta })\) for all \(\theta \in \Theta \), the contract satisfies the one-size-fits-all property at \(\Theta \), and a is implementable by a one-size-fits-all contract at \(\Theta \). Define \(\alpha (\theta ,\theta ')\in \mathop {\text {argmax}}\nolimits _{a'\in A} U(\theta , w(\theta '),a')\) and denote \(\alpha (\theta ,\theta )\) by \(a(\theta )\).

Proposition 1

Let \((w(\theta ),a(\theta )):\Theta \rightarrow \mathcal {C}\times A\) with \(a(\theta )>0\). Then \((w(\theta ),a(\theta ))\) is incentive compatible if and only if:

  1. (i)

    For all \(\theta \in \Theta \), \(a(\theta )\) is differentiable almost everywhere, and satisfies the following first-order condition:

    $$\begin{aligned} U_{a}(\theta , w(\theta ),a(\theta ))=0. \end{aligned}$$
    (2)
  2. (ii)

    For all \(\theta ,\theta '\in \Theta \),

    $$\begin{aligned} U_{aa}(\theta , w(\theta ),a(\theta ))\le 0. \end{aligned}$$
    (3)
  3. (iii)

    For almost all \(\theta \in \Theta \),

    $$\begin{aligned} \sum \nolimits _{i}p^{i}(\theta ,a(\theta ))\frac{d w^{i}(\theta )}{d \theta }=0 \end{aligned}$$
    (4)
  4. (iv)

    For almost all \(\theta \in \Theta \),

    $$\begin{aligned}&\sum \nolimits _{i}p^{i}_{\theta }(\theta ,a(\theta ))\frac{d w^{i}(\theta )}{d \theta }\Big ( \sum \nolimits _{i}p^{i}_{aa}(\theta ,a(\theta ))w^{i}(\theta )- c_{aa}(a(\theta ))\Big ) \nonumber \\&\quad -\sum \nolimits _{i}p^{i}_{a\theta }(\theta ,a(\theta )) w^{i}(\theta )\sum \nolimits _{i}p^{i}_{a}(\theta ,a(\theta ))\frac{dw^{i}(\theta )}{d\theta }\le 0. \end{aligned}$$
    (5)

Equation (2) is the first-order condition for the action when the agent truthfully reveals his type by choosing \(w(\theta )\). Equation (3) ensures that the agent’s utility is concave in a, for all \(\theta \in \Theta \). Equation (4) is the first-order condition for the agent’s revelation of information problem, taking into account that, upon a deviation, the agent will choose the action that maximizes his expected utility. This is also evaluated at the local optimum for \((\theta ',a')\). Equation (5) is the local second-order condition for the agent’s revelation problem. This ensures the concavity of the agent’s utility in \((\theta ',w(\theta '),a')\), evaluated at the local optimum. This, together with (2) and (3), is sufficient for global incentive compatibility. This constraint restricts the kind of incentive-compatible mechanisms that the principal can use, and is equivalent to what the literature generally calls the monotonicity constraint. For instance, in Laffont and Tirole’s (1986) model, this entails a quantity and a transfer that rise with the agent’s type. However, the condition here is weaker in general, since it allows payments that decrease as well as increase with the agent’s type. Thus, while imposing that payments are non-decreasing with respect to \(\theta \) does yield sufficiency, it does so with loss of generality.Footnote 10 Observe that condition (5) holds when the contract is type-independent, that is, when it satisfies the one-size-fits-all property. In addition, observe that conditions MONA and CP ensure that the agent’s utility function is strictly concave in a for all \(\theta ,\theta '\in \Theta \), and that \(\alpha (\theta ,\theta ')\) belongs to the interior of A. This implies that the locally optimal action for any \(\theta ,\theta '\in \Theta \) is globally optimal.

The key to the implementability result above is Eq. (4), which states that, in any incentive-compatible mechanism, conditional on the incentive-compatible action being chosen, the expected compensation cannot vary with the agent’s reported type. This stems from the fact that both the principal and the agent are risk-neutral. Therefore, conditional on any given action, they both only care about the expected compensation and, ceteris paribus, each agent type prefers the contract with the highest expected compensation, and the principal the contract with the lowest expected compensation.Footnote 11 However, this does not rule out menus where different contracts are customized to intervals of different types, since \(w(\theta )\) is piecewise continuously differentiable. For instance, the principal could offer two contracts, one with low-powered incentives for an interval of low types, and one with high-powered incentives for an interval of high types, where high-powered means that payments increase more with output realizations than in the low-powered contract. However, we will show that, when the probability function satisfies WSEP, jumps in \(w(\theta )\) cannot be optimal.

Before characterizing the optimal mechanism, it is worthwhile to briefly discuss the dimensionality of the output space. If we were to restrict the output space to only two outputs, the only way to provide incentives would be to make a high payment when the higher output is observed and a low payment when the lower output is observed. When an increase in \((\theta ,a)\) increases the probability that the high output is observed, which is a very natural ordering, a greater high payment and a lower low payment will, due to risk neutrality, be preferred by each possible type. Hence, every possible type agrees on which contract provides stronger incentives. When the output space contains more than two outputs, this is no longer sufficient, since different \((\theta ,a)\) give rise to distribution functions under which a given payment difference across output levels, \(w^{i+1}-w^{i}\), provides more powerful incentives to some types than to others and, therefore, the types evaluate the appeal of different contracts differently. We will show that, together, FOSD and WSEP provide a natural ordering of the contracts such that different types agree on which contracts provide stronger incentives, which is at the crux of the one-size-fits-all property.

In what follows, we will denote \(dw^{i}(\theta )/d\theta \) by \({\dot{w}}^{i}(\theta )\) and \(d U(\theta )/d\theta \) by \({\dot{U}}(\theta )\). Using the necessary condition in Eq. (4) and the envelope theorem to obtain \({\dot{U}}(\theta )\), while noting that \({\dot{U}}(\theta )\ge 0\), which means that only the lowest type’s participation constraint matters, and integrating by parts, the principal’s problem can be written as follows:

$$\begin{aligned}&\max _{(w,a):\Theta \rightarrow \mathcal {C}\times A} \int _{\theta \in \Theta } \Big (S(\theta ,a(\theta ))-H(\theta ) \sum \nolimits _{i}p^{i}_{\theta }(\theta ,a(\theta ))w^{i}(\theta )\Big )f(\theta )d\theta -U({\underline{\theta }}) \nonumber \\&\text { subject to } \end{aligned}$$
(6)
$$\begin{aligned}&\sum \nolimits _{i}p^{i}_{\theta }(\theta ,a(\theta )){\dot{w}}^{i}(\theta )\Big ( \sum \nolimits _{i}p^{i}_{aa}(\theta ,a(\theta ))w^{i}(\theta )- c_{aa}(a(\theta ))\Big )-\nonumber \\&\sum \nolimits _{i}p^{i}_{a\theta }(\theta ,a(\theta )) w^{i}(\theta )\sum \nolimits _{i}p^{i}_{a}(\theta ,a(\theta )){\dot{w}}^{i}(\theta )\le 0, \; \text {for almost all}\; \theta \in \Theta \end{aligned}$$
(7)
$$\begin{aligned}&\sum \nolimits _{i}p^{i}_{a}(\theta ,a(\theta ))w^{i}(\theta )-c_{a}(a(\theta ))=0,\; \forall \theta \in \Theta , \end{aligned}$$
(8)
$$\begin{aligned}&U({\underline{\theta }})\ge 0, \end{aligned}$$
(9)
$$\begin{aligned}&w^{i}(\theta )\ge L,\; \forall \theta \in \Theta \;\mathrm { and }\; \forall \;i\in I. \end{aligned}$$
(10)

This optimization problem bears some resemblance to the standard mechanism design problem (Myerson 1982). The objective function is the virtual surplus; that is, total welfare minus information rents, which are given by \(H(\theta )\sum \nolimits _{i}p^{i}_{\theta }(\theta ,a(\theta ))w^{i}(\theta )\). However, constraint (7) is different from the standard monotonicity constraint due to the multidimensionality of the contract space. This constraint does not entail monotonicity of either the action or the payments. Nonetheless, this does not preclude us from studying the problem of virtual surplus maximization ignoring constraint (7) and then checking whether the unconstrained solution satisfies this constraint. If the solution to the relaxed problem satisfies constraint (7), then it is the solution to the principal-agent problem. We will analyze this problem first and then come back to the virtual surplus maximization, taking into account the local second-order constraint in Eq. (7).

We first derive the optimal mechanism when the local second-order condition is ignored and WSEP is not imposed. Define \(P^{i}(\theta ,a)\equiv \sum \nolimits _{j\ge i+1}p^{j}(\theta ,a)\), \(\xi ^{i}(\theta ,a)\equiv \frac{P^{i}_{a}(\theta ,a)}{P^{i}_{\theta }(\theta ,a)}\) and \(\triangle w^{i}(\theta )\equiv w^{i+1}(\theta )-w^{i}(\theta )\).

Proposition 2

Suppose that CP, FOSD and SC hold, and that \(P_{a\theta }^{i}(\theta ,a)/P_{a}^{i}(\theta ,a)\) is non-increasing with a and non-decreasing with \(\theta \). Then, the mechanism that maximizes the virtual surplus, denoted by \(({\hat{w}}(\theta ),{\hat{a}}(\theta ))\), where \({\hat{w}}(\theta )\) satisfies MONA, MONP and LL, satisfies the following:

  1. (i)

    There exists a threshold \(\xi (\theta )\ge 0\), such that: \(\triangle {\hat{w}}^{i}(\theta )=\triangle y^{i}\) if \(\xi ^{i}(\theta ,{\hat{a}}(\theta ))\ge \xi (\theta )\), and \(\triangle {\hat{w}}^{i}(\theta )=0\) otherwise.

  2. (ii)

    For all \(\theta \in \Theta \setminus \{{\bar{\theta }}\}\), \({\hat{a}}(\theta )<a^{**}(\theta )\) and \({\hat{a}}({\bar{\theta }})=a^{**}({\bar{\theta }})\).

  3. (iii)

    \({\hat{a}}(\theta )\) rises with \(\theta \).

This result establishes that the mechanism that maximizes the aggregated virtual surplus is such that either \(\triangle {\hat{w}}^{i}(\theta )=0\) or \(\triangle {\hat{w}}^{i}(\theta )=\triangle y^{i}\). This is due to the linearity of the virtual surplus with respect to \(\triangle w^{i}(\theta )\). However, characterizing the optimal payment scheme is not possible without further assumptions. The reason stems from the fact that different types value different payments differently. This means that a certain type may prefer a higher \({\hat{w}}^{i}(\theta )\) and a lower \({\hat{w}}^{i'}(\theta )\), while a different type may prefer the opposite. Thus, not all types value increases in a given payment in the same way and they do not agree on which contract provides more powerful incentives. In fact, the condition that determines whether \(\triangle {\hat{w}}^{i}(\theta )\) is zero or positive depends on whether \(\xi ^{i}(\theta ,{\hat{a}}(\theta ))\) is greater or lower than the threshold \(\xi (\theta )\) and both functions are type-dependent.

This also establishes that the mechanism that maximizes the aggregated virtual surplus is such that the optimal action is downward distorted for all types but the highest one due to limited liability. In addition, the higher the type, the higher the action. The reason for the inefficient action is that the principal’s cost of incentive compatibility, which is given by \(H(\theta )\sum \nolimits _{i}p_{\theta }^{i}(\theta ,a(\theta ))w^{i}(\theta )\), is an increasing function of the action chosen due to: the complementarity between the action and the type, the fact that MONA implies that \(w^{i}(\theta )\) is non-decreasing in i, as well as the increasing hazard rate assumption. Thus, the principal faces a trade-off between efficiency and informational rents, which is solved by a downward distortion of the optimal action so as to reduce the informational rent. Because both the rent and the optimal action are increasing in the agent’s type \(\theta \), the principal requires the first-best action from the highest type.

The non-distortion at the top result seems counter-intuitive at first glance, since it contradicts the fact that, under moral hazard and limited liability, the effort is downward distorted. This stems from the fact that here incentive compatibility with respect to private information requires higher effort from higher types. Thus, if the principal requires lower effort from the highest type, she has to lower the effort required of all other types. In other words, the principal faces a decreasing marginal cost of effort that trades off efficiency against the information rent; by lowering the effort requested from the highest type the principal lowers the information rent for that type, but also lowers the efficiency of all other types.

Observe that if the mechanism that maximizes the aggregated virtual surplus when constraint (7) is ignored is such that, for some \(\theta \):

$$\begin{aligned}&\sum \nolimits _{i}\Big (p_{\theta }^{i}(\theta ,{\hat{a}}(\theta )) U_{aa}(\theta ,w(\theta ),{\hat{a}}(\theta ))\nonumber \\&\quad -p_{a}^{i}(\theta ,{\hat{a}}(\theta ))\sum \nolimits _{i}p_{a\theta }^{i}(\theta ,{\hat{a}}(\theta )) w^{i}(\theta )\Big ){\dot{w}}^{i}(\theta )\ge 0, \end{aligned}$$
(11)

then constraint (7) must be binding when evaluated at the optimal mechanism for that particular type, and therefore the pointwise optimal mechanism for that type cannot be an optimum.

Observe that, assuming that \(p(\theta ,a)\) satisfies WSEP (i.e., \(p^{i}(\theta ,a)=(q^{i}-r^{i})g(\theta ,a)+r^{i}\) for all i), and after summation by parts, the local second-order condition can be written as follows:

$$\begin{aligned}&\Big (U_{aa}(\theta ,w(\theta ),a(\theta ))g_{\theta }(\theta ,a(\theta ))- g_{a}(\theta ,a(\theta ))\sum \nolimits _{i=0}^{n-1}P^{i}_{a\theta }(\theta ,a(\theta ))\triangle w^{i}(\theta )\Big )\times \nonumber \\&\sum \nolimits _{i=0}^{n-1}\sum \nolimits _{j\ge i+1}(q^{j}-r^{j}) \triangle {\dot{w}}^{i}(\theta )\le 0. \end{aligned}$$

The term in parentheses is negative when we assume MONA, CP and SC. Hence, the sufficient condition reduces to:

$$\begin{aligned} \sum \nolimits _{i=0}^{n-1}\sum \nolimits _{j\ge i+1}(q^{j}-r^{j})\triangle {\dot{w}}^{i}(\theta )\ge 0. \end{aligned}$$

Observe that, in this case, the sufficient condition is independent of the action and the agent’s type other than through the contract chosen. This greatly simplifies the principal’s problem, since all types rank the different contracts in the menu in terms of the power of incentives in exactly the same way. This is exactly what happens when there are only two outcomes.

Proposition 3

Suppose that \(g_{aaa}(\theta ,a)\le 0\), \(g_{aa\theta }(\theta ,a)\ge 0\), and CP, FOSD, SC and WSEP hold. Then the optimal incentive compatible mechanism, denoted by \((w^{*}(\theta ),a^{*}(\theta ))\), where \(w^{*}(\theta )\) satisfies MONA, MONP and LL, is such that:

  1. (i)

    For all \(\theta \in \Theta \), \({\dot{w}}^{i*}(\theta )=0,\;\forall i\in I\).

  2. (ii)

    For all \(\theta \in \Theta \), \(\triangle w^{i}(\theta )=\triangle w^{i}\in [0,\triangle y^{i}]\).

  3. (iii)

    For all \(\theta \in \Theta \setminus \{{\bar{\theta }}\}\), \(a^{*}(\theta )={\hat{a}}(\theta )<a^{**}(\theta )\) and \(a^{*}({\bar{\theta }})={\hat{a}}({\bar{\theta }})=a^{**}({\bar{\theta }})\).

  4. (iv)

    For all \(\theta \in \Theta \setminus \{{\underline{\theta }}\}\), \(U(\theta )>L\) and \(U({\underline{\theta }})=L\).

This result shows that, under WSEP, the optimal mechanism satisfies the one-size-fits-all property; that is, each type is offered the same contract.Footnote 12 This is at odds with the case in which there is either pure moral hazard or pure adverse selection.

To grasp the intuition, consider any incentive-compatible non-degenerate menu of contracts \(\{w(\theta )\}_{\theta \in \Theta }\), and assume that, instead, the principal offers the degenerate menu with the contract that has the most powerful incentives among all incentive-compatible contracts in the menu \(\{w(\theta )\}_{\theta \in \Theta }\). Denote this contract by \(w^{*}\). Because of WSEP, all types rank contracts in terms of the power of incentives in the same way, since the expected utility can be written as follows:

$$\begin{aligned} U(\theta ,w,a)=w^{0}+g(\theta ,a)\sum \nolimits _{i=0}^{n-1}Q^{i}\triangle w^{i}(\theta )-c(a), \end{aligned}$$

where \(Q^{i}=\sum \nolimits _{j\ge i+1}(q^{j}-r^{j})\).

Holding the incentive-compatible action constant, each type is at least weakly better off choosing the incentive-compatible contract over \(w^{*}\). Due to risk neutrality, this implies that the expected compensation for each type under the incentive-compatible contract is higher than or equal to that under \(w^{*}\). Thus, holding the action constant, the principal’s payoff is higher under contract \(w^{*}\) than under the corresponding incentive-compatible contract. However, when a type-\(\theta \) agent chooses \(w^{*}\) instead of \(w(\theta )\), he deviates and chooses an action accordingly. Due to the fact that the power of incentives is higher under \(w^{*}\) for each type and that \(p(\theta ,a)\) satisfies SC, a type-\(\theta \) agent chooses an action, under contract \(w^{*}\), that is at least as large as the one he would have chosen under \(w(\theta )\). Because of this, together with FOSD and the fact that MONP implies that \(\triangle w^{i}\le \triangle y^{i}\) for all \(i\in I\setminus \{n\}\), the principal’s payoff is higher under \(w^{*}\) than under \(w(\theta )\). In order to minimize the informational rent, the optimal action is downward distorted for everyone but the highest-ability agent. Thus, offering a degenerate menu decreases informational rents and improves efficiency, since no one chooses a strictly lower action than the one they would have chosen in the presence of a non-degenerate incentive-compatible menu.Footnote 13 In fact, the optimal action is identical to the one that maximizes the virtual surplus when WSEP is assumed, since a degenerate menu satisfies the local second-order condition.
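The common ranking of contracts under WSEP can be checked numerically. The sketch below uses hypothetical vectors q and r and two hypothetical contracts, none of which come from the paper. It verifies the summation-by-parts step: the expected wage \(p(\theta ,a)w\) equals a term that does not depend on \(g(\theta ,a)\) (computed here directly as \(\sum \nolimits _{i}r^{i}w^{i}\)) plus \(g(\theta ,a)\) times the power index \(\sum \nolimits _{i}Q^{i}\triangle w^{i}\). The marginal return to raising g is therefore the same index for every type.

```python
def tail_weights(q, r):
    """Q^i = sum_{j >= i+1} (q^j - r^j) for i = 0, ..., n-1."""
    n = len(q) - 1
    return [sum(q[j] - r[j] for j in range(i + 1, n + 1)) for i in range(n)]


def power(w, q, r):
    """Power index of contract w: sum_i Q^i * (w^{i+1} - w^i)."""
    Q = tail_weights(q, r)
    return sum(Q[i] * (w[i + 1] - w[i]) for i in range(len(Q)))


def expected_wage(w, q, r, g):
    """Expected wage p . w when p^i = (q^i - r^i) * g + r^i (WSEP as stated above)."""
    return sum(((q[i] - r[i]) * g + r[i]) * w[i] for i in range(len(w)))


# Hypothetical three-outcome example (assumed numbers, for illustration only).
q = [0.1, 0.3, 0.6]
r = [0.6, 0.3, 0.1]
steep = [0.0, 1.0, 3.0]
flat = [0.0, 2.0, 2.5]

for g in (0.0, 0.3, 0.9):  # g stands in for g(theta, a) at different (theta, a)
    for w in (steep, flat):
        base = sum(ri * wi for ri, wi in zip(r, w))  # part that does not depend on g
        assert abs(expected_wage(w, q, r, g) - (base + g * power(w, q, r))) < 1e-12

print(power(steep, q, r), power(flat, q, r))
```

Since the index does not depend on \((\theta ,a)\), the contract with the larger index gives every type a larger marginal return to effort, which is the sense in which all types agree on the ranking.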

Faynzilberg and Kumar (2000) argue that a sufficient condition for a type-independent mechanism in the case of risk aversion without limited liability is that the distribution function satisfies Grossman and Hart’s (1983) linear distribution function condition (LDFC) in actions and types.Footnote 14 The reason is that, in this case, the incentive-compatible action is type-independent. Here, we show that we do not need such an extreme form of separability in order to obtain that the optimal mechanism satisfies the one-size-fits-all property. Also, as discussed at length by Faynzilberg and Kumar (1997, 2000), the optimal contracting problem with moral hazard, adverse selection and risk-averse agents is extremely difficult when the technology is non-separable. In fact, the Faynzilberg and Kumar (1997, 2000) existence result requires separability.Footnote 15

As a last step, we derive the optimal type-independent contract when EMLRP is imposed.

Proposition 4

Suppose that \(g_{aaa}(\theta ,a)\le 0\), \(g_{aa\theta }(\theta ,a)\ge 0\), and CP, FOSD, SC, WSEP and EMLRP hold. Then, there exists an index i, denoted by \(i^{*}\), such that:

  1. (i)

    The optimal contract is a call-option contract of the form \(w^{i*}(\theta )=L\) for all \(i\le i^{*}\) and \(w^{i*}(\theta )=y^{i}-y^{*}\) for all \(i>i^{*}\), where \(y^{*}\) solves the following equation:

    $$\begin{aligned} g(\underline{\theta },a^{*}(\underline{\theta }))\Big (L\sum \nolimits _{i\le i^{*}}Q^{i}+\sum \nolimits _{i> i^{*}}Q^{i}(y^{i}-y^{*})\Big )-c(a^{*}(\underline{\theta }))=0. \end{aligned}$$
  2. (ii)

    \(a^{*}(\theta )\) is non-decreasing with \(\theta \).

This shows that the unique contract within the optimal mechanism is a call-option contract that pays L if the output is lower than or equal to \(y^{i^{*}}\), and pays \(y^{i}-y^{*}\) whenever the output exceeds \(y^{i^{*}}\). That is, the principal is the full residual claimant for small outputs and the agent is the full residual claimant for high outputs. The value of \(y^{*}\) is set to leave no informational rent to the lowest type, meaning that \(y^{*}\) is set so that \(U({\underline{\theta }})=L\), while all other types receive a positive informational rent. The fact that the optimal contract is type-independent does not imply that the expected output is the same for each ability type. The second-best optimal action grows with the agent’s type \(\theta \), and therefore the greater the type, the greater the expected output and the expected compensation.
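The shape of such a contract is easy to see in a small numerical sketch. The output grid, the cutoff index and the strike value below are hypothetical and chosen by hand; in particular, the strike is not solved from the equation in Proposition 4. The sketch only builds the payment schedule and checks that both the agent's payment and the principal's residual are non-decreasing in output.

```python
def call_option_contract(y, i_star, y_star, L=0.0):
    """Pay L for outputs up to index i_star and y^i - y_star above it."""
    return [L if i <= i_star else y[i] - y_star for i in range(len(y))]


def is_non_decreasing(xs):
    """True when the sequence never falls from one output level to the next."""
    return all(b >= a for a, b in zip(xs, xs[1:]))


# Hypothetical outputs, cutoff index and strike (assumptions for illustration only).
y = [0.0, 2.0, 4.0, 6.0, 8.0]
w = call_option_contract(y, i_star=2, y_star=4.0)
principal = [yi - wi for yi, wi in zip(y, w)]

print("agent payments:    ", w)
print("principal residual:", principal)
print(is_non_decreasing(w), is_non_decreasing(principal))
```

Below the cutoff every extra unit of output goes to the principal, and above it every extra unit goes to the agent, which is the residual-claimant pattern described above.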

This result is due to EMLRP, since the largest output provides more information in favor of a higher action. As a result, in the absence of constraint (7), the optimal payment scheme would be to pay a bonus when the highest output is observed and the limited-liability payment otherwise.Footnote 16 While this solution satisfies MONA, it does not satisfy MONP. Hence, MONP makes this solution infeasible and forces the principal to spread payments across outputs so as to satisfy this monotonicity constraint. Because implementation requires offering the same contract to each type, condition SC ensures that the incentives of all types are aligned in the sense that each type’s action increases as any of the prizes (i.e., \(\triangle w^{i}\)) rises and, conditional on any action, an increase in one of the prizes makes the contract more attractive to each ability type. If this is lacking, the principal could attempt sorting by means of providing bigger prizes for intermediate outputs, since these will provide agents with less powerful incentives.

4 Discussion

This section briefly discusses the consequences of considering different assumptions.

First, consider the case of type-dependent outside utility. The most plausible case is the one in which U increases with \(\theta \), since the opportunity cost of working elsewhere should be higher for an agent with a higher type, despite the fact that this is not observable. It can be easily shown that this would not change the result as long as the growth rate of U with respect to \(\theta \) does not exceed the informational rent increase with \(\theta \) (i.e., \({\dot{U}}(\theta )\)). The reason stems from the fact that the optimal contract provides each type with a utility higher than their outside option. Otherwise, the principal may not be able to offer higher types a contract that promises, conditional on a given action, a higher expected compensation so as to satisfy their participation constraint without inducing lower types to claim they are a higher type. Hence, incentive compatibility may require shutting down a subset of types. The subset that will be shut down depends on the details of the model and it is hard to predict what would happen. Complementarity between the action and the type suggests that there is a force towards shutting down low types. However, as shown by Lewis and Sappington (1989), this would also depend on whether the outside option is convex or concave in \(\theta \). In other words, it would also depend on how strong the countervailing forces coming from the outside option are.

Second, consider the case of moving support. We know from the moral hazard literature without limited liability that moving support facilitates implementation. However, under limited liability it is hard to see how this would help, since, as shown by Mirrlees (1975), the moving support assumption together with unbounded utility almost solves the implementation problem. With limited liability, the principal is restricted in her capacity to punish the agent for outputs that cannot come from a combination of a given type with a given action. Hence, despite the fact that there are outputs that are observable only if certain actions are chosen, which facilitates the inference problem, the principal would not be able to punish the agent in such a way that would deter the agent’s misbehavior when outcomes not consistent with the desired actions are observed.

Third, in the absence of monotonicity constraints, as shown in Balmaceda (2011), WSEP is not needed. The reason is that MLRP with respect to both the action and the type, SC, and limited liability imply that it is optimal to pay a bonus only when the highest output is observed. Given that limited liability precludes negative payments, it is optimal to set all other payments equal to the limited-liability level L. Because there is only one optimal instrument that provides incentives (the payment when the highest outcome is observed), all types prefer the contract with the highest payment. This implies that, provided it is optimal to pay a bonus only when the highest outcome is observed, it is not incentive-compatible to offer a menu with more than one contract. In other words, the model behaves as if there were only two outcomes.

Finally, observe that moral hazard is crucial for the one-size-fits-all result. To see this, let us discuss the local implementability conditions when there is adverse selection and no moral hazard; that is, when actions are contractible.

Proposition 5

Let \((w(\theta ),a(\theta )):\Theta \rightarrow \mathcal {C}\times A\) with \(a(\theta )>0\) and \(w(\theta )\) piecewise continuously differentiable for all \(\theta \in \Theta \). Then \((w(\theta ),a(\theta ))\) is incentive compatible only if:

$$\begin{aligned} \sum \nolimits _{i}p^{i}(\theta ,a(\theta ))\frac{d w^{i}(\theta )}{d \theta }+\frac{d a(\theta )}{d \theta }\Big (\sum \nolimits _{i}p^{i}_{a}(\theta ,a(\theta ))w^{i}(\theta )- c_{a}(a(\theta ))\Big )=0\qquad \end{aligned}$$
(12)

and

$$\begin{aligned}&\sum \nolimits _{i}p^{i}_{\theta }(\theta ,a(\theta ))\frac{d w^{i}(\theta )}{d \theta }+\frac{d a(\theta )}{d \theta }\sum \nolimits _{i}p^{i}_{a\theta }(\theta ,a(\theta )) w^{i}(\theta )\ge 0. \end{aligned}$$
(13)

In contrast to the necessary condition in Eq. (4) in Proposition 1, the necessary condition in Eq. (12) does not require that:

$$\begin{aligned} \sum \nolimits _{i}p^{i}(\theta ,a(\theta ))\frac{d w^{i}(\theta )}{d \theta }=0. \end{aligned}$$

This results from the fact that the principal can use the action profile \(a(\theta )\) as another instrument to induce truth-telling at a lower informational rent, and to deter low types from claiming to be high types when the power of incentives is higher for contracts tailored to high-ability types. In fact, given that the cost of the action is independent of the agent’s type, the principal can implement the first-best action by offering a compensation contract that pays the cost of effort in each state. The agent would be indifferent between all actions and would choose the principal’s preferred action profile. If the cost of the action profile were to depend on the agent’s type, this solution would no longer be possible since, if the principal were to compensate each type according to the cost of the action taken, each type would claim to be the one with the highest cost.

5 Conclusions

This paper shows that the optimal menu of contracts in the presence of limited liability, monotonicity constraints, risk neutrality, moral hazard and adverse selection has three properties that are widely observed empirically: (i) when WSEP is satisfied, the menu of contracts exhibits the one-size-fits-all property; that is, contracts are not customized to the agent’s privately known ability; (ii) better agents choose higher actions, are more productive, and have a higher expected compensation (see, for instance, Lazear 2000; Paarsch and Shearer 2000; Seiler 1984 for empirical evidence backing this result); and (iii) when EMLRP is imposed, the unique contract is a call-option contract that makes the agent the residual claimant when output is high, and the principal the residual claimant when output is low.

The fact that the optimal mechanism is such that only one contract is offered has an important practical consequence. Namely, it relieves an econometrician studying the consequences of optimal contracting of the need to control for unobserved heterogeneity. In fact, Chiappori and Salanié (2002) argue that the main concern about testing contract theory is the necessity of adequately controlling for unobserved heterogeneity. In particular, they argue that if this “is not done properly, then the combination of unobserved heterogeneity and of endogenous matching of agents to contracts is bound to create selection biases on the parameters of interest.” While this concern is valid in many settings, especially in those where risk aversion plays a role, the result here shows that this is not necessarily the main concern in a setting with risk neutrality and limited liability, such as financial contracting, procurement, and regulation. Thus, when studying whether observed contracts have the properties predicted by contract theory in a setting likely to satisfy the assumptions above, the need to estimate fixed-effects models is of lesser importance than suggested by Chiappori and Salanié (2002). Furthermore, the one-size-fits-all property provides some theoretical justification for studies concerned with the consequences of optimal contracting on performance that use aggregated data.

Finally, the results presented in this paper provide a rationale for why many financial institutions offer one-size-fits-all credit card interest rates and payday loans, why tax systems do not offer menus of tax schedules where agents are free to choose the schedule that best suits them, and why regulatory agencies do not use menus of contracts that adopt the form predicted by the optimal regulation theory as developed by Laffont and Tirole (1986). In addition, they offer a rationale for taxi-driver contracts when EMLRP is imposed, since taxicab owners are full residual claimants up to a certain amount, which is the same for most taxis, and after that amount is reached, the drivers become full residual claimants.