1 Introduction

Bequests represent both a consequence and a determinant of economic inequality. As such, the taxation of bequests and inheritances is a potentially important component of redistribution policy. Despite the fact that revenues from bequest taxation constitute a small or non-existent proportion of total tax revenues in most countries, optimal bequest taxation is a lively and contentious area of research and policy consideration. For example, the Mirrlees Review (2011), echoing the Meade Report (1978), proposed a cumulative lifetime tax on inheritances as a way to mitigate large differences in economic opportunities at birth. And there has been a resurgence of interest in the optimal taxation of wealth transfers, recently surveyed in Cremer and Pestieau (2011) and Kopczuk (2013).

The optimal tax treatment of bequests remains contentious and depends on the bequest motive, the responsiveness of bequests to taxation and the normative underpinnings for bequest taxation. On the one hand, bequests can be unintentional or accidental, resulting from wealth accumulated for lifetime purposes, and left unspent at death. The purpose can be to self-insure against an uncertain length of life or uncertain health or care expenses, or wealth may be accumulated solely as an end in itself. In either case, an estate consisting of wealth that has simply been left over can be taxed at a very high rate without any disincentive effect. On the other hand, bequests may be intentional. They may reflect transfers purposely given to one’s heirs (or to charity) either for altruistic motives or from some satisfaction from giving, typically called the joy-of-giving but possibly including morally felt obligations. In either case, taxing them presumably has some discouraging effect. Intentional bequests may also take the form of payment to one’s heirs for services, so-called strategic bequests. These are in principle not different from any other market transaction and could be treated as such for taxation purposes.

The normative basis for the taxation of bequests is controversial. From the point of view of recipients, inheritances represent windfall sources of income that not only provide benefits to them, but do so in a very unequal way. Taxation both serves an equalizing objective and contributes to equality of opportunity. How to treat the donors is another matter, at least if bequests are intentional. Those who, like Kaplow (1998, 2001, 2008), adopt a strict welfarist stance argue on revealed preference grounds that donors must have received a benefit from giving the bequest and that benefit should ‘count’ in the social welfare criterion used by the government. Others, like Hammond (1987) and Mirrlees (2007, 2011), argue that this gives rise to double counting. Since the benefit to recipients from bequests is already counted in social welfare, it is unnecessary to include it again in the guise of benefits to donors.

The recent literature on the optimal taxation of intentional bequests has tended to take the welfarist position and double-count the benefits of bequests (Farhi and Werning 2010, 2013; Brunner and Pech 2012a; Kopczuk 2012, 2013; Piketty and Saez 2013). The typical findings are fairly intuitive. Suppose individuals differ in wage rates and the government redistributes using an optimal income tax. The question is what tax treatment of bequests and inheritances should accompany the income tax. From the point of view of the donor, a bequest is comparable to the purchase of a good. If preferences are weakly separable and if the government implements an optimal nonlinear tax, the Atkinson and Stiglitz (1976) theorem applies. Differential taxation of bequests should not be applied as a device for improving the redistributive capacity of the income tax. (Of course, if the tax mix consists of a uniform commodity tax plus a nonlinear income tax, the commodity tax should apply to bequests given).

This outcome is complicated when the benefit of inheritances to recipients is taken into consideration. For one thing, the value of inheritances to the recipients is a social benefit that is not taken into account by the donors, who attach weight only to the benefits the donations give to themselves. There is thus an externality that calls for a subsidy on bequests. Second, the receipt of inheritances of different sizes yields unequal benefits to the recipients, which can be addressed by making the tax on inheritances progressive. Finally, to the extent that inheritances are correlated with the wage rate of recipients, that constitutes a further argument for taxing them as an indirect way of taxing high-wage persons. The first two influences lead Farhi and Werning (2010) to the finding that bequest taxes should be negative, but progressive. The third increases the optimal tax on bequests, possibly enough to make it positive (Brunner and Pech 2012a). Kopczuk (2013) further argues that if inheritances reduce the labour supply of recipients, this reduces social welfare as long as labour income taxes are positive. This leads to another argument for taxing bequests, making it ambiguous whether bequests should be taxed or subsidized.

These results rely on the double counting of the benefits of bequests. Although the welfarist arguments leading to double counting are on the surface persuasive, especially the revealed preference argument, there are compelling arguments against double counting. The benefit one obtains from the utility of one’s heirs should apply in principle to any form of interdependent utility whether revealed through transfers or not. In a family, presumably each member values the well-being of other members, but there is no suggestion that this multiplicity of utilities be counted in social welfare. The same applies more generally to feelings of altruism or even avarice towards fellow citizens.Footnote 1 Some have noted the analogy between saving for one’s own future and saving for heirs. As Mirrlees (2007) argues, we would not consider counting the utility we obtain now when saving for our future self’s consumption. Some might regard the role of government redistribution as reflecting the altruism of the rich for the poor and internalizing the free riding from private donations.Footnote 2 There is no suggestion that in this case the rich taxpayers’ altruism should be counted as social welfare. Finally, intentional bequests may not give utility to donors at all. They may represent voluntary transfers done out of a sense of obligation, making the donors worse off.

The purpose of this paper is to explore the consequences for optimal bequest taxation of discounting the benefits of bequests to the donors. In the literature, this has been done by analysing the tax treatment of bequests when the benefits of donors are excluded from social welfare, or ‘laundered out’ to use the terminology of Cremer and Pestieau (2006, 2011). The literature has focused on the implications for bequest taxation and has not studied how the tax treatment of donors is affected. This omission matters because, as Kopczuk (2013) emphasizes, bequest behaviour is very heterogeneous. Failure to take account of that heterogeneity entails discriminating in favour of non-donors, who are not forgoing any consumption by leaving bequests, relative to donors (McCaffery 1994; Mankiw 2006). Models used in the existing literature to study the effect of laundering out on bequest taxation typically assume that all households have the same preferences for bequests (Cremer and Pestieau 2006, 2011; Mirrlees 2007; Brunner and Pech 2012b). The question of how to treat donors relative to non-donors then does not arise.

In this paper, we study the implications of not fully counting the benefits of bequests to donors in a simple model with three sources of heterogeneity. Individuals differ in their wage rates, in their preferences for bequests and in the inheritances they receive. We study optimal nonlinear income taxation, the income tax treatment of bequests and the taxation of inheritances. Our model closely follows the recent literature by adopting some important simplifications to facilitate the analysis and the interpretation of results. As in Farhi and Werning (2010), Brunner and Pech (2012a) and Kopczuk (2013), we focus on two generations, parents and children. Each generation has an exogenous distribution of wage rates, and there may be some correlation between parental and child wage rates. Social welfare includes the sum of social utilities of parents and children, where the social utility of parents discounts the utility parents obtain from bequests, thereby capturing both the extreme cases of ‘double counting’ and of zero weight on the joy-of-giving term, as well as the various intermediate cases. To be able to separate utility of bequests from utility of own consumption, we assume an additive form of utility function. Heterogeneity of bequest behaviour is captured by assuming that some parents give bequests, while the rest do not.Footnote 3

The government has three policy instruments. It imposes nonlinear income taxes on both parents and children; it chooses a linear tax on inheritances; and it allows a partial income tax credit on bequests. The latter reflects the fact that in the case when the government does not count the utility of bequests to donors, bequests simply reduce donors’ consumption. An income tax credit on bequests allows non-donors and donors to be treated relatively comparably by the income tax system. We assume that apart from the bequest tax credit, the income tax is not conditioned on bequests. Also, the inheritance tax is not related to either parental or child incomes. These are obviously strong assumptions, albeit similar to those used in Farhi and Werning (2010), Brunner and Pech (2012a) and Piketty and Saez (2013), but they simplify the analysis considerably.

We begin in Sect. 2 by outlining the decision problems of the parents and children and then characterize the government’s problem in Sect. 3. In Sect. 4, we consider the social optimum when the government can observe wage-types of both parents and children, and highlight the externality-correcting and redistributive roles of bequest taxation by restricting the government to use the same policy instruments as in the imperfect-information case. In Sect. 5, we consider the imperfect-information environment. Section 6 discusses some extensions, and Sect. 7 concludes.

2 Household problem

The simplest model is chosen to illustrate the effects. There are two wage-types of parents, \(n_1\) persons with wage \(w_1\) and \(n_2\) with wage \(w_2\), where \(w_2>w_1\) and \(n_1+n_2=1\). A proportion \(d_i\) of type\(-i\)’s are donors, so \(1-d_i\) are non-donors, where we suppose \(1>d_2> d_1>0\) for concreteness. Donors and non-donors differ in their preferences for bequests, for example, because of their views of the role of government in making transfers.

Following Farhi and Werning (2010), we assume that each parent has one child. Each child is endowed with either high or low skills, but not necessarily the same as his parent. In particular, the probability that a child is the same skill type as his parent is given by \(\pi \in (1/2,1]\), where we assume that the same probability \(\pi \) applies to both high- and low-skilled parents. Thus, the skill type of a child is positively correlated with the parent’s type. All children with parents of a given skill type are equally likely to have a donor parent. Therefore, the probability of having a donor parent is simply given by the proportion of donor parents of a given skill type, \(d_i\). For children of non-donor parents, the skill type of their parents is irrelevant. Consequently, there will be six distinct child types: each of the two skill types of children can receive bequests of \(b^1\), \(b^2\) or zero. We can denote a bequest-receiving child’s type by the characteristics \((w_{k},b^i)\) for \(k,i=1,2\), where k is the child’s skill type and i is a donor parent’s skill type. To simplify matters, we focus on only two generations and assume that the children do not make bequests.Footnote 4 We begin by characterizing the donor and non-donor parents’ problems and then turn to the children’s problem.
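
To make the type space concrete, the following Python sketch enumerates the six child types and the population share of each, using only the assumptions stated above (parent shares \(n_i\), donor proportions \(d_i\) and the skill-persistence probability \(\pi \)). The function name and the numerical values are illustrative assumptions and are not taken from the analysis.

```python
from itertools import product


def child_type_masses(n1, d1, d2, pi):
    """Population share of each child type.

    Keys are (child_skill, bequest_source): bequest_source is 1 or 2 for a
    donor parent of that skill type and 0 for a non-donor parent.
    """
    n = {1: n1, 2: 1.0 - n1}          # parent shares, n_1 + n_2 = 1
    d = {1: d1, 2: d2}                # donor proportions by parent type
    masses = {(k, s): 0.0 for k, s in product((1, 2), (0, 1, 2))}
    for i, k in product((1, 2), (1, 2)):       # i: parent skill, k: child skill
        prob_k = pi if k == i else 1.0 - pi    # child matches parent w.p. pi
        masses[(k, i)] += n[i] * prob_k * d[i]           # inherits b(p_i)
        masses[(k, 0)] += n[i] * prob_k * (1.0 - d[i])   # inherits nothing
    return masses


# Hypothetical parameter values satisfying 1 > d_2 > d_1 > 0 and pi in (1/2, 1].
masses = child_type_masses(n1=0.6, d1=0.2, d2=0.5, pi=0.7)
```

The shares sum to one, and children of non-donor parents are pooled regardless of the parent's skill type, which is why there are six rather than eight types.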

2.1 Parents’ behaviour

Household utility depends on a private consumption good x, on labour supply \(\ell \), and in the case of donors, on net-of-tax bequests b, where we drop subscripts for the time being and whenever it causes no confusion. Suppose the inheritance tax is proportional at the rate t. Then, net bequests can be written \(b=(1-t)B\), where B is the actual (gross) bequest. Preferences of donors and non-donors take the following quasilinear forms, respectively:

$$\begin{aligned} U(x,b,\ell )=x+f(b)-h(\ell ), \quad u(x,\ell )=x-h(\ell ) \end{aligned}$$
(1)

where f(b) is the utility of bequests (joy-of-giving) applying to donors and \(h(\ell )\) is the disutility of labour supply. The latter satisfies \(h'(\ell )>0\) and \(h''(\ell )>0\). We assume that the utility-of-bequest function satisfies \(f''(b)<0\), with \(f'''(b)\leqslant 0\) and \(f'(\overline{b})=0\) for some \(\overline{b}>0\). That is, there is some maximal value of net bequests b that contributes to the donor’s utility: the f(b) curve is an inverted U-shape with a peak at \(b=\overline{b}\). We also assume that \(f'(0)>1\). This ensures that in the laissez-faire, where the price of bequests is unity, the donor parents will make positive bequests. An example of a functional form satisfying these properties is the quadratic form \(f(b)=2\overline{b}b-b^2\) with \(\overline{b}>1/2\), where \(f'(b)=2(\overline{b}-b)\) so \(f'(\overline{b})=0\), \(f'(0)>1\), and \(f''(b)<0\).

Four points should be noted about the donor’s utility function in (1). First, the additively separable form of \(U(x,b,\ell )\) allows the government to weight the utility of bequests to donors by less than unity and in the extreme to not count f(b) in the social welfare function. If the utility function took a general non-separable form, it would not be obvious how to specify the net-of-bequests utility of the donors. Second, we assume that donors get utility from the after-tax bequest b. That is, they care about what ends up in the hands of their heirs, rather than the amount they leave before-tax, which is B. This implies that a tax levied on inheritances will affect their behaviour, which would not be the case if B were the argument of \(f(\cdot )\). Third, the quasilinear form of utility implies that there will be no income effect on bequests. While this simplifies things greatly, it is restrictive. Our assumption that \(d_2>d_1\) partly mitigates the absence of income effects by ensuring that the average level of bequests is higher for type-2’s than type-1’s. A main advantage of ignoring income effects is that we can clearly highlight the different roles that bequest taxation plays: correcting the social externality that results from the government discounting the utility of bequests, and redistributing between donor and non-donor parents and between inheriting and non-inheriting children. In Sect. 6, we discuss how allowing for income effects does not change these potential roles for bequest taxation, but the problem becomes considerably more complicated. Finally, since x absorbs all income effects, we should include a non-negativity constraint on x. In what follows, we assume that it is never binding, so that choices of b and \(\ell \) are always in the interior.

A household whose wage rate is w earns \(y=w\ell \) and pays income tax T(y). In addition, donors obtain a tax credit at the rate \(\tau \) on their gross bequests. The government can observe a household’s income y and its bequests B, in which case it can condition the tax credit \(\tau \) on the level of bequests as well as on income. For simplicity, we assume that the tax credit rate is proportional, but may differ by income. That is, \(\tau \) can take values \(\tau _1\) and \(\tau _2\) for the two wage-types. The budget constraint for a donor is \(x+B=y-T(y)+\tau B\equiv c+\tau B\) or, using \(b=(1-t)B\),

$$\begin{aligned} x+{b\over 1-t}= c+\tau {b\over 1-t} \end{aligned}$$
(2)

where c is after-tax income, or disposable income, before the tax credit. Note that the income tax function T(y) does not depend on bequests, although the bequest tax credit does. This simplifies the analysis, although it may be restrictive. Note also that the budget constraint (2) assumes that the inheritance tax is paid by the recipient.

It is useful to disaggregate the donor’s problem into two stages. In the first stage, labour is chosen, which determines income \(y=w\ell \) and disposable income \(c=y-T(y)\). In the second stage, the donor chooses how to divide disposable income between x and B, or \(b/(1-t)\). Begin with the second stage.

Stage 2: Donor’s allocation of disposable income

Given y and c and using the utility function (1) and the budget constraint (2), the problem of a donor household with wage rate w can be written as follows, where we adopt b as the donor’s choice variable:

$$\begin{aligned} \max _{\{b\}}\ c-{1-\tau \over 1-t} b +f(b)-h\Big ({y\over w}\Big ). \end{aligned}$$

The necessary condition for this problem is

$$\begin{aligned} f'(b)={1-\tau \over 1-t} \end{aligned}$$
(3)

whose solution yields the demand function for net bequests

$$\begin{aligned} b\Big ({1-\tau \over 1-t}\Big ),\quad \mathrm{with}\quad b'(\cdot )={1\over f''(b)}<0. \end{aligned}$$
(4)

Comparative statics of (3) yields:

$$\begin{aligned} b_{\tau }={-1\over 1-t}b'(\cdot )>0,\quad b_t={1-\tau \over (1-t)^2}b'(\cdot )<0. \end{aligned}$$
(5)

where subscripts refer to partial derivatives.
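
As an illustrative check of (3)–(5), the sketch below uses the quadratic example \(f(b)=2\overline{b}b-b^2\) introduced earlier, for which (3) gives the closed form \(b(p)=\overline{b}-p/2\) and \(b'(p)=1/f''(b)=-1/2\). It compares the comparative statics in (5) with central finite differences. The function names and parameter values are assumptions made for illustration only.

```python
def net_bequest(tau, t, b_bar=1.0):
    """Net bequest solving f'(b) = (1 - tau)/(1 - t) for f(b) = 2*b_bar*b - b**2."""
    p = (1.0 - tau) / (1.0 - t)
    return b_bar - p / 2.0


def central_difference(fn, x, step=1e-6):
    """Two-sided numerical derivative of fn at x."""
    return (fn(x + step) - fn(x - step)) / (2.0 * step)


# Hypothetical policy values; they imply p = 1.125 and an interior bequest.
tau, t, b_bar = 0.1, 0.2, 1.0
b_prime = -0.5  # b'(p) = 1 / f''(b), with f''(b) = -2 for the quadratic form

# Numerical derivatives of the demand for net bequests.
b_tau_numeric = central_difference(lambda s: net_bequest(s, t, b_bar), tau)
b_t_numeric = central_difference(lambda s: net_bequest(tau, s, b_bar), t)

# Closed-form expressions from (5).
b_tau_formula = -b_prime / (1.0 - t)
b_t_formula = (1.0 - tau) / (1.0 - t) ** 2 * b_prime
```

Under these assumed values the credit raises net bequests and the inheritance tax lowers them, in line with the signs stated in (5).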

It will be useful in what follows to define the argument of (4) for a donor as follows:

$$\begin{aligned} p\equiv {1-\tau \over 1-t}. \end{aligned}$$
(6)

This can be interpreted as the effective price of net bequests to the donor. We can define the price elasticity of net bequests as

$$\begin{aligned} \epsilon _p\equiv - {p b'(p)\over b(p)}>0 \end{aligned}$$
(7)

where the inequality follows from (4). If \(\epsilon _p<1\), the demand for net bequests is relatively inelastic and vice versa. Differentiating \(\epsilon _p\) with respect to p yields:

$$\begin{aligned} \frac{\mathrm{d} \epsilon _p}{\mathrm{d}p}= -\frac{b'(p)}{b(p)} -\frac{p b''(p)}{b(p)} +\frac{p (b'(p))^2}{(b(p))^2}>0. \end{aligned}$$

Note that if \(\tau =1\) so \(p=0\), we have \(b(0)=\overline{b}\). Also, if \(t=1\), then \(b=(1-t)B=0\) for any B, so the utility of bequests is \(f(0)=0\). The donor faces the cost \((1-\tau )B\) of leaving a bequest, so none will be left, and we would be in a no-bequest world.
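
For the quadratic example, the elasticity in (7) reduces to \(\epsilon _p=p/(2\overline{b}-p)\), which makes the properties above easy to inspect numerically. The sketch below is only an illustration under that assumed functional form; the function name and the price grid are hypothetical.

```python
def price_elasticity(p, b_bar=1.0):
    """Elasticity -p*b'(p)/b(p) when f(b) = 2*b_bar*b - b**2."""
    b = b_bar - p / 2.0     # demand for net bequests, b(p)
    b_prime = -0.5          # slope of demand, 1 / f''(b)
    return -p * b_prime / b


# Hypothetical effective prices, all below 2*b_bar so that b(p) > 0.
prices = [0.25, 0.5, 1.0, 1.5]
elasticities = [price_elasticity(p) for p in prices]

# Net bequest at a zero effective price (tau = 1): equals b_bar.
bequest_at_zero_price = 1.0 - 0.0 / 2.0
```

In this example demand is inelastic for \(p<\overline{b}\) and elastic for \(p>\overline{b}\), and the elasticity rises monotonically with the price.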

Stage 1: Choice of labour supply

When households choose labour supply, they anticipate the choice of net bequests b in the following stage. Given a tax function T(y), the type\(-w\) households solve the following problem:

$$\begin{aligned} \max _{\{y\}}\ c-pb(p) +f(b(p))-h\Big ({y\over w}\Big )\quad \mathrm{s.t.}\quad c=y-T(y). \end{aligned}$$

The solution to this problem gives

$$\begin{aligned} h'\Big ({y\over w}\Big )=\big (1-T'(y)\big )w \end{aligned}$$
(8)

which determines y. Thus, income or labour supply is not affected by the inheritance tax and bequest tax credit. The same result (8) on income applies to both donor and non-donor parents.
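
The independence of labour supply from the price of bequests can be illustrated with a small grid search. The sketch assumes, purely for illustration, a quadratic disutility \(h(\ell )=\ell ^2/2\), a linear income tax \(T(y)=my\) and the quadratic joy-of-giving function; under these assumptions (8) gives \(y=(1-m)w^2\). None of these functional forms or names is imposed in the analysis.

```python
def donor_income_choice(w, p, m, b_bar=1.0, step=0.001, y_max=4.0):
    """Income that maximizes the donor's stage-1 objective on a grid of y."""
    b = b_bar - p / 2.0                                   # stage-2 choice b(p)
    bequest_surplus = 2.0 * b_bar * b - b ** 2 - p * b    # f(b(p)) - p*b(p)
    best_y, best_u = 0.0, float("-inf")
    for k in range(int(round(y_max / step)) + 1):
        y = k * step
        # c - p*b(p) + f(b(p)) - h(y/w) with c = (1 - m)*y and h(l) = l**2/2
        u = (1.0 - m) * y + bequest_surplus - 0.5 * (y / w) ** 2
        if u > best_u:
            best_y, best_u = y, u
    return best_y


# Same wage and marginal tax rate, two different effective prices of bequests.
y_low_price = donor_income_choice(w=1.5, p=0.5, m=0.2)
y_high_price = donor_income_choice(w=1.5, p=1.5, m=0.2)
```

Because the bequest terms enter the objective as a constant in y, both calls select the same income, which coincides with the non-donor's choice.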

In what follows, we assume that the bequest tax credit rate varies with the skill level of the household, while the inheritance tax is constant. Therefore, we can write the price of bequests as \(p_i=(1-\tau _i)/(1-t)\) for \(i=1,2\). Later we consider the consequences of the government differentiating the inheritance tax according to the wage-type of the recipient child.

2.2 Children’s behaviour

Children’s preferences are the same as non-donor parents’ as given in (1). Like the parents, children face a nonlinear income tax schedule. We assume that the government applies separate income tax schedules to the parents and the children and denote the nonlinear income tax schedule facing the children by \(T_c(y)\). The budget constraint for a child with a donor parent of wage-type i is \(x=y-T_c(y)+b(p_i)\) and with a non-donor parent is \(x=y-T_c(y).\)

Given a tax function \(T_c(y)\), a type\(-k\) child with a type\(-i\) donor parent solves the following problem:

$$\begin{aligned} \max _{\{y\}}\ y-T_c(y)+b(p_i) -h\Big ({y\over w_k}\Big ). \end{aligned}$$

The first-order condition for this problem gives

$$\begin{aligned} h'\Big ({y\over w_k}\Big )=(1-T_c'(y))w_k \end{aligned}$$

which determines income \(y_k\). As there are no income effects on labour supply, whether a child has a donor parent or not does not affect their labour supply decision. We consider the implications of income effects for bequest taxation in Sect. 6.

3 The government problem

The government chooses separate nonlinear income tax functions for both the parents and the children, bequest tax credit rates \(\tau _1\) and \(\tau _2\) for the two parent wage-types and an inheritance tax rate t. As usual, we solve this problem using the direct approach, letting the government choose consumption and income for the two parent wage-types, denoted by \(c_i\) and \(y_i\), and the two children wage-types, denoted by \(\overline{c}_k\) and \(\overline{y}_k\), as well as the inheritance tax t and bequest tax credits \(\tau _i\), for \(i,k=1,2\). In fact, since household behaviour and all budget constraints depend only on the net price of bequests \(p_i=(1-\tau _i)/(1-t)\), the inheritance tax t is redundant and could be set at any rate including zero.Footnote 5 We therefore treat \(p_i,\ i=1,2\), as control variables for the government.

The government weights the utility of bequests to donors, f(b), by some parameter \(\delta \in [0,1]\). From the point of view of the government, the utility of a donor household is therefore given by \(x+\delta f(b)-h(\ell )\). Consider a type\(-i\) donor. Using the budget constraint (2) and assuming the donor chooses \(b_i\) optimally, the government’s value of utility of a type\(-i\) donor, referred to hereafter as the social utility of a type\(-i\) donor, can be defined using (6) as follows:

$$\begin{aligned} V^i(c,y,p_i;\delta )\equiv c-p_ib(p_i)+\delta f(b(p_i))-h\Big ({y\over w_i}\Big ). \end{aligned}$$
(9)

Differentiating this, we obtain

$$\begin{aligned} V^i_c=1,\ \ V^i_y=\!-{h'(\ell _i)\over w_i},\ \ V^i_p=-b(p_i)-p_ib'(p_i)\,{+}\,\delta f'(b(p_i))b'(p_i), \ \ V^i_{\delta }=f(b(p_i)). \end{aligned}$$
(10)

Using the expression determining the donor’s optimal choice of bequest, (3), \(V^i_p\) can be rewritten as

$$\begin{aligned} V^i_p=-b(p_i)-(1-\delta )p_ib'(p_i)=-b(p_i)\left( 1-(1-\delta )\epsilon ^i_p \right) \end{aligned}$$
(11)

which is negative provided \(\epsilon ^i_p < 1/(1-\delta )\). Given \(p_i\), the slope of an indifference curve in cy-space is

$$\begin{aligned} {\mathrm{d}c\over \mathrm{d}y}\bigg \vert _{V^i,p_i}=-{V^i_y\over V^i_c}={h'(\ell _i)\over w_i}. \end{aligned}$$
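
As a numerical check on (11), the following sketch compares the closed-form expression for \(V^i_p\) with a finite-difference derivative of the price-dependent part of (9), again under the assumed quadratic joy-of-giving function. It also illustrates the sign condition \(\epsilon ^i_p<1/(1-\delta )\). Function names and parameter values are hypothetical.

```python
def price_dependent_social_utility(p, delta, b_bar=1.0):
    """The terms of (9) that depend on p: -p*b(p) + delta*f(b(p))."""
    b = b_bar - p / 2.0
    return -p * b + delta * (2.0 * b_bar * b - b ** 2)


def v_p_formula(p, delta, b_bar=1.0):
    """Right-hand side of (11): -b(p) * (1 - (1 - delta) * eps_p)."""
    b = b_bar - p / 2.0
    eps = p / (2.0 * b_bar - p)
    return -b * (1.0 - (1.0 - delta) * eps)


def v_p_numeric(p, delta, b_bar=1.0, step=1e-6):
    """Central finite-difference derivative of the p-dependent terms."""
    up = price_dependent_social_utility(p + step, delta, b_bar)
    down = price_dependent_social_utility(p - step, delta, b_bar)
    return (up - down) / (2.0 * step)


# delta = 0.5, p = 1: elasticity is 1 < 1/(1 - delta) = 2, so V_p < 0.
v_p_partial_weight = v_p_formula(1.0, 0.5)
# delta = 0, p = 1.5: elasticity is 3 > 1, so V_p > 0.
v_p_zero_weight = v_p_formula(1.5, 0.0)
```

With full weight on the joy of giving (\(\delta =1\)), the expression collapses to \(-b(p_i)\), the usual envelope result for private utility.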

For the type\(-i\) non-donor parent, utility from the government’s point of view is given simply by \(v^i(c,y)\equiv c-h(y/w_i)\), where \(v^i_c=1\) and \(v^i_y= -h'(\ell _i)/ w_i\). Indifference curves have the slope

$$\begin{aligned} {\mathrm{d}c\over \mathrm{d}y}\bigg \vert _{v^i}=-{v^i_y\over v^i_c}={h'(\ell _i)\over w_i}. \end{aligned}$$

Thus, the donor and non-donor parents have the same indifference curves in cy-space, so they cannot be separated by the income tax. This is a consequence of utility being quasilinear and simplifies matters considerably, although we discuss the implications of relaxing this assumption in Sect. 6. (Recall that we assume that the income tax is not contingent on bequests.) Since the slope \(\mathrm{d}c/\mathrm{d}y\vert _{V^i,p_i}=h'(y/w_i)/w_i\) is decreasing in \(w_i\) at any given y, the Spence–Mirrlees conditions are satisfied, so although donor parents cannot be separated from non-donor parents by a nonlinear income tax, high- and low-wage-types can be.

In the case of heirs, social utility and private utility are identical. The value of utility of a type\(-k\) child with a type\(-i\) donor parent is defined as follows:

$$\begin{aligned} R^{ki}(\overline{c},\overline{y},p_i)\equiv \overline{c}+b(p_i) -h\Big ({\overline{y}\over w_k}\Big ) \end{aligned}$$
(12)

where

$$\begin{aligned} R^{ki}_{\overline{c}}=1, \quad R^{ki}_{\overline{y}}=-h'(\overline{\ell }_k)/w_k,\quad R^{ki}_p=b'(p_i)<0. \end{aligned}$$

For a type\(-k\) child with non-donor parents, utility is

$$\begin{aligned} r^k(\overline{c},\overline{y})=v^k(\overline{c},\overline{y})=\overline{c}-h\Big ({\overline{y}\over w_k}\Big ) \end{aligned}$$
(13)

where

$$\begin{aligned} r^k_{\overline{c}}=1,\quad r^k_{\overline{y}}=-h'(\overline{\ell }_k)/w_k<0. \end{aligned}$$

As there are no income effects on labour supply, the income tax schedule cannot separate the children with inheritances from those without, but can separate high-wage children from low-wage children.

The government maximizes an additive and strictly concave social welfare function in social utilities of parents and in utilities of children, subject to a budget constraint and incentive constraints on non-donor parents, donor parents and children. Below we also consider the special case in which the children’s utility carries no social weight. To allow for this, we assume the government weights the children’s utility by \(\alpha \in \{0,1\}\), where \(\alpha =0\) corresponds to no weight and \(\alpha =1\) to full weight. Specifically, social welfare is:

$$\begin{aligned} {\mathcal {W}}&=\sum _{i=1,2} n_i\Big (d_iW\left( V^i(c_i,y_i,p_i;\delta )\right) +(1-d_i)W\left( v^i(c_i,y_i)\right) \Big )\nonumber \\&\quad + \ n_1\pi d_1 \alpha W\left( R^{11}(\overline{c}_1,\overline{y}_1,p_1)\right) + \Big (n_1\pi (1-d_1)+n_2(1-\pi )(1-d_2)\Big )\nonumber \\&\quad \times \alpha W\left( r^1(\overline{c}_1,\overline{y}_1)\right) +\ n_2\pi d_2\alpha W\left( R^{22}(\overline{c}_2,\overline{y}_2,p_2)\right) \nonumber \\&\quad + \Big (n_2\pi (1-d_2)+n_1(1-\pi )(1-d_1)\Big )\alpha W\left( r^2(\overline{c}_2,\overline{y}_2)\right) \nonumber \\&\quad + n_2(1-\pi )d_2 \alpha W\left( R^{12}(\overline{c}_1,\overline{y}_1,p_2)\right) + n_1(1-\pi )d_1 \alpha W\left( R^{21}(\overline{c}_2,\overline{y}_2,p_1)\right) \end{aligned}$$
(14)

where \(W'(\cdot )>0>W''(\cdot )\).

The government faces an intertemporal budget constraint covering both the parents’ and the children’s generations. This implies that the government can make intergenerational transfers implicitly through the income tax. Therefore, the inheritance tax t, or more accurately the price of bequests \(p_i\), is not needed as an instrument for making transfers between parents and children, allowing us to focus on other roles for the inheritance tax. Assuming for simplicity that the interest rate is zero, the government’s budget constraint is

$$\begin{aligned} \sum _{i=1,2}\left( n_i(y_i-c_i+td_iB_i-\tau _id_iB_i) + \Big (n_i\pi +n_{-i}(1-\pi )\Big )(\overline{y}_i-\overline{c}_i)\right) =G, \end{aligned}$$

where G is the given level of government expenditures. Using \(B_i=b_i/(1-t)\) and \(p_i=(1-\tau _i)/(1-t)\), this can be written as

$$\begin{aligned}&n_1(y_1-c_1)+\Big (n_1\pi +n_2(1-\pi )\Big )(\overline{y}_1-\overline{c}_1)+n_1d_1(p_1-1)b(p_1)\nonumber \\&\quad +\,n_2(y_2-c_2)+\Big (n_2\pi +n_1(1-\pi )\Big )(\overline{y}_2-\overline{c}_2)+n_2d_2(p_2-1)b(p_2)=G. \end{aligned}$$
(15)
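
The step from the budget constraint in terms of \(t\) and \(\tau _i\) to (15) rests on the identity \((t-\tau _i)B_i=(p_i-1)b_i\). The sketch below verifies it numerically and shows that two different \((t,\tau )\) pairs with the same effective price raise the same net revenue per donor. It assumes the quadratic joy-of-giving function for the bequest response; the function names and policy values are hypothetical.

```python
def effective_price(t, tau):
    """p = (1 - tau) / (1 - t), the effective price of net bequests."""
    return (1.0 - tau) / (1.0 - t)


def revenue_from_rates(t, tau, b_net):
    """Inheritance tax less bequest credit, t*B - tau*B, with B = b/(1 - t)."""
    gross = b_net / (1.0 - t)
    return t * gross - tau * gross


def revenue_from_price(p, b_net):
    """The same net revenue written as (p - 1)*b, as in (15)."""
    return (p - 1.0) * b_net


b_bar = 1.0
policy_a = (0.2, 0.1)       # (t, tau): inheritance tax with a partial credit
policy_b = (0.0, -0.125)    # no inheritance tax; negative credit, i.e. a tax on the donor

p_a = effective_price(*policy_a)
p_b = effective_price(*policy_b)
b_a = b_bar - p_a / 2.0     # b(p) under the assumed quadratic f
b_b = b_bar - p_b / 2.0

revenue_a = revenue_from_rates(*policy_a, b_a)
revenue_b = revenue_from_rates(*policy_b, b_b)
```

Both policies imply \(p=1.125\), the same net bequest and the same revenue, which is the sense in which t is redundant once \(p_i\) is chosen.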

The incentive constraints faced by the government hinge on what the government, or its income tax authority, observes. In the case of non-donor parents, the incentive constraint is the standard one:

$$\begin{aligned} c_2- h\Big ({y_2\over w_2}\Big )\geqslant c_1- h\Big ({y_1\over w_2}\Big ). \end{aligned}$$
(16)

The donors’ incentive constraint is more subtle. Donors of a given wage-type will choose the same consumption-income bundle as their non-donor counterparts, given their quasilinear preferences, but their bequests depend on the bequest tax credit they receive. Although we are assuming for simplicity that the income tax structure is not conditioned on bequests, nonetheless the bequest tax credit depends on one’s income. This implies that if a type-2 donor mimics a type-1 donor, the bequest tax credit is \(\tau _1\) rather than \(\tau _2\). Given the separability assumption on the joy-of-giving function, the mimicker would choose to leave \(b(p_1)\). Therefore, in order for a type-2 donor to mimic a type-1, not only would \(c_1\) and \(y_1\) be chosen, but so would the bequest of a type-1 person, \(b(p_1)\). The incentive constraint for donors can then be written as:

$$\begin{aligned} c_2-p_2b(p_2)+f(b(p_2))- h\Big ({y_2\over w_2}\Big )\ \geqslant \ c_1-p_1b(p_1)+f(b(p_1))- h\Big ({y_1\over w_2}\Big ). \end{aligned}$$
(17)

Only one of the two incentive constraints for parents, (16) and (17), will generally be binding in a social optimum. If \(\tau _2>\tau _1\), so \(p_1>p_2\), (17) will be slack when (16) is binding since \(f(b(p_i))-p_ib(p_i)\) is decreasing in \(p_i\).Footnote 6 Alternatively, if \(\tau _1>\tau _2\), so \(p_1<p_2\), (16) will be slack when (17) is binding for the same reason. Only in the unlikely event that \(p_1=p_2\) in the optimum would both constraints on the parents be binding at the same time. In the problems we consider, at least one of the constraints will be binding given the government’s redistributive motive.
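
The ranking of the two parental incentive constraints can be illustrated numerically. The sketch assumes \(h(\ell )=\ell ^2/2\) and the quadratic joy-of-giving function, for which \(f(b(p))-pb(p)=(\overline{b}-p/2)^2\) is decreasing in p over the relevant range. The allocation is chosen so that (16) holds with equality; all names and numbers are hypothetical.

```python
def labour_disutility(l):
    """Assumed disutility of labour, h(l) = l**2 / 2."""
    return 0.5 * l ** 2


def bequest_surplus(p, b_bar=1.0):
    """f(b(p)) - p*b(p) for the quadratic f(b) = 2*b_bar*b - b**2."""
    b = b_bar - p / 2.0
    return 2.0 * b_bar * b - b ** 2 - p * b


def constraint_slacks(c1, y1, c2, y2, w2, p1, p2):
    """Slack (left side minus right side) of constraints (16) and (17)."""
    non_donor = (c2 - labour_disutility(y2 / w2)) - (c1 - labour_disutility(y1 / w2))
    donor = non_donor + bequest_surplus(p2) - bequest_surplus(p1)
    return non_donor, donor


# Allocation at which the non-donor constraint (16) binds exactly.
allocation = dict(c1=1.0, y1=1.0, c2=1.375, y2=2.0, w2=2.0)

# Case tau_2 > tau_1, so p_1 > p_2: (17) is slack when (16) binds.
slack16, slack17 = constraint_slacks(p1=1.2, p2=0.8, **allocation)

# Case tau_1 > tau_2, so p_1 < p_2: (17) would be violated at this allocation.
_, slack17_reversed = constraint_slacks(p1=0.8, p2=1.2, **allocation)
```

In the reversed case the donor constraint fails when the non-donor constraint holds with equality, so at an optimum it is (17) that binds and (16) that is slack.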

The incentive constraint for the children takes the standard form

$$\begin{aligned} \overline{c}_2- h\Big ({\overline{y}_2\over w_2}\Big )\geqslant \overline{c}_1- h\Big ({\overline{y}_1\over w_2}\Big ) \end{aligned}$$
(18)

since the bequest a child receives depends only on the parent’s wage-type and not on any characteristic of the child. This incentive constraint will always be binding.

The government maximizes social welfare in (14) subject to the budget and incentive constraints, (15)–(18). The first-order conditions are listed in the “Appendix”. We first consider the social optimum when the government can observe wage-types of both parents and children. This will provide insight into the optimal effective prices of net bequests and therefore bequest tax credit rates for the two wage-types. We then turn to the imperfect-information case when the government cannot observe wage-types.

4 Full-information benchmark

Suppose the government knows the wage-types of all parents and children and whether a parent is a donor. For comparison purposes, we assume that the government is restricted to using the same policy instruments as in the imperfect-information case, that is, nonlinear income tax systems on parents and children that are not conditioned on donor or donee status, as well as an inheritance tax and bequest tax credits. Given the above discussion, the optimal choices of \(\tau _i\) and t can be subsumed in the optimal choice of the price of net bequests \(p_i\). The outcome will not be first-best since the income tax is not contingent on donor or donee status, but it will serve to clarify the efficiency and equity effects of proportional bequest taxation in a setting with redistributive income taxes.

With observable wage-types, no incentive constraints will be binding. The government freely chooses \(c_i\), \(y_i\), \(\overline{c}_i\), \(\overline{y}_i\) and \(p_i\) for \(i=1,2\) to maximize (14) subject to the budget constraint (15). The first-order conditions in the “Appendix” apply with \(\gamma =\gamma ^d=\phi =0\). The conditions on \(c_i\), \(y_i\), \(\overline{c}_k\) and \(\overline{y}_k\) yield

$$\begin{aligned} \frac{h'(\ell _i)}{w_i}=1, \quad \frac{h'(\overline{\ell }_k)}{w_k}=1,\qquad i,k=1,2 \end{aligned}$$
(19)

which together with (8) imply that the marginal income tax rate for all wage-types is zero and the income tax is non-distorting, as expected. We also obtain results on the equity conditions characterizing optimal redistribution among types and the optimal prices of net bequests.

Consider first the parents. From the first-order conditions on \(c_i\) for both wage-types, we obtain:

$$\begin{aligned} d_1W'(V^1)+(1-d_1)W'(v^1)=d_2W'(V^2)+(1-d_2)W'(v^2)=\lambda . \end{aligned}$$
(20)

The government optimally increases consumption of the type\(-i\) parents until the marginal cost of doing so equals the marginal social benefit as given by (20) for the two wage-types. Consequently, the average marginal social utility of consumption of parents of a given wage-type (averaged over donor and non-donor parents of that wage-type) will be equal for the two wage-types and equal to the shadow value of government revenue. Next, consider the equity condition for the children. From the first-order conditions on \(\overline{c}_k\), we have

$$\begin{aligned} {\lambda }&= \frac{n^{kd}}{n^k}\left[ \left( \frac{n^{kk}}{n^{kd}}\right) W'\left( R^{kk}\right) +\left( 1-\frac{n^{kk}}{n^{kd}}\right) W'\left( R^{ki}\right) \right] +\left( 1-\frac{n^{kd}}{n^k}\right) W'\left( r^k\right) \nonumber \\&\qquad k=1,2, \end{aligned}$$
(21)

where \(n^k=n_k\pi +n_i(1-\pi )\) is the number of type\(-k\) children, \(n^{kd}=n_k\pi d_k+n_i(1-\pi )d_i\) is the number of type\(-k\) children who receive a bequest, and \(n^{kk}=n_k\pi d_k\) is the number of type\(-k\) children with a type\(-k\) donor parent where \(k\ne i =1,2\). Analogous to (20) above, this equity condition requires that the average marginal social utility of consumption be equal for high-wage and low-wage children receiving three different amounts of bequests, \(b(p_1),b(p_2)\) and zero.
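The population counts entering (21) can be tabulated directly from their definitions. The Python sketch below is illustrative only: the function name `child_counts` and all parameter values are assumptions made for the example, not part of the model's specification. It computes \(n^k\), \(n^{kd}\) and \(n^{kk}\) and reports the share of type-k children who inherit.

```python
def child_counts(n, d, pi):
    """Population counts of children by wage-type, following the definitions after (21).

    n  : dict {1: n_1, 2: n_2}, parent population shares (assumed to sum to one)
    d  : dict {1: d_1, 2: d_2}, share of donors among parents of each wage-type
    pi : probability that a child has the same wage-type as the parent
    """
    out = {}
    for k, i in ((1, 2), (2, 1)):
        n_k = n[k] * pi + n[i] * (1 - pi)                  # all type-k children
        n_kd = n[k] * pi * d[k] + n[i] * (1 - pi) * d[i]   # type-k children who inherit
        n_kk = n[k] * pi * d[k]                            # heirs of a type-k donor parent
        out[k] = {"n_k": n_k, "n_kd": n_kd, "n_kk": n_kk,
                  "share_inheriting": n_kd / n_k}
    return out


# Hypothetical parameter values, for illustration only.
counts = child_counts(n={1: 0.6, 2: 0.4}, d={1: 0.3, 2: 0.6}, pi=0.7)
for k in (1, 2):
    print(k, round(counts[k]["share_inheriting"], 4))
```

With these illustrative numbers a larger share of high-wage children inherit, which is the pattern one would expect whenever \(\pi >1/2\) and \(d_2>d_1\).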

Together these equity conditions determine the optimal allocation of consumption across the four types of parents (donor/non-donor and high-/low-wage) and the six types of children (high-/low-wage and high-wage/low-wage/non-donor parent). It is worth highlighting at this point that if all parents were donors (\(d_1=d_2=1\)), then from (20) and (21), there would be full equalization of parents’ social utilities and children’s utilities. With heterogeneous bequest behaviour, \(1>d_2>d_1\), this is no longer the case. Further, it follows from (20) that if \(V^i>v^i\), then \(W'(V^i)<\lambda \), and the converse holds.

Consider now the choice of \(p_i\) for \(i=1,2\). The first-order conditions on the \(p_i\)’s with the donor’s incentive constraint not binding can be written for \( i=1,2,\) \(k\ne i\) as follows:

$$\begin{aligned}&- W'\left( V^i\right) \Big (b(p_i)+(1-\delta )p_ib'(p_i)\Big ) +\lambda \Big ((p_i-1) b'(p_i)+b(p_i)\Big ) \nonumber \\&\quad + \alpha \Big (\pi W'\left( R^{ii}\right) +(1-\pi )W'\left( R^{ki}\right) \Big )b'(p_i) =0. \end{aligned}$$
(22)

The effective price of net bequests plays three distinct roles in government policy. The first role is to correct social externalities. One externality arises as a result of the government giving a different weight to donors’ utility of bequests than donors themselves (the ‘laundering out’ of the benefits of bequest), and another arises as a result of the government giving a different weight to children’s utility than donors themselves. The second role is to redistribute between donor and non-donor parents. The final role is to redistribute among children who do and do not receive bequests. Both of these two latter roles arise from differing bequest behaviour. (Recall that the income tax system can do neither of the latter two.) We highlight each of these roles in turn.

4.1 Social externality-correcting role

To focus on this role, assume first that the government only cares about aggregate social utility, that is, \(W'(\cdot )=1\) and \(W''(\cdot )=0\), and therefore has no aversion to social utility inequality. The equity conditions (20) and (21) imply that \(\lambda =1\), and condition (22) reduces to

$$\begin{aligned}{}[\delta p_i+\alpha -1]b'(p_i)=0 \end{aligned}$$
(23)

Starting from the laissez-faire, if the government gives full weight to children’s utility (\(\alpha =1\)), then the government will want to fully subsidize bequests by setting \(p_i=0\) for any \(\delta >0\). This corresponds with the case considered in the recent literature, such as Kaplow (2001), Farhi and Werning (2010) and Piketty and Saez (2013). In this case, only if the government completely discounted the utility of bequests so \(\delta =0\), would the government reduce net bequests to zero.Footnote 7 Discounting the donor benefits of bequests does not change the efficiency rationale for the subsidization of bequest when the government is not averse to utility inequality provided children’s utility is given full weight in the government’s objective.

Using a comparable base case as in Farhi and Werning (2010) where children are passive recipients of bequests and no weight is given to their utility in the government’s objective function (\(\alpha =0\)), then for discount factors \(\delta \) greater than \(1/f'(0)\), optimal bequests are positive and the government optimally taxes bequests by setting \(p_i=1/\delta \ge 1\). For discount factors less than \(1/f'(0)\), optimal bequests are zero. In this special case with \(\alpha =0\), donors are leaving bequests larger than is optimal from the point of view of the government and the government taxes bequests. The more the government discounts the utility of bequests, the smaller is \(\delta \), so the higher the tax rate on bequests to correct this social externality.
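The two polar cases of (23) can be illustrated numerically. The sketch below is a minimal example under stated assumptions: the utility of bequests is taken to be \(f(b)=a\ln (1+b)\) (so that \(f'(0)=a\) is finite), and the function names and parameter values are hypothetical. The donor's choice follows from \(f'(b)=p\), with a corner at zero.

```python
def externality_price(alpha, delta):
    """Effective price solving (23), delta*p + alpha - 1 = 0, for delta > 0."""
    return (1.0 - alpha) / delta


def bequest(p, a):
    """Donor's bequest under the ASSUMED utility f(b) = a*ln(1+b).

    The donor sets f'(b) = a/(1+b) equal to p, with a corner at b = 0
    whenever p >= f'(0) = a.
    """
    return max(a / p - 1.0, 0.0)


a = 2.0  # hypothetical parameter, so f'(0) = 2 and 1/f'(0) = 0.5

# Full weight on children (alpha = 1): the price is driven to zero.
print(externality_price(1.0, 0.8))

# No weight on children (alpha = 0): p = 1/delta, and bequests vanish
# once delta falls below 1/f'(0).
for delta in (0.8, 0.4):
    p = externality_price(0.0, delta)
    print(f"delta={delta}: p={p:.3f}, b={bequest(p, a):.3f}")
```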

Now suppose the government is averse to social utility inequality, so \(W'(\cdot )>0>W''(\cdot )\). Bequest taxation will continue to serve an externality-correcting role, but will also now serve a role in redistributing between donors and non-donors and among children who do and do not receive bequests.

4.2 Redistribution between donors and non-donors

To consider the role of \(p_i\) in redistributing between donors and non-donors, begin first with the case where \(\alpha =0\) to suppress redistribution between recipient and non-recipient children for the time being and assume positive bequests in the optimum.Footnote 8 It is useful to consider first whether bequests should be taxed (\(p_i>1\)) or subsidized (\(p_i<1\)), and then whether \(p_1\) should be greater or less than \(p_2\).

4.2.1 Tax or subsidize bequests?

We can rewrite (22) as

$$\begin{aligned} \frac{W'(V^i)}{\lambda }(1-\delta )+\frac{ 1-{W'(V^i)}/{\lambda }}{\epsilon ^i_p}= \frac{p_i-1}{p_i} . \end{aligned}$$
(24)

Consider first the case when full weight is given to the utility of bequest, \(\delta =1\), so there is no social externality-correcting role for bequest taxation. The first term on the left-hand side is zero. The second term on the left-hand side captures the redistributive role of bequest taxation. When parents differ in their bequest behaviour and with \(\delta =1\), donors will be better off than non-donors from the government’s point of view, \(V^i>v^i\). It follows from the equity condition (20) that \(W'(V^i)<\lambda \). Consequently, the second term on the left-hand side will be positive, which implies that at the optimum \(p_i>1\) and bequests are taxed. The more responsive bequests are to the effective price, i.e. the larger the \(\epsilon ^i_p\), the lower the optimal amount of bequest taxation and the smaller the effective price. This redistribution effect arises solely as a result of heterogeneous bequest behaviour. If all parents are donors (as is generally assumed in the literature), then the social marginal utilities of parents would be identical and equal to the marginal cost of public funds, so this second term would also be zero and the optimal effective price of net bequests would be unity. Here we have shown that it will be optimal to use bequest taxation to redistribute between individuals of differing bequest behaviour.
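As a check on the algebra, (24) can be verified numerically against (22) with \(\alpha =0\). The sketch below assumes the bequest demand \(b(p)=a/p-1\), which corresponds to the hypothetical utility \(f(b)=a\ln (1+b)\), and defines \(\epsilon ^i_p=-p_ib'(p_i)/b(p_i)\). All names and numbers are illustrative. The last function treats \(W'(V^i)/\lambda \) and \(\epsilon ^i_p\) as given numbers, although both are endogenous in the model, simply to show the comparative static in the elasticity.

```python
def b(p, a):
    """Bequest demand under the ASSUMED f(b) = a*ln(1+b), interior case p < a."""
    return a / p - 1.0


def b_prime(p, a):
    """Derivative of the assumed bequest demand with respect to p."""
    return -a / p**2


def weight_ratio_from_22(p, delta, a):
    """W'(V^i)/lambda implied by (22) with alpha = 0."""
    return ((p - 1.0) * b_prime(p, a) + b(p, a)) / (
        b(p, a) + (1.0 - delta) * p * b_prime(p, a)
    )


def lhs_24(p, delta, a):
    """Left-hand side of (24), with elasticity eps = -p*b'(p)/b(p)."""
    g = weight_ratio_from_22(p, delta, a)
    eps = -p * b_prime(p, a) / b(p, a)
    return g * (1.0 - delta) + (1.0 - g) / eps


def price_from_24(g, delta, eps):
    """Solve (24) for p, treating g = W'/lambda and eps as given numbers."""
    rhs = g * (1.0 - delta) + (1.0 - g) / eps
    return 1.0 / (1.0 - rhs)


p, delta, a = 1.5, 0.9, 4.0  # hypothetical values
print(lhs_24(p, delta, a), (p - 1.0) / p)  # the two sides of (24)

# With delta = 1, a more elastic bequest response lowers the price.
print(price_from_24(0.8, 1.0, 2.0), price_from_24(0.8, 1.0, 4.0))
```

When \(W'(V^i)=\lambda \), as would be the case if all parents were donors, `price_from_24(1.0, 1.0, eps)` returns one for any elasticity.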

Next, consider how discounting the utility of bequests affects the role of the effective price in redistribution between non-donor and donor parents. The social weight \(\delta \) affects the government’s ordering of social utilities between donor and non-donor parents of a given wage-type and ultimately the direction of redistribution. The sign of the second term of (24) depends on this ordering. From the equity condition, (20), the term will be positive (negative) if donors are viewed as better (worse) off than non-donors. In the case where \(V^i<v^i\), the net social benefit of bequests (the net social surplus) is negative, that is \(\delta f(b(p_i))-p_ib(p_i)<0\), and the government wants to redistribute from non-donor to donor parents. By eliminating bequests, the government unambiguously improves the social utility of donor parents and achieves full utility equality. In other words, the sign of the second term of (24) can never be negative in an optimum. Bequests will only be positive in the optimum when the net social surplus from bequests is positive.

More formally, start with the case where \(p_i\) is set such that optimal bequests are zero, and \(V^i=v^i\). Since there are no bequests, government revenue from bequest taxation is zero. Then, suppose the government eliminates the bequest tax, so \(p_i=1\), and government revenues are still zero. If at \(p_i=1\), there is a positive net social surplus from bequests, then donor parents will be strictly better off and non-donor parents are just as well off. This is a Pareto improvement, and the government will want to allow for positive bequests. If, instead, the net social surplus is negative at the laissez-faire, then donor parents will be strictly worse off, and the government would want to ensure zero bequests. Therefore, if the government sufficiently discounts the utility of bequests (i.e. for social weights less than \(b(1)/f(b(1))\)), then bequests will optimally be taxed away. For larger social weights, bequests will optimally be positive and donors will be better off than non-donors. The government will tax bequests to correct the social externality and to redistribute from donor to non-donor parents of a given wage-type.
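The threshold on the social weight can be made concrete with a small numerical sketch. It assumes, purely for illustration, that \(f(b)=a\ln (1+b)\); the function names and the value of \(a\) are hypothetical. The net social surplus at the laissez-faire price is \(\delta f(b(1))-b(1)\), which changes sign at the social weight \(b(1)/f(b(1))\).

```python
import math


def f(b, a):
    """ASSUMED utility of bequests, f(b) = a*ln(1+b)."""
    return a * math.log(1.0 + b)


def bequest(p, a):
    """Donor's choice from f'(b) = p, with a corner at zero."""
    return max(a / p - 1.0, 0.0)


def net_social_surplus(delta, p, a):
    """delta*f(b(p)) - p*b(p): the donor's surplus as valued by the government."""
    b = bequest(p, a)
    return delta * f(b, a) - p * b


a = 2.0                        # hypothetical parameter
b1 = bequest(1.0, a)           # laissez-faire bequest, b(1)
threshold = b1 / f(b1, a)      # social weight at which the surplus is zero
print(round(threshold, 4))
for delta in (0.5, threshold, 0.9):
    print(round(delta, 4), round(net_social_surplus(delta, 1.0, a), 4))
```

Social weights below the printed threshold give a negative surplus at \(p_i=1\), and weights above it give a positive surplus.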

4.2.2 Differential effective prices of net bequests?

Having established that the bequest tax should be positive if bequests are positive in the optimum, it remains to consider how \(p_i\) should differ between type-1 and type-2 donor parents. In this full-information case, the government can redistribute between wage-types using the income tax. However, the income tax does not redistribute between donors and non-donors. Bequest taxation serves that purpose, and the case for imposing differential bequest taxes depends on the relative proportion of donors of each wage-type. To see this, suppose \(p_1=p_2=p>0\). Then both types of donor parents are choosing the same bequest, and it follows from (22) and \(\alpha =0\) that donors of both wage-types are equally well-off (\(V^1=V^2\)). This in turn implies that non-donor parents are also equally well off (\(v^1=v^2\)). For the equity condition (20) to be satisfied, it must be that the share of donors of each wage-type is the same, \(d_2=d_1\). Thus, a uniform bequest tax policy is only optimal if there is the same proportion of donors of each wage-type.

Next, we can show that the government should impose a lower effective price on those wage-types with a larger share of donors. Rewrite the first-order condition on \(p_i\), (22), with \(\alpha =0\) as follows,

$$\begin{aligned} \frac{W'(V^i)}{\lambda }-1= \frac{(\delta p_i-1)b'(p_i)}{b(p_i)+(1-\delta ) p_ib'(p_i)}. \end{aligned}$$
(25)

Given \(V^i>v^i\), \(W'(V^i)<\lambda \) from (20) and the left-hand side of (25) will be negative. Assuming \(V^i_p<0\), the denominator of the right-hand side of (25) will be positive and since \(b'(p_i)<0\), it must be that \(p_i>1/\delta \). The optimal effective price is greater than the price that just corrects the social externality, because bequest taxation is also being used to redistribute from donor to non-donor parents. This also implies that the right-hand side of (25) is decreasing in \(p_i\).Footnote 9 Now consider the equity condition (20). It can be shown that in the optimum \(W'(V^2)>W'(V^1)\).Footnote 10 It then follows from (25) that \(p_1>p_2\).
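The ranking of prices implied by (25) can be illustrated by solving it numerically for two values of \(W'(V^i)/\lambda \). The sketch below assumes the bequest demand \(b(p)=a/p-1\) (from the hypothetical \(f(b)=a\ln (1+b)\)) and treats \(W'(V^i)/\lambda \) as a given number, although it is endogenous in the model. For this demand function the right-hand side of (25) falls monotonically on the interval \((1/\delta ,\delta a)\), so simple bisection suffices.

```python
def rhs_25(p, delta, a):
    """Right-hand side of (25) under the ASSUMED demand b(p) = a/p - 1."""
    b = a / p - 1.0
    b_prime = -a / p**2
    return (delta * p - 1.0) * b_prime / (b + (1.0 - delta) * p * b_prime)


def solve_price(g, delta, a, tol=1e-12):
    """Bisection for p in (1/delta, delta*a) such that rhs_25(p) = g - 1.

    g stands for W'(V^i)/lambda and is treated as a given number below one.
    On this interval the denominator of (25) is positive and rhs_25 falls
    from zero towards minus infinity.
    """
    lo, hi = 1.0 / delta, delta * a
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rhs_25(mid, delta, a) > g - 1.0:
            lo = mid   # right-hand side still too high: raise the price
        else:
            hi = mid
    return 0.5 * (lo + hi)


delta, a = 0.9, 4.0                 # hypothetical values with delta**2 * a > 1
p1 = solve_price(0.80, delta, a)    # lower W'(V^1)/lambda
p2 = solve_price(0.90, delta, a)    # higher W'(V^2)/lambda
print(p1, p2, 1.0 / delta)
```

In this example the wage-type with the lower marginal social utility faces the higher price, and both prices exceed \(1/\delta \).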

To gain some intuition for this result, consider the following thought exercise. Start from the situation in which the government is imposing the same effective price, which implies from (22) that \(V^1=V^2\). Now consider increasing the effective price \(p_i\) to finance an increase in consumption to all type-i parents. The social gain of a marginal increase in \(p_i\) is the additional revenue it generates that benefits all type\(-i\) parents and is given by \(\lambda n_i\). The social cost of a marginal increase in \(p_i\) is given by \(n_id_iW'(V^i)V^i_p\) (still assuming \(\alpha =0\)). Therefore, the increase in \(p_i\) needed to finance a uniform increase in \(c_i\) is given by \(-d_iW'(V^i)V^i_p/\lambda \). Given that \(d_2>d_1\) this expression will be larger for \(p_2\) than for \(p_1\), starting from \(p_1=p_2\). The larger the share of donor parents, the higher the social loss per unit of revenue gained, and therefore, the government will optimally set \(p_1>p_2\).Footnote 11

The upshot is that by restricting attention to the roles of bequest taxation to correct for the discounting of the utility of bequests and to redistribute between donor and non-donor parents, we obtain \(p_1>p_2>1\) in the full-information optimum. Let us now extend the analysis to include the effect of \(p_i\) on the children.

4.3 Redistribution between recipient and non-recipient children

4.3.1 Tax or subsidize bequests?

Now let \(\alpha =1\) so the utility of children counts. Relative to the full-information case with \(\alpha =0\), there is an additional cost of increasing \(p_i\) since the bequests now benefit the children. This is reflected in the last term in (22), which is negative since \(b'(p_i)<0\). There is now a positive externality of bequests as a result of having a positive social weight on children’s utility. Donors do not take account of the benefit of their bequests to their children, apart from their own benefit \(f(b(p_i))\). This gives the government an incentive to subsidize bequests as discussed above and previously shown in the literature. Consequently, the effective price of net bequests may be greater or less than unity. The effective price of net bequests will always be positive, though, given that the government is averse to social utility inequality; that is, bequests will not be fully subsidized. The revenue gain of reducing the bequest tax credit slightly starting from \(\tau _i=1\) will always be greater than the social loss to donor parents and children receiving bequests and reflects the government’s incentive to redistribute from donor to non-donor parents and from inheriting children to non-inheriting children.Footnote 12

4.3.2 Differential effective prices of net bequests?

Consider now the pattern of net bequest prices, \(p_1\) and \(p_2\). Since skills are positively correlated between parents and children (\(\pi >1/2\)) and bequests are correlated with parental skills (\(d_2>d_1\)), a higher proportion of high-skilled children receive bequests than low-skilled children. This favours giving a higher bequest tax credit to low-skilled donors to encourage them to give bequests. This desire to set \(p_1<p_2\) in order to favour low-skilled inheritors counters the opposite desire to redistribute between non-donor and donor parents, and either influence can dominate. To see this more formally, suppose we evaluate (22) at the optimum for \(\alpha =0\). As shown above, the first two terms in (22) sum to zero there, with \(p_1>p_2\). Now consider the last term in (22). With \(p_1>p_2\), low-wage donor parents are leaving smaller bequests than high-wage donors. Therefore, the average social marginal utility of high-wage donor heirs must be lower than that of low-wage donor heirs. Thus, starting from the optimum with \(\alpha =0\), the government will want to reduce both \(p_1\) and \(p_2\), but will have an incentive to reduce \(p_1\) by a greater amount so as to increase the bequest received by low-wage donor heirs.Footnote 13

To see that the relative sizes of \(p_1\) and \(p_2\) are ambiguous with \(\alpha =1\), rewrite (22) as

$$\begin{aligned} \frac{W'(V^i)}{\lambda }-1=\frac{\Big (\delta p_i-1+ (1/\lambda )\big (\pi W'\left( R^{ii}\right) +(1-\pi )W'\left( R^{ki}\right) \big )\Big )b'(p_i)}{b(p_i)+(1-\delta ) p_ib'(p_i)}. \end{aligned}$$
(26)

Suppose the social utility of donors is higher than the social utility of non-donors. It follows that the left-hand side of the above expression is negative which means given \(V^i_p<0\) that the denominator on the right-hand side is positive and the expression in the brackets in the numerator must be positive. Therefore, the right-hand side expression can again be shown to be decreasing in p. So if \(p_1=p_2\), then \(V^1=V^2\). Further if \(V^2>V^1\), then \(W'(V^2)<W'(V^1)\) and \(p_2>p_1\) or vice versa. If donors are better off than non-donors, we can again use the equity condition on parents to show \(V^2<V^1\) and thus \(p_2<p_1\). With a positive weight on children’s well-being in the social welfare function, it may be that the social utility of donors is less than the social utility of the non-donors. The social benefit to heirs of bequests may outweigh the social loss to parents if \(\delta \) is low enough. In this case, there may be positive bequests even though the social utility of donors is lower than the social utility of non-donor parents. With positive bequests, the equity condition implies that \(V^2>V^1\) given \(d_2>d_1\). How the right-hand side of (26) changes with p is now ambiguous, and therefore, \(p_1\) may be greater or less than \(p_2\).

Summary of Results in the Full-Information Benchmark

When the government can observe the wage-types of parents and children and impose nonlinear income taxes on each that are not conditioned on donor status along with a less than 100 % inheritance tax and a wage-type-specific bequest tax credit, the following results apply.

  1. The optimal solution determines the effective price of net bequests, \(p_i=(1-\tau _i)/(1-t)\) for each type\(-i\) person, so the absolute size of the inheritance tax and the bequest tax credit are indeterminate.

  2. The effective price of net bequests plays three distinct roles: it corrects for a social externality when the social weight given to either the utility of bequests or children’s utility differs from the donors’ utility; it redistributes between donor and non-donor parents and between recipient and non-recipient children.

  3. In an optimum with positive bequests, if no social welfare weight is given to children (\(\alpha =0\)),

     (a) the net social surplus from bequests will be non-negative so that donors will be at least as well off (from the government’s point of view) as non-donor parents of a given wage-type, and

     (b) the optimal value of \(p_i\) will be greater than one and smaller for the wage-type with the highest proportion of donors.

  4. Giving children full social weight favours making \(p_2>p_1\), so given that \(d_2>d_1\) the relative size of \(p_1\) and \(p_2\) will be ambiguous; the optimal value of \(p_i\) will be positive, but can be greater or less than unity.

5 Imperfect information

Now suppose that the government cannot observe wage-types. Incentive constraints for non-donor and donor parents, (16) and (17), and for children, (18), now apply. These ensure that high-wage parents, regardless of donor status, are better off (from the government’s perspective) than their low-wage counterparts, and similarly for high-wage children.Footnote 14 The government cannot implement lump-sum redistribution between wage-types, so bequest taxation is used to redistribute not only between donor and non-donor parents and inheritor and non-inheritor children, but also between wage-types.

The first-order conditions on \(c_i\), \(y_i\), \(\overline{c}_i\) and \(\overline{y}_i\) in Appendix 1 characterize the optimal income tax structure. These results are standard: \(h'(\ell _2)/w_2=h'(\overline{\ell }_2)/w_2=1\), \(h'(\ell _1)/w_1<1, h'(\overline{\ell }_1)/w_1<1\), which imply zero marginal tax rates at the top and positive ones at the bottom.

From the first-order conditions on \(c_1\) and \(c_2\), we obtain a standard equity condition that the weighted average of the marginal social utility of consumption for all parents (donors and non-donors) is equal to the marginal cost of raising an additional unit of tax revenue \(\lambda \), where the weights are given by the population share of each of the four types of parents (high or low wage and donor or non-donor) and sum to unity since \(n_1+n_2=1\):

$$\begin{aligned} n_1d_1W'(V^1)+n_1(1-d_1)W'(v^1)+n_2d_2W'(V^2)+n_2(1-d_2)W'(v^2)=\lambda . \end{aligned}$$
(27)

Similarly, from the first-order conditions on \(\overline{c}_1\) and \(\overline{c}_2\), we obtain:

$$\begin{aligned}&{\lambda } = n^{11} W'\left( R^{11}\right) +(n^{1d}-n^{11}) W'\left( R^{12}\right) + (n^1-n^{1d}) W'\left( r^1\right) \nonumber \\&\qquad + n^{22}W'\left( R^{22}\right) + (n^{2d}-n^{22}) W'\left( R^{21}\right) + (n^2-n^{2d}) W'\left( r^2\right) \end{aligned}$$
(28)

with the weights also summing to unity. Note the differences between (27) and (20) and between (28) and (21). With full information, the government is able to equate the average social marginal utility of consumption of the high- and low-wage-types using the nonlinear income tax, leaving bequest taxation as an admittedly blunt device to redistribute between donors and non-donors and between recipients and non-recipients and to correct the social externality. With imperfect information, even that targeting of policies is no longer possible. The government is constrained in redistributing between wage-types, and that has implications for the role of bequest taxation given heterogeneous bequest behaviour.

The first-order conditions on \(p_1\) and \(p_2\) in Appendix 1 determine the optimal prices of net bequests. These differ from (22) in the full-information benchmark by the terms involving the donor incentive constraints. Recall from above that either the non-donor’s or the donor’s incentive constraint, (16) or (17), will be binding depending on whether or not \(p_1<p_2\) in the optimum. As in the full-information case, if the government is not averse to social utility inequality (\(W'=1\)) or if all parents are donors (\(d_1=d_2=1\)) then there is no rationale for bequest taxation other than to correct the social externalities. Under either of these scenarios, the equity conditions (27) and (28) together with the first-order conditions on the net effective prices imply that there will be a uniform effective price on net bequests given by (23) as in the full-information benchmark. Differential bequest taxation will be optimal only when there is heterogeneity of bequest behaviour and some aversion to social inequality. Unlike in the full-information case, where the incentive constraints did not apply and the effective prices of bequests were used only to correct the social externality and to redistribute between donor and non-donor parents and between recipient and non-recipient children, the effective prices now also influence redistribution between wage-types.

To explore this, it is useful to begin with the case where \(\alpha =0\) to suppress the use of \(p_i\) to redistribute between recipient and non-recipient children. We can show in this case that the incentive constraint on donors must be binding at the optimum and therefore \(p_1<p_2\) when there is imperfect information. To see this, suppose the incentive constraint on donors (17) is slack and therefore \(p_1>p_2\). The first-order conditions on the effective prices are the same as in the full-information case (22). Only the income tax rates are distorted. With positive net social surplus from bequests, donor parents will be better off than non-donor parents of a given wage-type and it follows from (22) that \(p_1<p_2\), a contradiction.Footnote 15 Therefore, the incentive constraint on the donors must be binding in the imperfect-information case and \(p_1<p_2\). Bequest taxation helps to redistribute between parents of differing wage-types when parents differ in their bequest behaviour.

Suppose now that \(\alpha =1\) so that the bequest tax must take into account redistribution between recipient and non-recipient children. From the binding incentive constraint on the children, we know that \(r^2>r^1\) and consequently, \(R^{21}>R^{11}\) and \(R^{22}>R^{12}\). The average social marginal utility of heirs receiving a bequest from a low-wage donor parent is greater than the average social marginal utility of heirs receiving a bequest from a high-wage donor. The relative magnitude of the effective prices on net bequest continues to depend on whether the incentive constraint on the donors is binding or not. In the former case, it must be that \(p_1<p_2\) and in the latter case, \(p_1>p_2\). This continues to hold.

Intuitively, bequest taxation now has to take account of several sources of inequality. There is the inequality between the donor and non-donor parents and between the recipient and non-recipient children as in the full-information case. The former calls for \(p_1>p_2\) and the latter the opposite. Then, there is inequality between the high- and low-skilled for both parents and children arising because of imperfect information and the incentive constraints in the nonlinear income tax systems. Both of these call for \(p_1<p_2\): this favours the type-1 parents relative to the type-2’s directly and also favours the type 1 children relative to the type 2’s indirectly by reducing inheritances more on average for the latter. On balance, it is ambiguous whether \(p_2\) should be higher or lower than \(p_1\), but relative to the full-information case, \(p_1\) would tend to be reduced relative to \(p_2\).

Summary of Results with Imperfect Information

When the government cannot observe the wage-types of either parents or children and imposes nonlinear income taxes on both that are not conditioned on donor or inheritor status, along with a less than 100 % inheritance tax and a wage-type-specific bequest tax credit, the following results apply.

  1. The optimal solution determines the effective price of net bequests, \(p_i=(1-\tau _i)/(1-t)\) for each type\(-i\) person, so both the absolute size of the inheritance tax and the bequest tax credit are indeterminate.

  2. If all parents are donors (\(d_1=d_2=1\)), bequests will be fully subsidized, provided there is a positive social weight on the utility of bequests and the welfare of children counts. If \(\alpha =0\), then a uniform bequest tax that is decreasing in the social weight \(\delta \) will be optimal and bequests will be zero for sufficiently small social weights.

  3. With heterogeneous bequest behaviour, bequest taxation also serves to redistribute between parents and children of differing wage-types.

     (a) If \(\alpha =0\), the optimal value of \(p_i\) will be higher for the high-wage donor parents.

     (b) If the welfare of children counts, the relative size of \(p_1\) and \(p_2\) is ambiguous.

6 Extensions

6.1 Inheritance taxes conditional on child’s wage-type

The preceding analysis shows that the absolute value of an inheritance tax which applies to all children, regardless of wage-type, is indeterminate, given that the government can differentiate the bequest tax credit by wage-type of donor parent. Only the effective price of net bequests conditioned on donor-type, that is \(p_i\), is determined at the optimal solution. How does this result change if the inheritance tax can be conditioned on the child’s wage-type?

The government will now have four policy instruments, a bequest tax credit applying to each type of donor parent and an inheritance tax applying to each type of donee child. Assuming that donor parents know the child’s wage-type when making their bequest, household behaviour and all budget constraints will again depend only on the net price of bequests, which for a donor of type-i with a child of type-k is now given by \(p_{ik}=(1-\tau _i)/(1-t_k)\).Footnote 16 There will be four different effective prices of bequests, but like the previous case in which the uniform inheritance tax is redundant, these four prices cannot be chosen independently. In other words, given that household behaviour depends on the effective price which is a ratio of two of the four policy instruments, one of the policy instruments is redundant. The government’s optimal choice of three of the policy instruments will ensure that the fourth instrument is also optimal. To see this, note that the effective prices of net bequests are related as follows:

$$\begin{aligned} p_{11}p_{22}={1-\tau _1\over 1-t_1}\cdot {1-\tau _2\over 1-t_2}=p_{12}p_{21} \end{aligned}$$

so only three of the prices can be chosen independently. This holds regardless of the assumed information of the government. Allowing the government to differentiate the inheritance tax by the wage-type of the receiving child gives it only one more degree of freedom relative to the case of a single inheritance tax.
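The restriction linking the four effective prices, and the resulting redundancy of one instrument, can be checked with a few lines of Python. The rates used below are hypothetical, and the helper name `effective_prices` is introduced only for this illustration.

```python
def effective_prices(tau, t):
    """p_ik = (1 - tau_i) / (1 - t_k) for donor type i and child type k."""
    return {(i, k): (1.0 - tau[i]) / (1.0 - t[k]) for i in (1, 2) for k in (1, 2)}


# Hypothetical credit and tax rates, for illustration only.
tau = {1: 0.10, 2: 0.25}
t = {1: 0.20, 2: 0.40}
p = effective_prices(tau, t)
print(p[(1, 1)] * p[(2, 2)], p[(1, 2)] * p[(2, 1)])  # the two products coincide

# Scaling every (1 - tau_i) and (1 - t_k) by the same factor leaves all four
# prices unchanged, so one of the four instruments is redundant.
s = 0.5
tau_scaled = {i: 1.0 - s * (1.0 - tau[i]) for i in (1, 2)}
t_scaled = {k: 1.0 - s * (1.0 - t[k]) for k in (1, 2)}
p_scaled = effective_prices(tau_scaled, t_scaled)
print(max(abs(p[key] - p_scaled[key]) for key in p))
```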

What can we say about the ranking of the effective prices when the inheritance tax can be conditioned on child type? Consider the full-information case where the government knows the wage-types of both the parents and the children. Suppose we start in the outcome considered above in which \(t_1=t_2=t\) and \(p_1\) and \(p_2\) are chosen optimally. Applying the envelope theorem to the value function for that problem, we show in Appendix 2 that social welfare will increase with an incremental increase in \(t_2\) (holding \(t_1\) constant) if \(r^2>r^1\), and vice versa.

To determine the size of \(r^1\) relative to \(r^2\), rewrite the equity condition for the children, (21), as:

$$\begin{aligned} \lambda =\frac{n^{1d}}{n^1}\overline{W}'\left( R^{1}\right) +\left( 1-\frac{n^{1d}}{n^1}\right) W'\left( r^1\right) =\frac{n^{2d}}{n^2}\overline{W}'\left( R^{2}\right) +\left( 1-\frac{n^{2d}}{n^2}\right) W'\left( r^2\right) \end{aligned}$$
(29)

where \(\overline{W}'\left( R^{k}\right) \) is the average marginal utility of type\(-k\) recipients. We know that \(\overline{W}'\left( R^{k}\right) <W'\left( r^k\right) \) for \(k=1,2\), but the relative size of \(\overline{W}'\left( R^{1}\right) \) and \(\overline{W}'\left( R^{2}\right) \) is ambiguous. However, as we show in Appendix 3, \(\overline{W}'\left( R^{1}\right) <\overline{W}'\left( R^{2}\right) \) if \(\pi =1\) and \(\overline{W}'\left( R^{1}\right) =\overline{W}'\left( R^{2}\right) \) if \(\pi =1/2\). So, we expect that \(\overline{W}'\left( R^{1}\right) <\overline{W}'\left( R^{2}\right) \) for \(\pi >1/2\). Given that, the following result is apparent from (37) and (38). Given that \(\overline{W}'\left( R^{1}\right) \leqslant \overline{W}'\left( R^{2}\right) \), \(W'\left( r^1\right) <W'\left( r^2\right) \), so \(r^1>r^2\). This implies from above that \(t_2\) should be reduced relative to \(t_1\). Given that \(r^1>r^2\), we have \(R^{11}>R^{21}\) and \(R^{12}>R^{22}\). Therefore, by increasing \(t_1\) relative to \(t_2\), the bequests received by children with parents of opposite skill type increase and inequality between recipient children of the same donor parent is reduced. This ambiguity is not resolved under imperfect information. The government is constrained in redistributing from type-2 to type-1 parents and children, and this will tend to reduce \(\tau _2\) relative to \(\tau _1\) and increase \(t_2\) relative to \(t_1\).

6.2 Income effects of bequest taxation

Up to now, we have ignored any income effects on the bequest decision. Suppose instead preferences of donor and non-donor parents are given by

$$\begin{aligned} U(x,b,\ell )=\nu (x)+ f(b)-h(\ell ),\quad u(x,\ell )=\nu (x)-h(\ell ), \end{aligned}$$
(30)

respectively, where \(\nu (x)\) is a strictly concave function.Footnote 17 To relate this discussion to our previous analysis, we continue to assume that the government uses both a linear inheritance tax and a bequest tax credit so the relevant policy variable is the effective price of bequests and that the income tax cannot be conditioned on the level of bequests. But for simplicity, assume that the effective price of bequests is the same for all.Footnote 18 As before, it is useful to consider the parent’s problem in two stages. In the second stage, a parent decides how to allocate a given after-tax income c between private consumption x and bequests b given \(x+pb=c\). Non-donor parents consume their after-tax income, so \(x=c\). Donor parents choose b such that \(f'(b)=p\nu '(c-pb)\), which yields optimal bequests \(b(p,c)\). Bequests will be decreasing in the effective price of net bequests and increasing in after-tax income.Footnote 19
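The donor's second-stage condition \(f'(b)=p\nu '(c-pb)\) can be solved numerically to illustrate how bequests respond to the effective price and to after-tax income. The sketch below assumes, for illustration only, \(\nu (x)=\ln x\) and \(f(b)=a\ln b\); the function name and parameter values are hypothetical. Under these forms the solution also has the closed form \(b=ac/(p(1+a))\), which serves as a check on the bisection.

```python
def donor_bequest(p, c, a, tol=1e-12):
    """Solve f'(b) = p * nu'(c - p*b) by bisection.

    ASSUMED functional forms: nu(x) = ln(x) and f(b) = a*ln(b), so the
    condition reads a/b = p/(c - p*b) on the interval 0 < b < c/p.
    """
    lo, hi = 0.0, c / p
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # f'(b) - p*nu'(x) is decreasing in b: a positive value means b is too small.
        if a / mid - p / (c - p * mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a = 0.5  # hypothetical weight on the utility of bequests
print(donor_bequest(1.0, 10.0, a))   # baseline
print(donor_bequest(1.5, 10.0, a))   # higher effective price
print(donor_bequest(1.0, 12.0, a))   # higher after-tax income
```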

In the first stage, parents decide how much income to earn which for a given income tax system yields both income y and after-tax or disposable income c. The marginal rate of substitution between disposable income and earned income for donor and non-donor parents is given by

$$\begin{aligned} \mathrm{MRS}_{cy}= \frac{h'(y/w_i)/w_i}{\nu '(c-pb(p,c))},\quad \mathrm{MRS}_{cy}= \frac{h'(y/w_i)/w_i}{\nu '(c)},\quad i=1,2, \end{aligned}$$
(31)

respectively. For a given \((c,y)-\)bundle, donor parents will be better off and will have a lower marginal rate of substitution between disposable income and earned income than non-donor parents of a given wage-type, provided bequests are positive. In other words, the indifference curves of donor and non-donor parents of a given wage-type will exhibit a single-crossing property in \((c,y)-\)space (with positive bequests). These indifference curves coincide when there are no income effects on bequests. We also obtain the standard single-crossing property between high- and low-wage parents of a given donor status.

Consider the laissez-faire, in which all four types of parents set \(\mathrm{MRS}_{cy}=1\) and \(p=1\). As donor parents leave positive bequests when \(p=1\),Footnote 20 they will choose to work more than non-donor parents of a given wage-type. We also obtain the standard result that for a given donor status, high-wage parents earn more income than low-wage parents. Consequently, high-wage donor parents will leave greater bequests than low-wage donor parents in the laissez-faire. In this case, we do not need to assume that there is a greater share of high-wage donor parents to generate the result that the average bequest for high-wage parents is greater than the average bequest for low-wage parents. Assuming that preferences of children are the same as those of non-donor parents as before, we can also conclude that inheriting children will work less than non-inheriting children of the same wage-type.Footnote 21 These results can be contrasted with the case when preferences are quasilinear in consumption, where donor parents of both wage-types leave the same bequests and labour supply is independent of both donor and inheritance status.
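These laissez-faire comparisons can be checked in a simple parametric example. The sketch below is illustrative only and rests on assumed functional forms that do not come from the paper: \(\nu (x)=\ln x\), \(f(b)=\gamma \ln b\) and \(h(\ell )=\ell ^2/2\), with a hypothetical \(\gamma =0.5\). With \(p=1\) and \(c=y\), these forms imply a second-stage bequest of \(b=\gamma y/(1+\gamma )\), and setting the marginal rate of substitution to one gives \(y=w\) for non-donors and \(y=w\sqrt{1+\gamma }\) for donors.

```python
import math

GAMMA = 0.5  # hypothetical weight on the joy-of-giving term


def donor_utility(y, w):
    """Donor utility in the laissez-faire (c = y, p = 1) under the assumed forms.

    With nu = ln and f = GAMMA * ln, the second-stage bequest is
    b = GAMMA * y / (1 + GAMMA), leaving x = y - b for own consumption.
    """
    b = GAMMA * y / (1.0 + GAMMA)
    x = y - b
    return math.log(x) + GAMMA * math.log(b) - 0.5 * (y / w) ** 2


def non_donor_utility(y, w):
    """Non-donor utility in the laissez-faire: all income is consumed."""
    return math.log(y) - 0.5 * (y / w) ** 2


def argmax(u, w, lo=1e-6, hi=50.0, iters=200):
    """Ternary search for the maximiser in y of a strictly concave u(y, w)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if u(m1, w) < u(m2, w):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def laissez_faire(w):
    """Return (donor income, non-donor income, donor bequest) for wage w."""
    y_donor = argmax(donor_utility, w)
    y_non = argmax(non_donor_utility, w)
    bequest = GAMMA * y_donor / (1.0 + GAMMA)
    return y_donor, y_non, bequest
```

Calling `laissez_faire(1.0)` and `laissez_faire(2.0)` should show, for these assumed forms, that donors earn more than non-donors of the same wage-type and that the high-wage donor leaves the larger bequest. The search bounds `lo` and `hi` are arbitrary and would need widening for much larger wages.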

Consider now the case for taxing bequests. Having income effects does not change the government’s incentive to tax bequests when it discounts the utility of bequests to donors. There is still a social externality in this case, since the government puts less weight on bequests than donors do and so would like to influence donors’ behaviour. At the same time, the government has an incentive to subsidize bequests when the welfare of children is given full weight in the government’s objective.

Things become complicated when considering the redistributive role for bequest taxation. Focus on the imperfect-information case. Although we assume donor status is not observable, the government can use the income tax schedule to separate donor from non-donor parents by choosing separate consumption-income bundles based on donor status subject to various incentive constraints. In addition to the incentive constraints between donor parents and non-donor parents of differing wage-types, there will now be incentive constraints between donor parents of one wage-type and non-donor parents of the other wage-type. Which incentive constraints are binding will depend on the underlying individual preferences, wage distribution and government policy. For illustrative purposes, assume that the \(\mathrm{MRS}_{cy}\)’s of the four types of parents for a given \((c,y)-\)bundle ascend from the lowest to the highest in the following order: (1) high-wage donor parents, (2) low-wage donor parents, (3) high-wage non-donor parents and (4) low-wage non-donor parents. Assume also that in the laissez-faire the ranking of parents’ social utility descends in the same order: (1) highest, then (2), etc. In this case, when an optimal income tax is imposed, the incentive constraints will be downward binding: (1) will be binding on (2), (2) on (3) and (3) on (4).Footnote 22 The question then is whether a bequest tax can improve upon the optimal nonlinear income tax schedule.

By the Atkinson–Stiglitz theorem, a bequest tax cannot improve redistributive outcomes between (1) high-wage donor and (2) low-wage donor parents. Nor can it help between (3) high-wage non-donor and (4) low-wage non-donor parents. At best, it can improve redistribution between (2) low-wage donor and (3) high-wage non-donor parents. We can show that a positive bequest tax will be useful for this purpose given the assumed social ranking. To see this, suppose we start from an optimal nonlinear income tax system with no bequest tax (\(p=1\)). Consider the introduction of a small proportional bequest tax and an adjustment to the optimal nonlinear income tax schedule to keep utilities of the four types constant. The high-wage non-donor parents do not leave a bequest, so they are unaffected by this tax and their nonlinear income tax schedule does not need to be adjusted. The low-wage donor parents leave a bequest whether or not they mimic. If they mimic, they earn the income of the high-wage non-donor parent and receive the same net income. Consequently, they leave a bequest conditional on the net income of the individual they are mimicking. Their nonlinear income tax (assuming they do not mimic) can be adjusted so as to compensate them for the introduction of the bequest tax, but since they pay the bequest tax whether or not they mimic, they will now be worse off if they mimic. In this case, the introduction of a small proportional bequest tax could relax the incentive constraint between the low-wage donor parent and the high-wage non-donor parent, and social welfare can be improved. By a similar argument, if the ranking of the parents’ social utility between (2) and (3) were reversed, the incentive constraint would bind in the other direction and a small proportional bequest subsidy could be a useful policy alongside the optimal nonlinear income tax system.
The ability of the income tax to differentiate between donors and non-donors of the same wage-type because of income effects reduces the need to use the bequest tax for that purpose. Bequest taxation can, however, still be used to relax the self-selection constraint between parents who differ in both wage and donor status even with weak separability.
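The compensation argument for the constraint between (2) and (3) can be written out as a short envelope-theorem sketch. The notation is introduced here for illustration and is not the paper's: \(V(p,c)\) denotes a donor's second-stage indirect utility, and \((c_2,y_2)\) and \((c_3,y_3)\) denote the bundles intended for low-wage donor and high-wage non-donor parents. Labour disutility is omitted because earned incomes are held fixed in the perturbation.

```latex
% Notation assumed for illustration: V, (c_2, y_2), (c_3, y_3).
\begin{aligned}
V(p,c) &\equiv \max_{b}\; \nu(c-pb)+f(b), \\
V_c &= \nu'\bigl(c-pb(p,c)\bigr), \qquad
V_p = -\,b(p,c)\,\nu'\bigl(c-pb(p,c)\bigr), \\
\text{truthful type (2):}\quad
 & V_c\,dc_2 + V_p\,dp = 0
 \;\Longrightarrow\; dc_2 = b(p,c_2)\,dp, \\
\text{mimicking type (2):}\quad
 & dV = V_p(p,c_3)\,dp
      = -\,b(p,c_3)\,\nu'\bigl(c_3-pb(p,c_3)\bigr)\,dp \;<\; 0 .
\end{aligned}
```

Under these assumptions, a small increase \(dp>0\) from \(p=1\) leaves the truthful low-wage donor's utility unchanged once \(c_2\) is raised by \(b(p,c_2)\,dp\). The bundle \((c_3,y_3)\) is not adjusted, so the utility from mimicking falls strictly whenever \(b(p,c_3)>0\), which is the sense in which the incentive constraint is relaxed.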

Bequest taxation in this environment will also be useful to redistribute between inheriting and non-inheriting children. With a positive correlation between the wage-types of children and parents, and since high-wage donor parents give greater bequests than low-wage donor parents, there will be a positive association between a child’s wage-type and the inheritance the child receives. Consequently, as argued by Brunner and Pech (2012a, b), bequest taxation can be used to reduce inequities within the children’s generation. Overall, whether the optimal bequest tax is positive or negative depends on the relative importance of the social externality effect, redistribution between inheriting and non-inheriting children, and improving redistribution between heterogeneous parents.

7 Concluding comments

We have studied the optimal linear tax treatment of bequests when the government does not give full welfare weight to the donors’ utility of bequests. There are many cogent reasons for taking this normative perspective. Perhaps the most persuasive one is that voluntary transfers to one’s heirs are in principle analogous to government redistribution based on the altruistic preferences of well-off taxpayers, and there is apparently no support for counting the benefits to the taxpayers from such redistribution. The policy consequences of discounting the benefits of bequests are significant.

With zero welfare weight on children, the social utility of donors will always be higher than the social utility of non-donors. Otherwise, the net social surplus from bequests will be negative and the government will drive net bequests to zero. Bequest taxation plays two distinct roles. It attempts to correct for the social externality arising both from the government weighting the utility of bequests differently from donors and from donors not taking into account the benefit of bequests to inheriting children. At the same time, bequest taxation is used to redistribute between donor and non-donor parents and between inheriting and non-inheriting children. Thus, with zero welfare weight on children, taxing bequests will be both efficiency- and equity-enhancing. With positive welfare weight on children, there will be an equity-efficiency trade-off.

We have explored this in a simple setting in which there are both donor and non-donor parents, and therefore both donee and non-donee children. Our results with high social discounting of the joy-of-giving are in sharp contrast to recent analyses that double-count the social welfare benefits of bequests, once to the donors and a second time to the donees (Kaplow 1988; Farhi and Werning 2010; Brunner and Pech 2012a, b; Piketty and Saez 2013). In this approach, the externality of bequests leads to an argument for subsidizing them, tempered by the desire to redistribute among donees in different circumstances.

The framework of our analysis is restrictive, although similar to recent literature on bequest taxation. By restricting bequest policies to linear instruments, we are able to uncover the various factors that are relevant for policy. It is clear that compensating donors for the costs of their voluntary transfers leads to complicated policy prescriptions even in a simple setting. In that sense, our analysis is exploratory. Ideally one might want to allow the government to implement nonlinear inheritance taxation alongside nonlinear income taxation, and that would be a useful next step.