1 Introduction

This paper introduces a parametric family of Lorenz curves obtained by a general method that adds a parameter \(-\infty <\alpha <\infty \), \(\alpha \ne 0\), to an initial Lorenz curve \(L_0(p),\,0\le p\le 1\), by means of the arctan function. In the limit as \(\alpha \) tends to zero, the family reduces to the initial Lorenz curve \(L_0(p)\).

The development of new functional forms of Lorenz curves has been an attractive area of research in recent decades; see, for example, Kakwani (1980), Aggarwal and Singh (1984), Gupta (1984), Ortega et al. (1991), Basmann et al. (1990), Chotikapanich (1993), Ogwang and Rao (1996), Sarabia et al. (1999, 2010). For a recent review of Lorenz curves and income distributions, see Chotikapanich (2008). These methods also provide new functional forms of Leimkuhler curves, which are of interest in informetrics, in particular for the study of concentration in that field (see Burrell 1992, 2005; Sarabia and Sarabia 2008; Sarabia et al. 2010, among others).

Closed-form expressions for the densities and distribution functions corresponding to the new Lorenz curves, and for the corresponding Gini and Pietra inequality indices, are obtained for some particular cases. We also give a method, based on the inverse of the initial Lorenz curve, that facilitates the computation of the Gini index for the proposed family.

In this study, we use two data sets (1977 and 1990) from the US Current Population Survey (CPS), considered in Ryu and Slottje (1996), and compare the results with those of the initial Lorenz curves examined.

The structure of this paper is as follows. In Sect. 2, we describe the new family of \(\arctan \) Lorenz curves and the corresponding Leimkuhler curves. Some particular cases obtained by starting with an initial Lorenz curve \(L_0(p)\) are shown. In Sect. 3, the Gini and Pietra indices are obtained, together with the population functions for some cases. In Sect. 4, we compare the performance of the proposed Lorenz curves with that of the initial ones by fitting them to the two data sets, and finally, in Sect. 5, our main conclusions are presented.

2 The new family of Lorenz curves

This section begins with the definition of the Lorenz curve given by Gastwirth (1971), in accordance with the original proposal by Pietra (1915):

Definition 1

Given a distribution function F(x) with support on a subset of the positive real numbers and with finite expectation \(\mu \), we define the Lorenz curve as

$$\begin{aligned} L_F(p)=\frac{1}{\mu }\int _0^p F^{-1}(x)\mathrm{d}x,\quad 0\le p\le 1, \end{aligned}$$
(1)

where \(F^{-1}(x)=\sup \left\{ y:F(y)\le x \right\} \).

A characterization of the Lorenz curve, which is well known in the literature, is given by the following result.

Theorem 1

Assume that L(p) is defined and continuous on the interval [0, 1] with second derivative \(L^{\prime \prime }(p)\). The function L(p) is a Lorenz curve if and only if

$$\begin{aligned} L(0) = 0,\quad L(1) = 1,\quad L^{\prime }(0^{+})\ge 0,\quad L^{\prime \prime }(p)\ge 0\quad \mathrm{for}\quad p\in (0,1). \end{aligned}$$
(2)

The main result of this paper is expressed in the following theorem.

Theorem 2

Let \(L_0(p)\) be a Lorenz curve and \(\alpha \ne 0\), \(-\infty <\alpha <\infty \), a real parameter, and consider the transformation

$$\begin{aligned} L_{\alpha }(p)=1-\frac{\arctan \left( \alpha (1-L_0(p))\right) }{\arctan \alpha },\quad 0\le p\le 1. \end{aligned}$$
(3)

Then, \(L_{\alpha }(p)\) is also a Lorenz curve.

Proof

Simple algebra shows that \(L_{\alpha }(0)=0,\;L_{\alpha }(1)=1\),

$$\begin{aligned} L^{\prime }_{\alpha }(p)= & {} \frac{\alpha }{\arctan \alpha }\frac{L^{\prime }_0(p)}{1+(\alpha (1-L_0(p)))^2}>0,\\ L^{\prime \prime }_{\alpha }(p)= & {} \frac{1}{1+(\alpha (1-L_0(p)))^2}\left[ \frac{\alpha L^{\prime \prime }_0(p)}{\arctan \alpha }+2\alpha ^2 L_0^{\prime }(p)L_{\alpha }^{\prime }(p)(1-L_0(p))\right] >0, \end{aligned}$$

and \(L_{\alpha }(p)<p\) for \(0<p<1\). Hence, if \(L_0(p)\) is a genuine Lorenz curve, expression (3) satisfies the boundary, slope and convexity conditions of Theorem 1 and always lies in the lower triangle of the unit square; therefore, \(L_{\alpha }(p)\) is a genuine Lorenz curve. \(\square \)

Using the well-known identity

$$\begin{aligned} \arctan u-\arctan v=\arctan \left( \frac{u-v}{1+uv}\right) ,\quad uv>-1, \end{aligned}$$

with \(u=\alpha \) and \(v=\alpha (1-L_0(p))\) (so that \(uv=\alpha ^2(1-L_0(p))\ge 0\)), (3) can be rewritten in a more compact form as

$$\begin{aligned} L_{\alpha }(p)=\frac{1}{\arctan \alpha }\arctan \left( \frac{\alpha L_0(p)}{1+\alpha ^2(1-L_0(p))}\right) . \end{aligned}$$
(4)

Taking the limit of (3), or alternatively of (4), as \(\alpha \) tends to zero and applying L’Hospital’s rule, it is straightforward to show that the initial Lorenz curve \(L_0(p)\) is obtained as a special case, i.e., \(L_{\alpha }(p)\rightarrow L_0(p)\) when \(\alpha \rightarrow 0\). Thus, the methodology proposed here can be regarded as a mechanism for adding a parameter to an initial Lorenz curve and, therefore, as a means of obtaining a more flexible Lorenz curve.
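A quick numerical check of (3), (4) and the limit \(\alpha \rightarrow 0\) can be sketched in Python. The Pareto initial curve with \(\theta =0.5\), the grid of p values and the helper names below are illustrative assumptions.

```python
import math

def lorenz_form3(p, L0, alpha):
    """Arctan Lorenz curve written as in (3)."""
    return 1.0 - math.atan(alpha * (1.0 - L0(p))) / math.atan(alpha)

def lorenz_form4(p, L0, alpha):
    """The same curve written in the compact form (4)."""
    l0 = L0(p)
    return math.atan(alpha * l0 / (1.0 + alpha**2 * (1.0 - l0))) / math.atan(alpha)

def pareto(p, theta=0.5):
    """Hypothetical initial curve: Pareto Lorenz curve with theta = 0.5."""
    return 1.0 - (1.0 - p) ** theta

grid = [i / 20 for i in range(21)]

def max_gap(f, g):
    """Largest absolute difference between two curves on the grid."""
    return max(abs(f(p) - g(p)) for p in grid)

# (3) and (4) agree for positive and negative alpha
for alpha in (-3.0, -0.4, 0.7, 5.0):
    gap = max_gap(lambda p: lorenz_form3(p, pareto, alpha),
                  lambda p: lorenz_form4(p, pareto, alpha))
    print(f"alpha = {alpha:+.1f}: max |(3) - (4)| = {gap:.1e}")

# As alpha -> 0 the curve approaches L0
for alpha in (1e-1, 1e-2, 1e-3):
    dev = max_gap(lambda p: lorenz_form3(p, pareto, alpha), pareto)
    print(f"alpha = {alpha:g}: max |L_alpha - L_0| = {dev:.1e}")
```

The deviation from \(L_0\) shrinks roughly like \(\alpha ^2\), in line with a second-order expansion of the two arctan terms.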

Other ways to write \(L_{\alpha }(p)\) given in (4) can be obtained by using the following representation of the \(\arctan \) function (see Castellanos 1988):

$$\begin{aligned} \arctan z = \frac{z}{1+z^2}\,_2F_1\left( 1,1;\frac{3}{2},\frac{z^2}{1+z^2}\right) = \sum _{n=0}^{\infty }\frac{2^{2n} (n!)^2}{(2n+1)!}\frac{z^{2n+1}}{(1+z^2)^{n+1}}. \end{aligned}$$
(5)

Here \(_2F_1(a,b;c,z)\) denotes the Gauss hypergeometric function, which, for \(\mathrm{Re}(c)>\mathrm{Re}(b)>0\), has the integral representation

$$\begin{aligned} _2F_1(a,b;c,z)=\frac{\varGamma (c)}{\varGamma (b)\varGamma (c-b)}\int _0^1 t^{b-1}(1-t)^{c-b-1}(1-tz)^{-a}\,\mathrm{d}t, \end{aligned}$$
(6)

and where \(\varGamma (\cdot )\) is the Euler gamma function.
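Representation (5) lends itself to a simple numerical check. The Python sketch below, a minimal illustration with hypothetical function names, sums the hypergeometric series directly (it converges because \(z^2/(1+z^2)\in [0,1)\)) and compares both expressions in (5) with the standard library's math.atan.

```python
import math

def hyp2f1_series(a, b, c, x, tol=1e-15, max_terms=200_000):
    """Gauss hypergeometric series sum_n (a)_n (b)_n / ((c)_n n!) x^n, for |x| < 1."""
    term, total, n = 1.0, 1.0, 0
    while abs(term) > tol * abs(total) and n < max_terms:
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * x
        total += term
        n += 1
    return total

def arctan_via_2f1(z):
    """First expression in (5): z/(1+z^2) * 2F1(1, 1; 3/2; z^2/(1+z^2))."""
    w = z * z / (1 + z * z)
    return z / (1 + z * z) * hyp2f1_series(1.0, 1.0, 1.5, w)

def arctan_euler(z, tol=1e-15, max_terms=200_000):
    """Second expression in (5), summed term by term (Euler's series)."""
    w = z * z / (1 + z * z)
    term = z / (1 + z * z)            # n = 0 term
    total, n = term, 0
    while abs(term) > tol * abs(total) and n < max_terms:
        # ratio of consecutive coefficients 4^n (n!)^2 / (2n+1)! is 2(n+1)/(2n+3)
        term *= 2 * (n + 1) / (2 * n + 3) * w
        total += term
        n += 1
    return total

for z in (-10.0, -0.3, 0.5, 2.0, 10.0):
    print(f"z={z:+5.1f}  atan={math.atan(z):.12f}  "
          f"2F1={arctan_via_2f1(z):.12f}  Euler={arctan_euler(z):.12f}")
```

Convergence slows as \(|z|\) grows, since the ratio \(z^2/(1+z^2)\) approaches 1.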

Approximations to the arctan function can be obtained using second- and third-order polynomials and simple rational functions (see Rajan et al. 2006 for details). In particular, since \(\arctan \left( (1+x)/(1-x)\right) =\pi /4+\arctan x\) for \(x<1\), the linear approximation \(\arctan x\approx \pi x/4\), \(|x|\le 1\), gives \(\arctan \left( (1+x)/(1-x)\right) \approx \pi (x+1)/4\). Applying this to (3) and after some algebra, we have

$$\begin{aligned} L_{\alpha }(p)\approx \frac{L_0(p)}{1+\alpha (1-L_0(p))},\quad \alpha >0. \end{aligned}$$
(7)
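The algebra leading to (7) can be made explicit as follows (a worked sketch). Writing \(y=(1+x)/(1-x)\), i.e., \(x=(y-1)/(y+1)\in [-1,1)\) for \(y\ge 0\), the approximation above becomes a rational approximation of \(\arctan y\), which is then applied to both arctan terms of (3) with \(y=\alpha (1-L_0(p))\) and \(y=\alpha \), both nonnegative when \(\alpha >0\):

$$\begin{aligned} \arctan y\approx \frac{\pi }{4}\left( 1+\frac{y-1}{y+1}\right) =\frac{\pi }{2}\,\frac{y}{1+y},\quad y\ge 0, \end{aligned}$$

$$\begin{aligned} L_{\alpha }(p)\approx 1-\frac{\alpha (1-L_0(p))}{1+\alpha (1-L_0(p))}\cdot \frac{1+\alpha }{\alpha } =\frac{1+\alpha (1-L_0(p))-(1+\alpha )(1-L_0(p))}{1+\alpha (1-L_0(p))} =\frac{L_0(p)}{1+\alpha (1-L_0(p))}. \end{aligned}$$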

Observe that the right-hand side in (7) is a genuine Lorenz curve and coincides with expression (27) in Sarabia et al. (2010). Additionally, the Aggarwal and Singh Lorenz curve (see Aggarwal and Singh 1984; Arnold 1986) is obtained from (7) when \(L_0(p)=p\). The mechanism proposed here is more general than the one proposed in Sarabia et al. (2010).

Expression (7) can also be obtained by considering the ordered sequence of Lorenz curves given by

$$\begin{aligned} L_0(p)\ge L_0(p)^2\ge \dots \ge L_0(p)^n\ge \dots \end{aligned}$$
(8)

where n is a positive integer. A new family of Lorenz curves can be built from (8) by assuming that the powers \(\{1, 2, \dots , n,\dots \}\) are not fixed but are distributed according to a suitable discrete random variable with probability mass function \(P_j = \Pr (X = j),\ j = 1, 2, \dots \), which yields \(\sum _{j\ge 1}P_j L_0(p)^j\). In the particular case \(P_j=\frac{1}{1+\alpha }\left( \frac{\alpha }{1+\alpha }\right) ^{j-1}\), \(\alpha >0\), i.e., the geometric distribution, this family of Lorenz curves gives (7).
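The geometric-mixture construction can be checked numerically: with the geometric weights \(P_j\), the truncated sum \(\sum _j P_j L_0(p)^j\) should reproduce the right-hand side of (7). In this Python sketch the truncation rule and the names mixture_of_powers and approx_7 are assumptions made for illustration.

```python
def mixture_of_powers(l0, alpha, max_terms=10_000, tol=1e-16):
    """Sum_j P_j * l0**j with geometric weights P_j = (1/(1+alpha)) * (alpha/(1+alpha))**(j-1)."""
    q = alpha / (1.0 + alpha)                  # ratio between consecutive weights
    weight, power, total = 1.0 / (1.0 + alpha), l0, 0.0
    for _ in range(max_terms):
        total += weight * power
        weight *= q
        power *= l0
        if weight * power < tol:               # remaining terms are negligible
            break
    return total

def approx_7(l0, alpha):
    """Right-hand side of (7) for a given value l0 = L0(p)."""
    return l0 / (1.0 + alpha * (1.0 - l0))

for alpha in (0.5, 2.0, 10.0):
    for l0 in (0.2, 0.7, 0.999):
        print(alpha, l0, mixture_of_powers(l0, alpha), approx_7(l0, alpha))
```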

It is known that the Lorenz curve determines the distribution of X up to a scale factor transformation, since \(F^{-1}(x)=\mu L^{\prime }(x)\). Moreover, the relation

$$\begin{aligned} K_0(p)=1-L_0(1-p) \end{aligned}$$
(9)

links the Lorenz and Leimkuhler curves (see Sarabia and Sarabia 2008 and Sarabia et al. 2010, among others). The Leimkuhler curve plays an important role in informetrics (see, for instance, Burrell 1992, 2005). Therefore, from (3) and (9), we can also define a family of \(\arctan \) Leimkuhler curves starting from an initial Lorenz curve \(L_0(p)\), given by

$$\begin{aligned} K_{\alpha }(p)=\frac{\arctan (\alpha (1-L_0(1-p)))}{\arctan \alpha },\quad -\infty <\alpha <\infty ,\, \alpha \ne 0. \end{aligned}$$

2.1 Lorenz ordering

Lorenz ordering is an important aspect in the analysis of income and wealth distributions. If we define L to be the class of all nonnegative random variables with positive finite expectation, the Lorenz partial order \(\le _L\) on the class L is defined by

$$\begin{aligned} X\le _L Y \Longleftrightarrow L_X(p)\ge L_Y(p),\quad \forall p\in [0,1]. \end{aligned}$$

If \(X\le _L Y\), then X exhibits less inequality than Y in the Lorenz sense. In the next result, we show that family (3) is ordered with respect to parameter \(\alpha \).

Proposition 1

The Lorenz curve \(L_{\alpha }(p)\) is ordered with respect to \(\alpha \), i.e., if \(|\alpha _1|\le |\alpha _2|\), \(-\infty <\alpha _1,\alpha _2<\infty \), \(\alpha _1,\alpha _2\ne 0\), then \(L_{|\alpha _1|}(p)\ge L_{|\alpha _2|}(p)\), for \(0\le p\le 1\).

Proof

Differentiating the logarithm of (3) with respect to \(\alpha \), the sign of \(\mathrm{d}L_{\alpha }(p)/\mathrm{d}\alpha \) is seen to be that of

$$\begin{aligned} \varPhi _{\alpha }(p)= & {} -\left[ 1-L_{\alpha }(p)\right] \left\{ \left[ 1-L_{0}(p)\right] \left( 1+\alpha ^2\right) \arctan \alpha \right. \\&- \left. \left[ 1+\alpha ^2(1-L_{0}(p))^2\right] \arctan \left( \alpha \left( 1-L_{0}(p)\right) \right) \right\} . \end{aligned}$$

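To see that \(\varPhi _{\alpha }(p)\le 0\) for \(\alpha >0\), write \(c=1-L_0(p)\in [0,1]\) and note that \(1-L_{\alpha }(p)\ge 0\), so it suffices to show that the term in braces,

$$\begin{aligned} g(c)=c\left( 1+\alpha ^2\right) \arctan \alpha -\left( 1+\alpha ^2c^2\right) \arctan (\alpha c), \end{aligned}$$

is nonnegative. Since \(g(0)=g(1)=0\) and

$$\begin{aligned} g^{\prime \prime }(c)=-2\alpha ^2\arctan (\alpha c)-\frac{2\alpha ^3c}{1+\alpha ^2c^2}<0,\quad 0<c\le 1, \end{aligned}$$

g is concave on [0, 1], and hence \(g(c)\ge 0\). Therefore \(\varPhi _{\alpha }(p)\le 0\), i.e., \(L_{\alpha }(p)\) is nonincreasing in \(\alpha >0\); the case \(\alpha <0\) follows from the symmetry \(L_{\alpha }(p)=L_{-\alpha }(p)\) (Proposition 2). Hence, the result. \(\square \)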

The following result shows that equality holds, i.e., X exhibits the same inequality as Y, when \(\alpha _1=-\alpha _2\).

Proposition 2

It is verified that \(L_{\alpha }(p)=L_{-\alpha }(p)\), for all \(-\infty <\alpha <\infty \), \(\alpha \ne 0\) and \(0\le p\le 1\).

Proof

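Since arctan is an odd function, replacing \(\alpha \) by \(-\alpha \) in (3) changes the sign of both the numerator and the denominator of the quotient, so \(L_{\alpha }(p)\) is unchanged. \(\square \)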
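Propositions 1 and 2 can be illustrated on a grid of p values. The sketch below assumes an Aggarwal and Singh initial curve with \(\theta =1\) and a hypothetical set of \(\alpha \) values; it checks numerically, not as a proof, that the curves move down as \(|\alpha |\) grows and that \(L_{\alpha }=L_{-\alpha }\).

```python
import math

def arctan_lorenz(p, L0, alpha):
    """Arctan Lorenz curve (3)."""
    return 1.0 - math.atan(alpha * (1.0 - L0(p))) / math.atan(alpha)

def aggarwal_singh(p, theta=1.0):
    """Hypothetical initial curve: Aggarwal and Singh with theta = 1."""
    return p / (1.0 + theta * (1.0 - p))

grid = [i / 50 for i in range(51)]
alphas = [0.1, 0.5, 1.0, 2.0, 5.0, 20.0]     # increasing |alpha|

# Proposition 1: a larger |alpha| never raises the curve (more inequality)
ordered = all(
    arctan_lorenz(p, aggarwal_singh, a1) >= arctan_lorenz(p, aggarwal_singh, a2) - 1e-15
    for a1, a2 in zip(alphas, alphas[1:])
    for p in grid
)
# Proposition 2: L_alpha(p) = L_{-alpha}(p)
symmetric = all(
    abs(arctan_lorenz(p, aggarwal_singh, a) - arctan_lorenz(p, aggarwal_singh, -a)) < 1e-14
    for a in alphas
    for p in grid
)
print("ordered:", ordered, " symmetric:", symmetric)
```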

2.2 New functional forms of Lorenz curves

In order to derive new functional forms of Lorenz curves, we now consider the following initial Lorenz curves: the egalitarian, Aggarwal and Singh, Pareto and Chotikapanich Lorenz curves.

The \(\arctan \) egalitarian Lorenz curve is obtained from (4) by setting the initial Lorenz curve to \(L_0(p)=p\). Thus, it is given by

$$\begin{aligned} L_{\alpha }(p)=\frac{1}{\arctan \alpha }\arctan \left( \frac{\alpha p}{1+\alpha ^2(1-p)}\right) ,\quad -\infty <\alpha <\infty ,\,\alpha \ne 0. \end{aligned}$$
(10)

The \(\arctan \) Aggarwal and Singh Lorenz curve is obtained in a similar way, replacing the initial Lorenz curve (see Aggarwal and Singh 1984; Arnold 1986) with \(L_0(p)=p/(1+\theta (1-p)),\quad \theta >0\), and therefore we have

$$\begin{aligned} L_{\alpha }(p)=\frac{1}{\arctan \alpha }\arctan \left( \frac{\alpha p}{1+(1-p)(\theta +\alpha ^2(1+\theta ))}\right) , \end{aligned}$$
(11)

where \(\theta >0,\, -\infty <\alpha <\infty ,\, \alpha \ne 0\).

Consider now the Pareto Lorenz curve

$$\begin{aligned} L_0(p)=1-(1-p)^{\theta },\;0<\theta < 1, \end{aligned}$$

from which we obtain the \(\arctan \) Pareto Lorenz curve

$$\begin{aligned} L_{\alpha }(p)=\frac{1}{\arctan \alpha }\arctan \left( \frac{\alpha (1-(1-p)^{\theta })}{1+\alpha ^2(1-p)^{\theta }}\right) . \end{aligned}$$
(12)

Finally, by taking as the initial one the Chotikapanich Lorenz curve given by \(L_0(p)=(\exp (\theta p)-1)/(\exp (\theta )-1),\;\theta >0\), we obtain the \(\arctan \) Chotikapanich Lorenz curve

$$\begin{aligned} L_{\alpha }(p)=\frac{1}{\arctan \alpha }\arctan \left( \frac{\alpha (\exp (\theta p)-1) }{\exp (\theta )-1+\alpha ^2(\exp (\theta )-\exp (\theta p))}\right) . \end{aligned}$$
(13)
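As a consistency check of (10)-(13), the following Python sketch compares each explicit expression with the generic form (4) evaluated at the corresponding initial curve, and verifies the conditions of Theorem 1 on a grid (endpoint values, and nonnegative second differences as a discrete proxy for convexity). The parameter values \(\alpha =2\) and \(\theta =0.6\) are hypothetical.

```python
import math

alpha, theta = 2.0, 0.6          # hypothetical values (0 < theta < 1 for Pareto)
A, E = math.atan(alpha), math.exp(theta)

def from_form4(L0):
    """Build L_alpha from an initial curve L0 using the generic form (4)."""
    def L(p):
        l0 = L0(p)
        return math.atan(alpha * l0 / (1 + alpha**2 * (1 - l0))) / A
    return L

# (initial curve L0, explicit arctan curve) pairs for (10)-(13)
cases = {
    "(10) egalitarian": (
        lambda p: p,
        lambda p: math.atan(alpha * p / (1 + alpha**2 * (1 - p))) / A),
    "(11) Aggarwal-Singh": (
        lambda p: p / (1 + theta * (1 - p)),
        lambda p: math.atan(alpha * p / (1 + (1 - p) * (theta + alpha**2 * (1 + theta)))) / A),
    "(12) Pareto": (
        lambda p: 1 - (1 - p) ** theta,
        lambda p: math.atan(alpha * (1 - (1 - p) ** theta) / (1 + alpha**2 * (1 - p) ** theta)) / A),
    "(13) Chotikapanich": (
        lambda p: (math.exp(theta * p) - 1) / (E - 1),
        lambda p: math.atan(alpha * (math.exp(theta * p) - 1)
                            / (E - 1 + alpha**2 * (E - math.exp(theta * p)))) / A),
}

grid = [i / 200 for i in range(201)]
report = {}
for name, (L0, explicit) in cases.items():
    generic = from_form4(L0)
    values = [explicit(p) for p in grid]
    gap = max(abs(v - generic(p)) for v, p in zip(values, grid))
    ends_ok = abs(values[0]) < 1e-12 and abs(values[-1] - 1) < 1e-12
    # second differences on the grid should be nonnegative for a convex curve
    convex_ok = all(values[i - 1] - 2 * values[i] + values[i + 1] >= -1e-12
                    for i in range(1, len(grid) - 1))
    report[name] = (gap, ends_ok, convex_ok)
    print(f"{name:22s} gap={gap:.1e} ends={ends_ok} convex={convex_ok}")
```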

Of course, other \(\arctan \) Lorenz curves can be obtained by replacing \(L_0(p)\) in (4) with other initial Lorenz curves, such as the Gupta or generalized Pareto Lorenz curves. We chose the above initial Lorenz curves because, as discussed in the next section, closed-form expressions can be obtained for some inequality measures and population functions.

3 Inequality measures and population functions

The corresponding Gini and Pietra indices can be computed straightforwardly when the egalitarian or the Aggarwal and Singh Lorenz curve is chosen as the initial curve \(L_0(p)\).

3.1 Gini and Yitzhaki indices

The Gini coefficient (also known as the Lorenz concentration ratio) measures the degree of concentration, or inequality, of a variable across the elements of a distribution, on a scale from 0 to 1. If \(0<|\alpha |<1\), we can use the power series representation of the \(\arctan \) function

$$\begin{aligned} \arctan x=\sum _{n=0}^{\infty }\frac{(-1)^n}{2n+1}x^{2n+1},\quad |x|<1, \end{aligned}$$

and then the Gini index, which is defined as

$$\begin{aligned} G=1-2\int _0^1 L_{\alpha }(p)\,\mathrm{d}p, \end{aligned}$$
(14)

can be written as

$$\begin{aligned} G=-1+\frac{2}{\arctan \alpha }\sum _{n=0}^{\infty }\frac{(-1)^n\alpha ^{2n+1}}{2n+1} \int _0^1(1-L_0(p))^{2n+1}\,\mathrm{d}p,\quad |\alpha |<1,\;\alpha \ne 0. \end{aligned}$$
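The series expression can be compared with direct numerical integration of (14). The sketch below assumes an Aggarwal and Singh initial curve with \(\theta =1\), evaluates the moments \(\int _0^1(1-L_0(p))^{2n+1}\mathrm{d}p\) with a composite Simpson rule and truncates the series at 60 terms; these choices are illustrative.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    odd = sum(f(a + k * h) for k in range(1, n, 2))
    even = sum(f(a + k * h) for k in range(2, n, 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3

def gini_direct(L0, alpha):
    """Gini index from definition (14) with the arctan curve (3)."""
    A = math.atan(alpha)
    return 1 - 2 * simpson(lambda p: 1 - math.atan(alpha * (1 - L0(p))) / A, 0.0, 1.0)

def gini_series(L0, alpha, n_terms=60):
    """Series expression, valid for 0 < |alpha| < 1."""
    A = math.atan(alpha)
    total = 0.0
    for n in range(n_terms):
        moment = simpson(lambda p: (1 - L0(p)) ** (2 * n + 1), 0.0, 1.0)
        total += (-1) ** n * alpha ** (2 * n + 1) / (2 * n + 1) * moment
    return -1 + 2 * total / A

def aggarwal_singh(p, theta=1.0):
    """Hypothetical initial curve with theta = 1."""
    return p / (1 + theta * (1 - p))

for alpha in (0.3, -0.5, 0.8):
    direct = gini_direct(aggarwal_singh, alpha)
    series = gini_series(aggarwal_singh, alpha)
    print(f"alpha={alpha:+.1f}: (14) {direct:.10f}  series {series:.10f}")
```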

When \(|\alpha |>1\), this series does not converge over the whole range of p, and more algebra is required to obtain a closed form for the Gini index. In this case, and when the inverse of the initial Lorenz curve is simple to obtain, the Gini index follows from the next result.

Proposition 3

The Gini index for the Lorenz curve in (3) is given by

$$\begin{aligned} G=\frac{2}{\arctan \alpha }\int _0^{\arctan \alpha } L_0^{-1}\left( 1-\frac{1}{\alpha }\tan y\right) \,\mathrm{d}y-1, \end{aligned}$$
(15)

for \(-\infty <\alpha <\infty \), \(\alpha \ne 0\). Here, \(\tan \) is the circular tangent function and \(L_0^{-1}(\cdot )\) is the inverse of the initial Lorenz curve.

Proof

By computing the inverse function of the Lorenz curve in (3) and using a result given by Anderson (1970), we have

$$\begin{aligned} \int _0^1 L_{\alpha }(p)\,\mathrm{d}p=1-\int _0^1 L_{\alpha }^{-1}(y)\,\mathrm{d}y. \end{aligned}$$

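Now, inverting (3) gives \(L_{\alpha }^{-1}(y)=L_0^{-1}\left( 1-\frac{1}{\alpha }\tan \left( (1-y)\arctan \alpha \right) \right) \), and the change of variable \(z=(1-y)\arctan \alpha \) in the last integral yields (15) after some simple algebra. \(\square \)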

Expression (15) is often more convenient than (14) for computing the Gini index, especially when the inverse of the initial Lorenz curve can be obtained in closed form.

For example, if we assume that the initial Lorenz curve is the egalitarian Lorenz curve then, by using (15), the Gini index is given by

$$\begin{aligned} G=1-\frac{\log (1+\alpha ^2)}{\alpha \arctan \alpha }. \end{aligned}$$

This result can also be obtained by performing integration by parts, taking into account that

$$\begin{aligned} \int _0^1\arctan (\alpha (1-p))\,\mathrm{d}p =\arctan \alpha -\frac{1}{2\alpha }\log (1+\alpha ^2). \end{aligned}$$
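Proposition 3 can be checked against definition (14) and the closed form above. This minimal Python sketch integrates both (14) and (15) with a composite Simpson rule (the rule and the number of subintervals are assumptions of the sketch) for the egalitarian initial curve, for which \(L_0^{-1}(y)=y\).

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] (b < a is allowed) with an even n."""
    h = (b - a) / n
    odd = sum(f(a + k * h) for k in range(1, n, 2))
    even = sum(f(a + k * h) for k in range(2, n, 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3

def gini_via_15(L0_inv, alpha):
    """Gini index from Proposition 3: needs only the inverse of the initial curve."""
    A = math.atan(alpha)
    return 2 / A * simpson(lambda y: L0_inv(1 - math.tan(y) / alpha), 0.0, A) - 1

def gini_via_14(L0, alpha):
    """Gini index from definition (14) applied to the arctan curve (3)."""
    A = math.atan(alpha)
    return 1 - 2 * simpson(lambda p: 1 - math.atan(alpha * (1 - L0(p))) / A, 0.0, 1.0)

def gini_egalitarian(alpha):
    """Closed form for L0(p) = p."""
    return 1 - math.log(1 + alpha**2) / (alpha * math.atan(alpha))

identity = lambda u: u    # L0(p) = p is its own inverse
for alpha in (0.5, 3.0, -7.0):
    print(alpha, gini_via_15(identity, alpha), gini_via_14(identity, alpha),
          gini_egalitarian(alpha))
```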

An important generalization of the Gini index is the generalized Gini index proposed by Yitzhaki (1983), which is defined as

$$\begin{aligned} G_{\nu }=1-\nu (\nu -1)\int _0^1 (1-p)^{\nu -2} L(p)\mathrm{d}p, \end{aligned}$$

where \(\nu >1\) and L(p) is the Lorenz curve. Of course, if \(\nu =2\), we obtain the Gini index. When \(L_0(p)=p\), after some algebra, we obtain that the Yitzhaki index is given (see “Appendix”) by

$$\begin{aligned} G_{\nu } = 1-\frac{\alpha }{\arctan \alpha }\left[ 1 -\frac{\nu \alpha ^2}{\nu +2}\,_2F_1\left( 1,1 +\nu /2;2+\nu /2,-\alpha ^2\right) \right] . \end{aligned}$$
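The generalized Gini index can also be evaluated directly from its definition, which provides a check for any closed form. The sketch below assumes the arctan egalitarian curve with a hypothetical \(\alpha =1.5\) and restricts itself to \(\nu \ge 2\), so that the weight \((1-p)^{\nu -2}\) stays bounded; for \(\nu =2\) it must reproduce the Gini index \(1-\log (1+\alpha ^2)/(\alpha \arctan \alpha )\).

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    odd = sum(f(a + k * h) for k in range(1, n, 2))
    even = sum(f(a + k * h) for k in range(2, n, 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3

def yitzhaki(L, nu):
    """Generalized Gini index G_nu from its definition (this sketch assumes nu >= 2)."""
    return 1 - nu * (nu - 1) * simpson(lambda p: (1 - p) ** (nu - 2) * L(p), 0.0, 1.0)

alpha = 1.5                                          # hypothetical value
A = math.atan(alpha)
L = lambda p: 1 - math.atan(alpha * (1 - p)) / A     # arctan egalitarian curve

gini_closed = 1 - math.log(1 + alpha**2) / (alpha * A)
for nu in (2.0, 3.0, 4.0):
    print(nu, yitzhaki(L, nu))
print("closed-form Gini:", gini_closed)
```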

In the case of the Aggarwal and Singh initial Lorenz curve, \(L_0^{-1}(y)=(1+\theta )y/(1+\theta y)\), and, writing \(\eta =(1+\theta )/\theta >1\), (15) gives

$$\begin{aligned} G=\frac{2\eta }{\arctan \alpha }\int _0^{\arctan \alpha }\frac{\alpha -\tan y}{\alpha \eta -\tan y}\,\mathrm{d}y-1. \end{aligned}$$

Then, for \(\alpha >0\) (the case \(\alpha <0\) follows from Proposition 2), the Gini index (see “Appendix”) is expressed as

$$\begin{aligned} G=2\eta \left[ 1+\frac{\alpha (1-\eta )}{(1 +\alpha ^2\eta ^2)\arctan \alpha }\left( \log \left( \frac{\eta \sqrt{1 +\alpha ^2}}{\eta -1}\right) +\alpha \eta \arctan \alpha \right) \right] -1. \end{aligned}$$

Finally, taking the classical Pareto Lorenz curve as the initial Lorenz curve and again using (15), the Gini index is given by

$$\begin{aligned} G=\frac{2}{\arctan \alpha }\int _0^{\arctan \alpha }\left[ 1 -\left( \frac{\tan y}{\alpha }\right) ^{1/\theta }\right] \,\mathrm{d}y-1. \end{aligned}$$

The above integral is developed in the “Appendix,” and the Gini index is found to be

$$\begin{aligned} G=1-\frac{2\alpha \theta }{(1+\theta )\arctan \alpha }\,_2F_1 \left( 1,\frac{1+\theta }{2\theta };\frac{3\theta +1}{2\theta },-\alpha ^2\right) . \end{aligned}$$
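For the Aggarwal and Singh and Pareto initial curves, (15) can be compared numerically with (14). The Python sketch below uses the Aggarwal and Singh curve in the parametrization of (11), whose inverse is \(L_0^{-1}(y)=(1+\theta )y/(1+\theta y)\), and the Pareto inverse \(L_0^{-1}(y)=1-(1-y)^{1/\theta }\); the parameter values are hypothetical. Integrating (14) for the Pareto curve needs a finer grid because \(L_0\) has an infinite slope at \(p=1\), whereas the integrand of (15) is smooth.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    odd = sum(f(a + k * h) for k in range(1, n, 2))
    even = sum(f(a + k * h) for k in range(2, n, 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3

def gini_via_14(L0, alpha, n):
    A = math.atan(alpha)
    return 1 - 2 * simpson(lambda p: 1 - math.atan(alpha * (1 - L0(p))) / A, 0.0, 1.0, n)

def gini_via_15(L0_inv, alpha, n):
    A = math.atan(alpha)
    return 2 / A * simpson(lambda y: L0_inv(1 - math.tan(y) / alpha), 0.0, A, n) - 1

alpha, th_as, th_par = 2.0, 1.0, 0.5        # hypothetical parameter values

# Aggarwal and Singh curve as in (11) and its inverse
as_L0 = lambda p: p / (1 + th_as * (1 - p))
as_inv = lambda y: (1 + th_as) * y / (1 + th_as * y)

# Pareto curve and its inverse
par_L0 = lambda p: 1 - (1 - p) ** th_par
par_inv = lambda y: 1 - (1 - y) ** (1 / th_par)

g_as = (gini_via_14(as_L0, alpha, 4000), gini_via_15(as_inv, alpha, 4000))
g_par = (gini_via_14(par_L0, alpha, 20000), gini_via_15(par_inv, alpha, 4000))
print("Aggarwal-Singh:", g_as)
print("Pareto:", g_par)
```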

Using numerical integration techniques, Gini and Yitzhaki indices can also be calculated when other Lorenz curves are assumed as \(L_0(p)\).

3.2 Pietra index

An interesting but less well-known index of inequality is the Pietra index, given by the proportion of total income that would need to be reallocated across the population to achieve perfect equality in income. This proportion is given by

$$\begin{aligned} P=\max _{0\le p\le 1}\left[ p-L_{\alpha }(p)\right] =\frac{1}{2\mu }E|X-\mu | \end{aligned}$$

and corresponds to the maximal vertical deviation between the Lorenz curve and the egalitarian line (Pietra 1915; Frosini 2012 calls this same index Pietra–Ricci index, owing to the extensive study made by Ricci (1916) on the same subject). Frosini (2005) also provides a simple graphical representation of this index.

Differentiating \(p-L_{\alpha }(p)\) and using (3), we find that the Pietra index is attained for a value of p satisfying the equation

$$\begin{aligned} \left[ 1+\alpha ^2\left( 1-L_0(p)\right) ^2\right] \arctan \alpha -\alpha L_0^{\prime }(p)=0. \end{aligned}$$

In particular, when \(L_0(p)=p\) and \(\alpha >0\), the maximum is attained when

$$\begin{aligned} p=1-\frac{1}{\alpha }\sqrt{\frac{\alpha -\arctan \alpha }{\arctan \alpha }}. \end{aligned}$$

Then, the Pietra index is given, in this case, by

$$\begin{aligned} P=\frac{1}{\arctan \alpha }\arctan \left( \sqrt{\frac{\alpha -\arctan \alpha }{\arctan \alpha }}\right) -\frac{1}{\alpha }\sqrt{\frac{\alpha -\arctan \alpha }{\arctan \alpha }}. \end{aligned}$$
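The maximizer above can be checked by maximizing \(p-L_{\alpha }(p)\) numerically. Since \(L_{\alpha }\) is convex, the gap is concave, so a ternary search suffices. In this sketch the values \(\alpha =1\) and \(\alpha =4\) are hypothetical. It also checks the inequality \(P\le G\) against the closed-form Gini index of the egalitarian case. This inequality holds because the concave gap, vanishing at 0 and 1 with maximum P, encloses at least a triangle of area P/2.

```python
import math

def ternary_max(g, lo=0.0, hi=1.0, iters=200):
    """Maximize a concave function g on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    p = 0.5 * (lo + hi)
    return p, g(p)

results = {}
for alpha in (1.0, 4.0):
    A = math.atan(alpha)
    gap = lambda p: p - (1 - math.atan(alpha * (1 - p)) / A)   # p - L_alpha(p), L0(p) = p
    p_num, pietra = ternary_max(gap)
    p_formula = 1 - math.sqrt((alpha - A) / A) / alpha
    gini = 1 - math.log(1 + alpha**2) / (alpha * A)
    results[alpha] = (p_num, p_formula, pietra, gini)
    print(f"alpha={alpha}: p*={p_num:.8f} (formula {p_formula:.8f}), "
          f"P={pietra:.6f}, G={gini:.6f}")
```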

When the initial Lorenz curve considered is the Aggarwal and Singh Lorenz curve, the maximum is attained when

$$\begin{aligned} p_0= & {} \frac{1}{\theta ^2+\alpha ^2(1+\theta )^2} \left[ (1+\theta )\left( \alpha ^2+\theta \left( 1+\alpha ^2\right) \right) \right. \\&- \left. \frac{1}{\sqrt{\arctan \alpha }}\sqrt{\alpha (1+\theta ) \left( \theta ^2+\alpha ^2(1+\theta )^2-\alpha (1+\theta )\arctan \alpha \right) }\right] , \end{aligned}$$

and the Pietra index is then

$$\begin{aligned} P=p_0-1+\frac{1}{\arctan \alpha }\arctan \left[ \alpha \left( 1-\frac{p_0}{1+\theta (1-p_0)}\right) \right] . \end{aligned}$$

Finally, for the Chotikapanich Lorenz curve, the Pietra index is

$$\begin{aligned} P=p_0-1+\frac{1}{\arctan \alpha }\arctan \left[ \alpha \left( 1-\frac{e^{\theta p_0}-1}{e^{\theta }-1}\right) \right] , \end{aligned}$$

where \(p_0\) is derived from

$$\begin{aligned} e^{\theta p_0}= & {} \frac{1}{2\alpha \arctan \alpha }\left[ (\theta +2\alpha \arctan \alpha )e^{\theta }-\theta \right. \\&\left. - \sqrt{\left( \theta ^2-4(\arctan \alpha )^2\right) \left( e^{\theta }-1\right) ^2+4\alpha \theta e^{\theta }\left( e^{\theta }-1\right) \arctan \alpha }\right] . \end{aligned}$$
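The point at which the Pietra index is attained can also be located by solving the first-order condition above numerically. The sketch below uses hypothetical values \(\alpha =1.5\) and \(\theta =0.8\), and assumes \(\alpha >0\) so that a sign change on [0, 1] brackets the root. It applies bisection and compares the Aggarwal and Singh root with the closed-form \(p_0\) given above. For the Chotikapanich curve, it confirms that the root maximizes \(p-L_{\alpha }(p)\) on a fine grid.

```python
import math

def foc_root(L0, dL0, alpha, iters=200):
    """Bisection on h(p) = [1 + alpha^2 (1 - L0(p))^2] arctan(alpha) - alpha L0'(p), alpha > 0."""
    A = math.atan(alpha)
    h = lambda p: (1 + alpha**2 * (1 - L0(p))**2) * A - alpha * dL0(p)
    lo, hi = 0.0, 1.0        # assumes h(0) > 0 > h(1), true for the curves below
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), h

def pietra_at(p, L0, alpha):
    """p - L_alpha(p) with L_alpha from (3)."""
    return p - (1 - math.atan(alpha * (1 - L0(p))) / math.atan(alpha))

alpha, theta = 1.5, 0.8                     # hypothetical parameter values
A = math.atan(alpha)

# Aggarwal and Singh initial curve, as in (11)
as_L0 = lambda p: p / (1 + theta * (1 - p))
as_dL0 = lambda p: (1 + theta) / (1 + theta * (1 - p))**2
p_as, _ = foc_root(as_L0, as_dL0, alpha)
M = theta**2 + alpha**2 * (1 + theta)**2
p_as_formula = ((1 + theta) * (alpha**2 + theta * (1 + alpha**2))
                - math.sqrt(alpha * (1 + theta) * (M - alpha * (1 + theta) * A) / A)) / M

# Chotikapanich initial curve
E = math.exp(theta)
ch_L0 = lambda p: (math.exp(theta * p) - 1) / (E - 1)
ch_dL0 = lambda p: theta * math.exp(theta * p) / (E - 1)
p_ch, h_ch = foc_root(ch_L0, ch_dL0, alpha)

grid = [i / 1000 for i in range(1001)]
grid_max_ch = max(pietra_at(p, ch_L0, alpha) for p in grid)
print("Aggarwal-Singh:", p_as, p_as_formula, pietra_at(p_as, as_L0, alpha))
print("Chotikapanich:", p_ch, pietra_at(p_ch, ch_L0, alpha), grid_max_ch)
```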

Numerical computation can be used to obtain the Pietra index for other initial Lorenz curves.

3.3 Population functions

In some particular cases, closed-form expressions can be obtained for the distribution functions. For example, if we assume that \(L_0(p)=p\) we have, if \(\alpha <0\)

$$\begin{aligned} F(x)=1+\frac{1}{\alpha }\kappa _1(x;\mu ,\alpha ), \quad \kappa _2(\mu ,\alpha )\le x\le (1+\alpha ^2)\kappa _2(\mu ,\alpha ) \end{aligned}$$

and

$$\begin{aligned} F(x)=1-\frac{1}{\alpha }\kappa _1(x;\mu ,\alpha ), \quad \kappa _2(\mu ,\alpha )\le x\le (1+\alpha ^2)\kappa _2(\mu ,\alpha ), \end{aligned}$$

if \(\alpha >0\), where \(\kappa _1(x;\mu ,\alpha ) =\sqrt{\frac{\mu \alpha }{x\arctan \alpha }-1}\) and \(\kappa _2(\mu ,\alpha ) =\frac{\mu \alpha }{(1+\alpha ^2)\arctan \alpha }\). The corresponding probability density functions are

$$\begin{aligned} f(x)=\frac{(1+\alpha ^2)\kappa _2(\mu ,\alpha )}{2\alpha x^2\kappa _1(x;\mu ,\alpha )}, \quad \kappa _2(\mu ,\alpha )\le x\le (1+\alpha ^2)\kappa _2(\mu ,\alpha ) \end{aligned}$$

and

$$\begin{aligned} f(x)=-\frac{(1+\alpha ^2)\kappa _2(\mu ,\alpha )}{2\alpha x^2\kappa _1(x;\mu ,\alpha )}, \quad \kappa _2(\mu ,\alpha )\le x\le (1+\alpha ^2)\kappa _2(\mu ,\alpha ), \end{aligned}$$

for \(\alpha >0\) and \(\alpha <0\), respectively.
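The closed-form distribution function for \(L_0(p)=p\) can be checked through the relation \(F^{-1}(p)=\mu L^{\prime }_{\alpha }(p)\). Composing F with the quantile function should return p, and a finite-difference derivative of F should match f. The Python sketch below assumes a hypothetical mean \(\mu =1\).

```python
import math

MU = 1.0   # hypothetical mean income used only for illustration

def cdf_egalitarian(x, alpha, mu=MU):
    """Closed-form F(x) for L0(p) = p, set to 0 or 1 outside the support."""
    A = math.atan(alpha)
    k2 = mu * alpha / ((1 + alpha**2) * A)
    if x <= k2:
        return 0.0
    if x >= (1 + alpha**2) * k2:
        return 1.0
    k1 = math.sqrt(max(0.0, mu * alpha / (x * A) - 1))   # guard against rounding below 0
    return 1 + k1 / alpha if alpha < 0 else 1 - k1 / alpha

def pdf_egalitarian(x, alpha, mu=MU):
    """Density on the interior of the support (sign chosen according to alpha)."""
    A = math.atan(alpha)
    k1 = math.sqrt(mu * alpha / (x * A) - 1)
    k2 = mu * alpha / ((1 + alpha**2) * A)
    value = (1 + alpha**2) * k2 / (2 * alpha * x**2 * k1)
    return value if alpha > 0 else -value

def quantile(p, alpha, mu=MU):
    """F^{-1}(p) = mu * L_alpha'(p) for the arctan egalitarian curve (10)."""
    return mu * alpha / (math.atan(alpha) * (1 + alpha**2 * (1 - p)**2))

for alpha in (2.0, -2.0):
    x = quantile(0.5, alpha)
    print(alpha, x, cdf_egalitarian(x, alpha), pdf_egalitarian(x, alpha))
```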

Let \(L_0(p)\) be the Aggarwal and Singh Lorenz curve. In this case, for \(-\infty <\alpha <\infty \), \(\alpha \ne 0\),

$$\begin{aligned} F(x)=\kappa _1(\alpha ,\theta ) -\frac{\kappa _2(x;\mu ,\alpha ,\theta )}{\theta ^2+\alpha ^2 (1+\theta )^2},\quad \frac{\mu \alpha }{(1+\theta )(1+\alpha ^2)\arctan \alpha }\le x\le \frac{\mu \alpha (1+\theta )}{\arctan \alpha }, \end{aligned}$$

where

$$\begin{aligned} \kappa _1(\alpha ,\theta )= & {} \frac{ (1+\theta ) \left( \theta +\alpha ^2 (1+\theta )\right) }{ \theta ^2+\alpha ^2 (1+\theta )^2},\\ \kappa _2(x;\mu ,\alpha ,\theta )= & {} \sqrt{\alpha (1+\theta )\left( \frac{\mu \left( \theta ^2+\alpha ^2 (1+\theta )^2\right) }{x\arctan \alpha }-\alpha (1+\theta )\right) }. \end{aligned}$$

These expressions are even in \(\alpha \), in agreement with Proposition 2. The corresponding probability density function is

$$\begin{aligned} f(x)=\frac{\mu \alpha (1+\theta )}{2x^2\arctan \alpha \;\kappa _2(x;\mu ,\alpha ,\theta )},\quad \frac{\mu \alpha }{(1+\theta )(1+\alpha ^2)\arctan \alpha }\le x\le \frac{\mu \alpha (1+\theta )}{\arctan \alpha }. \end{aligned}$$

Finally, for the arctan Chotikapanich Lorenz curve, the population function becomes

$$\begin{aligned} F(x)=\frac{1}{\theta }\log \left[ \frac{\theta \mu (e^{\theta }-1) +2\alpha x e^{\theta }\arctan \alpha -\sqrt{(e^{\theta }-1)H(\alpha ,\theta ,\mu ,x)}}{2x\alpha \arctan \alpha }\right] , \end{aligned}$$

where

$$\begin{aligned} H(\alpha ,\theta ,\mu ,x)=\theta ^2\mu ^2(e^{\theta }-1)+4x\arctan \alpha \left[ x\arctan \alpha +e^{\theta }(\alpha \theta \mu -x\arctan \alpha )\right] , \end{aligned}$$

with \(-\infty <\alpha <\infty \), \(\alpha \ne 0\), and

$$\begin{aligned} \frac{\alpha \theta \mu }{\left( \alpha ^2+1\right) \left( e^{\theta }-1\right) \arctan \alpha }\le x\le \frac{\alpha \theta \mu e^{\theta }}{\left( e^{\theta }-1\right) \arctan \alpha }. \end{aligned}$$
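When a closed form is inconvenient, F can be obtained by inverting \(x(p)=\mu L_{\alpha }^{\prime }(p)\) numerically, since this quantile function is nondecreasing. The following sketch uses hypothetical parameter values and helper names. It performs the inversion by bisection for the arctan Chotikapanich curve and checks the lower end of the support against the bound stated above.

```python
import math

def make_quantile(L0, dL0, alpha, mu):
    """x(p) = mu * L_alpha'(p), with L_alpha' taken from the proof of Theorem 2."""
    A = math.atan(alpha)
    return lambda p: mu * alpha * dL0(p) / (A * (1 + (alpha * (1 - L0(p)))**2))

def cdf_by_inversion(x, quantile, iters=200):
    """F(x) by bisection on the nondecreasing quantile function x(p)."""
    if x <= quantile(0.0):
        return 0.0
    if x >= quantile(1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if quantile(mid) <= x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha, theta, mu = 2.0, 1.2, 1.0            # hypothetical parameter values
E = math.exp(theta)
ch_L0 = lambda p: (math.exp(theta * p) - 1) / (E - 1)       # Chotikapanich curve
ch_dL0 = lambda p: theta * math.exp(theta * p) / (E - 1)
q = make_quantile(ch_L0, ch_dL0, alpha, mu)

lower = alpha * theta * mu / ((alpha**2 + 1) * (E - 1) * math.atan(alpha))
print("support:", q(0.0), "to", q(1.0), " lower bound from the text:", lower)
for p in (0.1, 0.5, 0.9):
    print(p, cdf_by_inversion(q(p), q))
```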

4 Numerical application

To compare the performance of the functional forms given in (10), (11) and (12), we used the US data (for 2009 and 2013) obtained from the US Census Bureau, Current Population Survey, 2014 Annual Social and Economic Supplement (see “Appendix, Tables 5 and 6”). Three methods of estimation are considered, as described below.

4.1 Nonlinear least squares estimators

These estimators minimize the sum of the squared differences between the predicted and observed values. For a particular Lorenz curve \(L_{\alpha }(p)\), the minimization is associated with the expression

$$\begin{aligned} \sum _{i=1}^{n}(q_i-L_{\alpha }(p_i))^2, \end{aligned}$$

where the points \((p_i,q_i)_{i=1}^n\) are available from an empirical Lorenz curve.
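A minimal sketch of the nonlinear least squares step for the one-parameter arctan egalitarian curve (10) follows. Because the curves are ordered in \(\alpha \) (Proposition 1), the sum of squares is unimodal in \(\alpha >0\) when the data lie on a curve of the family, so a golden-section search is used here instead of a general-purpose optimizer. The data points are synthetic (generated with a hypothetical \(\alpha =3\)), not the CPS data.

```python
import math

def arctan_egalitarian(p, alpha):
    """Arctan egalitarian Lorenz curve (10)."""
    return math.atan(alpha * p / (1 + alpha**2 * (1 - p))) / math.atan(alpha)

def sse(alpha, points):
    """Sum of squared differences between observed q_i and fitted L_alpha(p_i)."""
    return sum((q - arctan_egalitarian(p, alpha))**2 for p, q in points)

def golden_section_min(f, a, b, tol=1e-10):
    """Minimize a unimodal function f on [a, b]."""
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                 # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                       # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Synthetic, noise-free deciles generated with a hypothetical alpha = 3
true_alpha = 3.0
points = [(i / 10, arctan_egalitarian(i / 10, true_alpha)) for i in range(1, 10)]
alpha_hat = golden_section_min(lambda a: sse(a, points), 0.01, 50.0)
print(alpha_hat, sse(alpha_hat, points))
```

With real, noisy data and multi-parameter curves, a general nonlinear least squares routine started from the values described next would be the natural replacement for this one-dimensional search.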

From the approximation given in (7), with \(L_0(p)\) replaced by the classical expression, we obtain by least squares the initial estimates of the parameters. This expression can also be employed to obtain estimates by the method proposed by Castillo et al. (1998). In this case, we begin by considering a single point \((p_i, q_i)\) of the empirical Lorenz curve and, by substituting it in (7), we obtain the simple estimate for \(\alpha \) given by

$$\begin{aligned} \widehat{\alpha }_i\approx \frac{L_0(p_i; \widehat{\phi })-q_i}{q_i(1-L_0(p_i;\widehat{\phi }))},\quad i=1,2,\dots ,n, \end{aligned}$$
(16)

where \(\widehat{\phi }\) is the least squares estimate obtained from the classical Lorenz curve, which depends on parameter \(\phi \) (which is a vector of parameters when the classical Lorenz curve depends on more than one parameter). By combining all the initial estimators (16) using a function such as the mean or median, the final estimators are obtained. For example, if we use the mean function, the final estimation of \(\alpha \) will be \(\widehat{\alpha }=\frac{1}{n}\sum _{i=1}^{n}\widehat{\alpha }_i\).
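The single-point estimates (16) and their combination can be sketched as follows. Here the fitted classical curve \(L_0(p;\widehat{\phi })\) is replaced, for illustration only, by the egalitarian line. Both data sets are synthetic. The first consists of points lying exactly on (7), for which every \(\widehat{\alpha }_i\) equals the true value. The second uses points from the exact arctan curve (10), for which (16) gives only approximate values that can serve as starting points.

```python
import math
import statistics

def alpha_elemental(p, q, L0):
    """Single-point estimate (16), obtained by solving (7) for alpha."""
    l0 = L0(p)
    return (l0 - q) / (q * (1 - l0))

L0 = lambda p: p        # hypothetical stand-in for a fitted classical curve L0(p; phi_hat)
deciles = [i / 10 for i in range(1, 10)]

# (a) Points lying exactly on approximation (7) with alpha = 2.5
pts7 = [(p, p / (1 + 2.5 * (1 - p))) for p in deciles]
est7 = [alpha_elemental(p, q, L0) for p, q in pts7]

# (b) Synthetic points from the exact arctan curve (10) with alpha = 3
A = math.atan(3.0)
pts10 = [(p, math.atan(3.0 * p / (1 + 9.0 * (1 - p))) / A) for p in deciles]
est10 = [alpha_elemental(p, q, L0) for p, q in pts10]
print("mean:", statistics.mean(est10), " median:", statistics.median(est10))
```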

Table 1 Results for the parameter estimates and MSE and MAX criteria

The results for the two data sets, 2009 and 2013, are shown in Table 1, which reports the parameter estimates, the mean squared error (MSE) and the maximum absolute error (MAX) for both data sets. With respect to the initial Lorenz curves considered, the new models provide better results in terms of smaller MSE and MAX and in terms of the Gini and Pietra indices (the empirical Gini index, computed according to Brown’s formula, and the empirical Pietra index are 0.450007 and 0.324733, respectively, for the 2003 data and 0.457607 and 0.330401 for the 2013 data), and the best fit is obtained with the new functional forms proposed.

Fig. 1 Lorenz curves for 2003 US income data based on nonlinear least squares estimates. Dashed curves represent the classical model and continuous curves the arctan model

Figure 1 presents a graphical comparison between the empirical Lorenz curves and the corresponding estimated Lorenz curves based on the nonlinear least squares estimators for the Egalitarian and Pareto cases.

4.2 Maximum likelihood estimation based on the use of the population function

Maximum likelihood estimation based on the population function was also studied, using the cumulative distribution functions given in Sect. 3.3. When data are grouped, let \(n_i\) be the number of observations in the interval \((x_{i-1},x_i]\). The log-likelihood function is then

$$\begin{aligned} \ell (\phi )=\sum _{i=1}^{n}n_i\log \left[ F(x_i|\phi )-F(x_{i-1} |\phi )\right] , \end{aligned}$$

where n is the number of groups and \(\phi \) the parameter(s) to be estimated. See Chotikapanich (2008) for details. Table 2 shows that the arctan model attains a larger maximum of the log-likelihood function than the Dirichlet distribution does.
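The grouped-data log-likelihood can be coded directly from a distribution function. The sketch below uses the arctan egalitarian distribution function of Sect. 3.3 with \(\alpha >0\) and treats \(\mu \) as known. It uses synthetic expected group counts, with hypothetical group limits and \(\alpha =4\). The maximization combines a coarse grid with a golden-section refinement, an illustrative choice rather than the procedure used for Table 2.

```python
import math

def egal_cdf(x, mu, alpha):
    """Arctan egalitarian F(x) of Sect. 3.3 for alpha > 0, clipped to its support."""
    A = math.atan(alpha)
    lower, upper = mu * alpha / ((1 + alpha**2) * A), mu * alpha / A
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return 1 - math.sqrt(max(0.0, mu * alpha / (x * A) - 1)) / alpha

def grouped_loglik(cdf, bounds, counts):
    """sum_i n_i log[F(x_i) - F(x_{i-1})] over the groups (x_{i-1}, x_i]."""
    total = 0.0
    for n_i, a, b in zip(counts, bounds, bounds[1:]):
        if n_i == 0:
            continue
        prob = cdf(b) - cdf(a)
        if prob <= 0:
            return -math.inf            # an occupied group with zero model probability
        total += n_i * math.log(prob)
    return total

def golden_section_min(f, a, b, tol=1e-9):
    """Minimize a unimodal function f on [a, b]."""
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

mu, true_alpha, N = 1.0, 4.0, 10_000          # hypothetical values; mu treated as known
bounds = [0.0, 0.3, 0.5, 0.8, 1.2, 2.0, math.inf]
# Expected (non-integer) group counts under the assumed model: synthetic, not survey data
counts = [N * (egal_cdf(b, mu, true_alpha) - egal_cdf(a, mu, true_alpha))
          for a, b in zip(bounds, bounds[1:])]

neg_loglik = lambda alpha: -grouped_loglik(lambda x: egal_cdf(x, mu, alpha), bounds, counts)
coarse = [2.5 + 0.1 * k for k in range(56)]   # grid from 2.5 to 8.0
best = min(coarse, key=neg_loglik)
alpha_hat = golden_section_min(neg_loglik, max(2.5, best - 0.1), best + 0.1)
print(alpha_hat)
```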

Table 2 MLE based on cumulative distribution function

Because the Lorenz curve can be mapped to the density of the data, and in order to correct the standard errors for model misspecification, we estimated the parameters of interest by maximizing the log-likelihood and obtained robust (sandwich) standard errors. See Freedman (2006) for details.

Finally, when the population function associated with a given Lorenz curve is not known, estimation based on the use of the Dirichlet distribution is adequate for comparing different models (see Chotikapanich and Griffiths 2002).

4.3 Model validation

For the situation in which the models are non-nested, a Vuong test was conducted to compare the estimates of the different Lorenz curves. In this regard, we test the null hypothesis that the two models are equally close to the actual model, against the alternative that one of them is closer (Vuong 1989). The z-statistic is

$$\begin{aligned} Z=\frac{1}{\omega \sqrt{n}}\left( \ell \left( \widehat{\theta }_1\right) -\ell \left( \widehat{\theta }_2\right) \right) , \end{aligned}$$

where \(\widehat{\theta }_1\) and \(\widehat{\theta }_2\) are vectors of the estimated parameters and

$$\begin{aligned} \omega ^2=\frac{1}{n}\sum _{i=1}^{n} \left[ \log \left( \frac{f\left( x_i|\widehat{\theta }_1\right) }{g\left( x_i|\widehat{\theta }_2\right) }\right) \right] ^2- \left[ \frac{1}{n} \sum _{i=1}^{n}\log \left( \frac{f\left( x_i|\widehat{\theta }_1\right) }{g\left( x_i|\widehat{\theta }_2\right) }\right) \right] ^2 \end{aligned}$$

where f and g represent the probability density functions of the two models to be compared, respectively.

Because the Z statistic is asymptotically standard normal under the null hypothesis, the null hypothesis is rejected at significance level \(\alpha \) in favor of the alternative that f is closer to the true model when \(Z>z_{1-\alpha }\), where \(z_{1-\alpha }\) is the \((1-\alpha )\) quantile of the standard normal distribution.

To apply this test, we choose a critical value c from the standard normal distribution that corresponds to the desired level of significance (e.g., \(c = 1.96\), for which \(\Pr (|Z|\ge c)=0.05\)). Then, if \(Z>c\), we reject the null hypothesis that the models are equivalent in favor of the alternative that f is better than g; if \(Z<-c\), we reject it in favor of the alternative that g is better than f; and if \(|Z|\le c\), we cannot reject the null hypothesis that the models are equivalent. Under this criterion, and from Table 3, we conclude that the classical Aggarwal and Singh Lorenz curve outperforms all the arctan models proposed and that the Chotikapanich Lorenz curve outperforms the arctan egalitarian Lorenz curve. In the remaining cases, the arctan models are better than the Pareto and the Chotikapanich Lorenz curves.
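The Vuong statistic and the two-sided decision rule can be written as follows. The per-observation log-densities in the example are hypothetical numbers chosen only to illustrate the computation.

```python
import math

def vuong_z(logf, logg):
    """Vuong statistic Z = (l_f - l_g) / (omega * sqrt(n)) from per-observation log-densities."""
    n = len(logf)
    m = [a - b for a, b in zip(logf, logg)]          # log[f(x_i) / g(x_i)]
    mean_m = sum(m) / n
    omega2 = sum(v * v for v in m) / n - mean_m**2
    return sum(m) / (math.sqrt(omega2) * math.sqrt(n))

def vuong_decision(z, c=1.96):
    """Two-sided decision rule at the 5% level."""
    if z > c:
        return "f is closer to the true model"
    if z < -c:
        return "g is closer to the true model"
    return "models not distinguishable"

# Hypothetical per-observation log-densities of two fitted models
logf = [-1.0, -0.9, -1.2, -0.8]
logg = [-1.5, -1.0, -1.5, -1.0]
z = vuong_z(logf, logg)
print(round(z, 4), vuong_decision(z))
```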

Table 3 Vuong test comparison of non-nested models

Finally, we examined whether likelihood ratio tests suggested that the nested versions were adequate. The results of this test are shown in Table 4. As we can see, the arctan model outperforms the classical model.
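For the nested comparison, the likelihood ratio statistic \(2(\ell _{\arctan }-\ell _{\mathrm{classical}})\) can be compared with a \(\chi ^2_1\) distribution. This assumes the usual one-degree-of-freedom reference for the single added parameter \(\alpha \). The minimal sketch below uses the identity \(\Pr (\chi ^2_1>t)=\mathrm{erfc}(\sqrt{t/2})\). The example call uses hypothetical log-likelihood values.

```python
import math

def lr_test(loglik_classical, loglik_arctan):
    """LR statistic for one extra parameter and its chi-square(1) p-value.

    For one degree of freedom, P(chi2_1 > t) = erfc(sqrt(t / 2)).
    """
    lr = 2.0 * (loglik_arctan - loglik_classical)
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value

# Hypothetical maximized log-likelihoods, for illustration only
print(lr_test(-1052.3, -1047.9))
```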

Table 4 Log-likelihood ratio comparison of nested models

5 Conclusions

The proposed family of Lorenz curves seems to be a worthy addition to the existing class of single-parameter Lorenz curves. The family was applied to two data sets with satisfactory results, using least squares and maximum likelihood. Thus, the new specification is capable of modeling income data well.