Abstract
Almost all governmental and international agencies use the Gini index to summarize income inequality in a nation or the world. The index has been criticized because it can have the same value for two different distributions. It will be seen that other commonly used summary measures of inequality are subject to the same criticism. The Gini index has the advantage that it is able to distinguish between two distributions that have identical integer valued generalized entropy measures. Because no single measure can fully summarize a distribution, researchers should consider combining the Gini index with another measure appropriate for the topic being studied.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Although the Gini index is the most commonly used measure of income inequality and has many desirable properties (see [12, 16, 21]), it has often been criticized (see [3, 4, 13, 20]), because its values calculated from two different distributions can be equal. While this is true, the other major summary indices of inequality, e.g., the generalized entropy family including the Theil’s index, the mean log deviation, one half of the squared coefficient of variation, as well as the Palma index (the ratio of the shares of income received by the top 10% to the bottom 40%) and other ratios of the top \(100q\%\) to the bottom \(100p\%\) shares e.g., Dorling [6] \(p=q=0.2\), Jasso [15] \(p=1-q\), also can have the same value for different distributions. From a statistical viewpoint, this is not surprising, as two parameters, the mean and the standard deviation are needed to determine a normal distribution. Thus, it is unreasonable to expect any single summary measure to uniquely determine the underlying distribution and researchers should consider combining the Gini index with another measure focusing on the portion of the income distribution most relevant to the topic under investigation.
This paper presents several examples clarifying the relationship between income inequality measures and the underlying income distribution. In Sect. 2, examples of two distributions with the same value of one of the three commonly used inequality measures (e.g., the Gini index), but different values of the other two (e.g., the mean log deviation and the Theil index) are presented. A classical example in the theory of moments is used in Sect. 3 to demonstrate that the values of an infinite family of the generalized entropy indices can be identical for two different distributions. In particular, this implies that two distributions can have identical values of one half of the squared coefficient of variation, the mean log deviation and the Theil index but their Gini indices are not equal. In Sect. 4, several recently proposed measures based on the ratio of the share of income received by the top \(100q\%\) to the share of income received by the bottom \(100p\%\) are examined. Given the Lorenz curve of an income distribution, the share of the top \(100q_0\%\) and the share of the bottom \(100p_0\%\), using the approach described in [9, 11], one can construct two Lorenz curves having the same values of \(L(p_0)\) and \(L(1-q_0)\) as the given Lorenz curve, which bound the original Lorenz curve from above and below. Consequently, the given Lorenz curve and the two bounding Lorenz curves have the same ratio of the incomes received by the top \(100q_0\%\) and bottom \(100p_0\%\). The Gini indices of the three distributions, however, are different and the Gini index of the given income distribution lies in between the Gini indices of the bounding Lorenz curves. Furthermore, an example of a class of discrete income distributions with the same ratio of the top to bottom shares but very different Gini indices is presented in Sect. 5. A summary and discussion of the implications of these results are of given in Sect. 6.
2 Examples of distributions with the same value of one of three commonly used inequality but not the others
Consider three commonly used inequality indices: the mean log deviation (MLD), the Theil index (TI) and the Gini index (G), which are defined in [5] as
where Y is the random variable (r.v.) generating the income data, \(\mu _Y=E[Y]\) and \(Y_1\) and \(Y_2\) are i.i.d. as Y. The construction of the examples uses the fact that for any r.v. Y and a specific inequality index (e.g., MLD, the Theil index, or the Gini index) with value \(\gamma \), there exists a discrete r.v. \(X_k\) of the form
with the same value (\(\gamma \)) of that inequality measure. For any given value of k, one solves for p after setting the value of the specified inequality index of \(X_k\) to \( \gamma \). The following example demonstrates that while \(X_k\) and Y have the same value of the specified index, their values of the other inequality indices can be noticeably different.
To illustrate the construction of the r.v’s \(X_k\), let Y follow a Pareto distribution with \((\alpha ,\theta )\). From the formulas given in [1, 5], one obtains
From (2), the values of the three indices for a discrete r.v. of the form \(X_k\) are
When \(Y\sim Pareto(\alpha =3,\theta =1)\), by (3), the mean log deviation for Y is 0.072. Consider a r.v. of the form \(X_3\), i.e. \(k=3\). From the condition \(MLD_Y=MLD_{X_3}\) and (4), one obtains that \(p=0.187\). For this value of p, the values of the other two indices (the Gini index and the Theil index) for \(X_3\) are determined by (5) and (6), which are 0.116 and 0.055, respectively. Similarly, for the Gini index or the Theil index, one can obtain the discrete r.v. \(X_3\) by solving p from the condition that the value of the specific index for \(X_3\) equals the corresponding value for Y. Once the value of p is calculated, the values of the other two indices for \(X_3\) are determined.
When \(k=3\), Table 1 reports the values of p determined from Eqs. (4), (5) and (6) so that one of the values of the mean log deviation, the Gini index, and the Theil index for \(X_3\) matches the corresponding value for \(Y \sim Pareto (3,1)\). The values of the other two indices for that distribution are also reported. Table 2 presents the similar results when \(k=10\).
Tables 1 and 2 show that one can have different members of the \(X_k\) family with the same value of any specific inequality index and noticeably different values of the others. For example, consider the second row of both tables referring to the r.v.’s \(X_3\) and \(X_{10}\) with the same value of the mean log deviation. Notice that the Gini index of \(X_{10}\) is less than one-half of the Gini index of \(X_3\). The Theil indices for \(X_3\) and \(X_{10}\) are also different. The results in Tables 1 and 2 also illustrate that for any \(X_k\), there can be two solutions for the parameter p, so that two different \(X_k\)s has the same value of the specific index, e.g., consider the third and fourth row in Table 1, \(X_3\) with \(p= 0.334\) and \(p=0.896\) and the same value of the Theil index. Therefore, their corresponding \(X_k\) would have the same value of the specific index but different values of the other inequality indices.
3 An example of two distributions with the same integer-valued generalized entropy measure
The generalized entropy family of indices are defined as
where a gives more weight to the upper portion of the distribution as it increases. \(GE_Y(0)\) and \(GE_Y(1)\) are defined as \(GE_Y\left( 0\right) =\lim _{a\rightarrow 0}GE_Y\left( a\right) \) and \(GE_Y\left( 1\right) =\lim _{\alpha \rightarrow 1} GE_Y\left( a\right) \) respectively. Recall that \(GE_Y(0)\) is the mean log deviation, \(GE_Y(1)\) is the Theil index, and \(GE_Y(2)\) is one half of the squared coefficient of variation. It follows from (7) that if two underlying income distributions have the same ath moments for \(a=1,2,\ldots \), the corresponding sequence of the generalized entropy measures for these two underlying income distributions will be the same.
It is well known that the lognormal distribution does not satisfy the conditions [2, 19] required for a distribution to be uniquely determined by its moments. In particular, Heyde [14] presented a family of distributions indexed by a parameter \(\epsilon \), having the same moments as the lognormal. Hence, their values of GE(a) are the same for each integer value of a. The Heyde’s family of r.v.’s is given as the following.
Lemma 1
(Heyde’s family of r.v.’s) Let Y be the lognormal random variable such that \(\log Y\sim N\left( \mu ,\sigma \right) \) and let \(Y_{\epsilon }\) be a random variable with the pdf
then we have
where \(I_{a}=\epsilon \int _{-\infty }^{\infty }y^{a}f_{Y}(y)\sin \left( 2\pi (\log y-\mu )\right) dy=0\) for all integer values of a. Hence, for all integer values of a,
The density function of the Heyde’s family of r.v.’s differs from the log normal random variable by a function that oscillates with a magnitude of \(\epsilon \), but integrates to zero. Lemma 1 implies that for any lognormal distribution, there is an infinite family of distributions with the same general entropy index for all integer values of a. On the other hand, the values of the corresponding Gini indices differ. The Gini indices for several members of Heyde’s family of distributions having the same moments as the lognormal distribution with \(\mu =0,\sigma =1\) are given in Table 3.
Table 3 shows that when \(\epsilon \) takes different values, the Gini indices for \(Y_{\epsilon }\) are different. Because the distributions of \(Y_{\epsilon }\) and Y have the same moment sequence, their distributions should be very close to one another, which is why the values of the Gini index for different \(\epsilon \) are similar. In [19], one can find other examples of families of distributions with the same moment sequence as well as the same values of GE(a) for \(a=1,2,\ldots \).
4 Example of distributions with the same ratio of the top to bottom shares
The inequality measure r(p, q) having been used by Palma [18] (\(p=0.4,\ q=0.1\)) and Dorling [6] (\(p=0.2,q=0.2\)) is the ratio of the share of the income received by the top \(100q\%\) to the share of the income received by the bottom \(100p\%\). This family of inequality measures focuses on the difference of the upper \(100q\%\) and the lower \(100p\%\) of the distribution, but ignores the middle portion of the distribution. It is obvious that there are many Lorenz curves having the same values of L(p) and \(L(1-q)\) but differ noticeably over the range of \((p,1-q)\).
The ratio of the top to bottom shares for an income Y is defined as
where \(L_Y(p)=\mu _Y^{-1}\int _{-\infty } ^{p}F^{-1}(t)dt=\mu _Y^{-1} \int _{-\infty }^{y_p} t\cdot f_Y(t)dt\) is the Lorenz curve [8] of Y, and \(y_p=F^{-1}(p)=\inf \lbrace y:F_Y\left( y\right) \ge p\rbrace \) is the pth quantile of Y.
For any pair of values \((p_0,q_0)\) and an income distribution (Y) with its Lorenz curve \(L_Y(p)\), following the approach in [9, 11], one can obtain two piecewise linear Lorenz curves that bound \(L_Y(p)\) from above (by \(L_A(p)\)) and below (by \(L_B(p)\)) with the same values of \(L_Y(p_0)\) and \(L_Y(1-q_0)\), i.e., \(L_A(p_0)=L_B(p_0)=L_Y(p_0)\) and \(L_A(1-q_0)=L_B(1-q_0)=L_Y(1-q_0)\). By construction, the ratio \(r(p_0,q_0)\) of \(L_A(p)\), \(L_B(p)\) and \(L_Y(p)\) are equal.
The formulas for \(L_A(p)\) and \(L_B(p)\) will now be described. Let \(L_A(p)\) be the piecewise linear Lorenz curve bounding \(L_Y(p)\) from above. \(L_A(p)\) consists of line segments \(T_{A_1}(p)\), \(T_{A_2}(p)\) and \(T_{A_3}(p)\), where \(T_{A_1}(p)\) is the line segment connecting (0, 0) and \((p_0, L_Y(p_0))\); \(T_{A_2}(p)\) is the line segment connecting \((p_0, L_Y(p_0))\) and \((1-q_0,L_Y(1-q_0))\); and \(T_{A_3}(p)\) is the line segment connecting \((1-q_0, L_Y(1-q_0))\) and (1, 1). Therefore, it follows that
\(L_B(p)\) consists four linear segments \(T_{B_1}(p)\), \(T_{B_2}(p)\), \(T_{B_3}(p)\), and \(T_{B_4}(p)\) that circumscribes \(L_Y(p)\) at \((p_0,L_Y(p_0))\) and \((1-q_0,L_Y(1-q_0))\). Here \(T_{B_2}(p)\) and \(T_{B_3}(p)\) are the two tangent lines segments circumscribing \(L_Y(p)\) at \((p_0,L_Y(p_0))\) and \((1-q_0,L_Y(1-q_0))\) respectively. They intersect at \((x_0,y_0)\). In addition, \(T_{B_2}(p)\) intersects \(L(p)=0\) at \((x_1,0)\) and \(T_{B_1}(p)\) is the line segment connecting (0, 0) and \((x_1,0)\). \(T_{B_3}(p)\) intersects \(p=1\) at \((1,T_{B_2}(1))\), and \(T_{B_4}(p)\) connects \((1,T_{B_2}(1))\) to (1, 1). Therefore, we have
where \(L_Y'(p)= F_Y^{-1}(p)/\mu \) is the derivative of \(L_Y(p)\).
By the construction, the area between the line of equality and \(L_A(p)\) will be smaller than the area between the line of equality and \(L_Y(p)\), and the area between the line of equality and \(L_B(p)\) will be the largest. This implies that \(G_B>G_Y>G_A\). In addition, \(L_A(p_0)=L_Y(p_0)=L_B(p_0)\) and \(L_A(1-q_0)=L_Y(1-q_0)=L_B(1-q_0)\), yielding \(r_{A}(p_0,q_0)=r_{B}(p_0,q_0)=r_{Y}(p_0,q_0)\). Hence, the three Lorenz curves have the same value of \(r(p_0,q_0) \) but different values of the Gini index.
To illustrate an example of how the curves \(L_A\) and \(L_B\) bouding a Lorenz curve of an income distribution, consider \(Y\sim Pareto(\alpha ,\theta )\). From [1], we have
When \(p_0= 0.4\), \(q_0=0.1\) and \(\alpha = 1.125\), from (12),(13), (15), and (16), \(L_{A}(p)\) and \(L_{B}(p)\) are
Fig. 1 presents \(L_Y(p)\) (solid curve), \(L_A(p)\) (dotted curve) and \(L_B(p)\) (dash curve) for \(Y\sim Pareto(1,1.125)\), \(p_0=0.4\) and \(q_0=0.1\). From the figure, it can be seen that the \(L_Y(p)\) is between \(L_A(p)\) and \(L_B(p)\). The three Lorenz curves have the same r(0.4, 0.1) as \(L_A(p)\) and \(L_B(p)\), because their values are the same when \(p=0.4\) and 0.9. Calculating the areas between each of the Lorenz curve and the line of equality, it follows that
In this example, the difference between the Gini indices of the two bounding Lorenz curves is about 0.1. The reason \(G_A\) is less than \(G_Y\) is because its distribution assumes that all individuals in the region between the 40th and 90th percentiles have the same income.
5 A family of discrete distributions with the same value of the ratio of top to bottom shares
This section describes a family of discrete distributions having the same pre-specified ratio of the top to bottom shares \(r(p_0,q_0)\), with noticeably different values of the Gini indices. This means that the middle portion of these distributions varies substantially while the shares of income received by the top \(100q_0\%\) and the bottom \(100p_0\%\) remain unchanged.
Suppose a population only has three classes of households, lower, middle and upper, and every person within each class receives the same income. For a specified value of the ratio of the top to bottom shares \(r(p_0,q_0)\), by varing the share received by the households in the middle class, one can generate many populations with the same value of the ratio of the top to bottom shares, but different values of the Gini index. For example, let \(p_0\%\) (\(q_0\%\)) be the proportion of the households belonging to the lower (upper) class, so that \(1-p_0-q_0\%\) is proportion of the households in the middle class. Consider the family of income distributions \(Y_x\) with the same value \(r=r(p_0,q_0)\) defined by
where
for any \(1\le x\le rp_0/q_0\) and \( rp_0/q_0>1\). While r is determined by \(L(p_0)\) and \(L(1-q_0)\), and does not vary for any distributions of the form \(Y_x\), the share of the income received by the middle class depends on x will increase as x goes from 1 to \(rp_0/q_0\), i.e., as the income recieved by the middle class approaches that of the upper class.
For any value of x, the Gini index of \(Y_x\) is
When \(r>(1-q_0)/(1-p_0)\), \(G_{Y_x}\) is monotone decreasing in x, and when \(r<(1-q_0)/(1-p_0)\), \(G_{Y_x}\) is monotone increasing in x. In either case, \(G_{Y_x}\) is in between
Therefore, as the income x, that all members of the middle class receive, varies, the Gini index of \(Y_x\) also varies.
To illustrate the wide range of the values that the Gini index can have for different distributions as the \(Y_x\) family, consider the case that \(r=10\), \(p_0=0.4\), and \(q_0=0.1\). Table 4 presents the values of the Gini index for several values of x.
From Table 4, one sees that the Gini indices of \(Y_1\) and \(Y_{40}\) are 0.716 and 0.384 respectively. These values reflect that as x increases from \(1^+\) to \(40^-\), the income of a middle class household goes from being lower class to being near upper class and the Gini index decreases.
6 Conclusion
This article demonstrates that the major indices of income inequality can take the same value for different distributions so that criticizing the Gini index or any other single measure of income inequality due to this reason is inappropriate. An example of two different distributions, arising in the theory of moments [2], having identical moments of all orders, indicates that a single index will not be able to distinguish between all pairs of distributions. The Gini index, however, is able to distinguish between members of a family of distributions with the same values for the generalized entropy measures with an integer index.
The results in Sect. 3 demonstrate that it is unreasonable to expect a single measure to completely describe the entire income distribution. Hence combining several measures may provide more information about the income distribution. For example, Oancea and Pirjol [17] show that for a heavily skewed distribution, the Theil index can increase to infinity, but the Gini index is bounded by one. However, the modified Gini index [10] which replaces the mean by the median, i.e., multiplies the Gini index by the ratio of the mean to the median, a measure of skewness, resolves this problem. Foster and Wolfson [7] combine the Gini index with the relative median deviation, i.e., \((\mu _U-\mu _L)/\mu \) where \(\mu _U\) is the mean of those above the median and \(\mu _L\) is the mean of those below the median, to measure the polarization of an income distribution.
Because no single measure can summarize the entire income distribution, researchers may benefit from combining the Gini index with another measure which emphasizes the portion of the underlying distribution most relevant to the research problem.
References
Arnold, B.: Pareto Distributions, 2nd edn. CRC, Boca Raton (2015)
Billingsley, P.: Probability and Measure, 3rd edn. Wiley, New York (2008)
Ceriani, L., Verme, P.: The origins of the Gini index: extracts from variabilità e mutabilità (1912) by Corrado Gini. J. Econ. Inequal. 10(3), 421–443 (2012)
Charles-Coll, J.A.: Understanding income inequality: concept, causes and measurement. Int. J. Econ. Manag. Sci. 1(3), 17–28 (2011)
Cowell, F.A.: Measuring Inequality, 3rd edn. Oxford University Press, Oxford (2011)
Dorling, D.: Inequality and the 1%, 1st edn. Verso, Brooklyn (2014)
Foster, J.E., Wolfson, M.C.: Polarization and the decline of the middle class: Canada and the US. J Econ Inequal 8(2), 247–273 (2010)
Gastwirth, J.L.: A general definition of the Lorenz curve. Econometrics 39(6), 1037–1039 (1971)
Gastwirth, J.L.: The estimation of the Lorenz curve and Gini index. Rev. Econ. Stat. 54(3), 306–316 (1972)
Gastwirth, J.L.: A robust Gini type-index better detects shifts in the income distribution: a reanalysis of income inequality in the United States from 1967–2011. Available at SSRN 2164745 (2012)
Gastwirth, J.L.: Is the Gini index of inequality overly sensitive to changes in the middle of the income distribution? Stat. Public Policy 4(1), 1–11 (2017)
Giorgi, G.M., Gigliarano, C.: The Gini concentration index: a review of the inference literature. J. Econ. Surv. 31(4), 1130–1148 (2017)
Hesse, J.O.: Fact or fiction? complexities of economic inequality in twenties century Germany. In: Hudson, P., Tribe K. (eds.) The Contradictions of Capital in the Twenty-first Century, chap. II., pp. 87–107. Agenda, Newcastle upon Tyne, UK (2016)
Heyde, C.C.: On a property of the lognormal distribution. J. R. Stat. Soc. Ser. B (Methodological) 25(2), 392–393 (1963)
Jasso, G.: Anything Lorenz curves can do, top shares can do: assessing the topbot family of inequality measures. Sociol. Methods Res. (2018). https://doi.org/10.1177/0049124118769106
Lambert, P.J., Decoster, A.: The Gini coefficient reveals more. Metron pp. 373–400
Oancea, B., Pirjol, D.: Extremal properties of the Theil and Gini measures of inequality. Qual. Quant. 53(2), 859–869 (2019)
Palma, G.: Homogeneous middles vs. heterogeneous tails, and the end of the ’inverted-u’: It’s all about the share of the rich. Dev. Change 42(1), 87153 (2011)
Stoyanov, J.M.: Counterexamples in Probability. Dover Publications, Newburyport (2013)
Wikipedia: Gini coefficient. https://en.wikipedia.org/wiki/Gini_coefficient
Yitzhaki, S., Schechtman, E.: The Gini Methodology: A Primer on a Statistical Methodology. Springer Series in Statistics, vol. 272. Springer, New York (2013)
Acknowledgements
It is a pleasure to thank two anonymous reviewers and the editor for many helpful suggestions.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Liu, Y., Gastwirth, J.L. On the capacity of the Gini index to represent income distributions. METRON 78, 61–69 (2020). https://doi.org/10.1007/s40300-020-00164-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s40300-020-00164-8