1 Introduction

Overlapping subgroups play a particular role in inequality decomposition since they give a complex and not easily assessable contribution to total inequality. With respect to the measurement of the traditional components of inequality decomposition, the inequality within subgroups and the inequality between subgroups, it is possible to observe relevant effects related to the presence of overlapping units.

First, overlapping units increase the weight of the inequality within subgroups in total inequality.

Second, they introduce some non-trivial problems into the measurement of the inequality between subgroups.

Third, they pose relevant questions on the interpretation of inequality between subgroups.

Given a poor subgroup A and a non-poor subgroup B, we need to interpret the contribution to the inequality between subgroups given by a unit belonging to A, which is richer than a unit belonging to B. Or, analogously, we need to interpret the contribution to the inequality between subgroups given by a unit belonging to B, which is poorer than a unit belonging to A.

A related point to be addressed is whether overlapping units belonging to A are to be classified as “poor” and whether overlapping units belonging to B are to be considered as “non-poor”.

It follows that the overlapping component requires a more detailed framework for both the measurement and the interpretation of inequality. In particular, in order to properly evaluate and explain total inequality, overlapping units have to be analyzed separately from non-overlapping units.

Furthermore, the aim of inequality decompositions is generally related to the identification of the relevant factors determining the inequality structure. Given a possible inequality factor (such as gender, working condition, education level, area of residence, etc.) and having decomposed the inequality in terms of such factor, high levels of overlapping indicate that the factor only slightly contributes to total inequality, while low levels of overlapping suggest a stronger contribution. Therefore, overlapping analysis provides a powerful insight into the inequality structure and also allows us to assess the importance of the various factors for total inequality.

The case of overlapping subgroups has been deeply analyzed, using the term “transvariation”, by Gini [13, 14] and Dagum [4]. Their studies link transvariation to inequality measurement achieved by means of the Gini index [6, 10, 12, 17]. The effects of the overlapping component on inequality decomposition have been the focus of interest for many other researchers (see [12, 13, 17]). The different ways to analyze and interpret these effects represent the main motivation of the various approaches to the Gini index decomposition.

The aim of the paper is to evaluate and compare alternative proposals for the measurement of overlapping component in the Gini index decomposition. Furthermore, we want to analyse the properties of these proposals with respect to different degrees of overlapping.

To this purpose we develop a Monte Carlo study which allows us to observe and compare the various Gini index decompositions as well as to illustrate the effects related to increasing levels of overlapping.

The paper is organized as follows. Section 2 addresses the measurement of overlapping. Section 3 illustrates three different methodologies for the decomposition of the Gini index. Section 4 develops the simulation study and Sect. 5 concludes.

2 The measurement of overlapping

The general case of a population of n units disaggregated into k subgroups of size \(n_{j}\), with \(\sum \nolimits _{j=1}^k {n_j =n} \), refers to subgroups characterized by some degree of overlapping. Let \(y_{ji}\) be the value of character y in the i-th unit of the j-th subgroup and, accordingly, \(y_{hr}\) the value of y in the r-th unit of the h-th subgroup.

Overlapping occurs when at least one of the differences \((y_{ji} -y_{hr} )\), for \(i=1\),..., \(n_{j}\) and \(r=1\),..., \(n_{h}\), has the opposite sign to the difference \((\lambda _j -\lambda _h )\), where \({\lambda _j}\) and \({\lambda _h}\) are mean values (usually the arithmetic mean or the median) respectively for subgroups j and h.

In order to evaluate the relevance of overlapping, Gini introduced two indexes: the probability of transvariation and the intensity of transvariation [14].

The probability of transvariation refers to overlapping which occurs when \(y_{ji}~<~y_{hr}\) and the median of the j-th subgroup, \({ }_{me}\bar{y}_j \), is greater than or equal to the median of the h-th subgroup, \({ }_{me}\bar{y}_h \).

Null differences (\(y_{ji}-y_{hr})\) are equally divided between overlapping and non-overlapping differences. The probability of transvariation between subgroups j and h, \(p(t_{jh})\), is then computed as the ratio between the number of overlapping differences and its maximum; that is:

$$\begin{aligned} p(t_{jh}) = \left( 2\,{ }_{1}nt_{jh}+{ }_{2}nt_{jh}\right) /(n_{j}n_{h}) \end{aligned}$$
(1)

where \(_{1}{nt}_{jh}\) and \(_{2}nt_{jh}\) are, respectively, the number of negative and the number of null differences (\(y_{ji}-y_{hr})\) for \(i=1\),..., \(n_{j}\) and \(r=1\),..., \(n_{h}\), with \({ }_{me}\bar{y}_j \ge _{me}\bar{y}_h \).

The maximum of \(p(t_{jh})\) is reached when the two subgroups completely overlap and the median of the j-th subgroup equals the median of the h-th subgroup. The probability of transvariation ranges between 0, when there is no transvariation and no difference is overlapping, and 1, when the overlapping component reaches its maximum.
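As a minimal sketch, expression (1) could be computed on two samples as follows. The function name and the automatic swap of the two subgroups (so that the first one has the larger median) are illustrative assumptions, not part of the original study.

```python
import numpy as np

def probability_of_transvariation(y_j, y_h):
    """p(t_jh) as in expression (1), using the median to order the subgroups."""
    y_j, y_h = np.asarray(y_j, float), np.asarray(y_h, float)
    # Assumption: swap the subgroups so that the first has the larger median.
    if np.median(y_j) < np.median(y_h):
        y_j, y_h = y_h, y_j
    diff = y_j[:, None] - y_h[None, :]  # all n_j * n_h differences
    n_neg = np.sum(diff < 0)            # overlapping (negative) differences
    n_null = np.sum(diff == 0)          # null differences, counted as one half
    return (2 * n_neg + n_null) / diff.size
```

Two disjoint samples give 0, while two identical samples give 1, in line with the range stated above.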

The second index introduced by Gini, the so-called intensity of transvariation, refers to overlapping which occurs when \(y_{ji}~<\) \(y_{hr}\) while the mean of the j-th subgroup, \(\bar{y}_j\), is greater than or equal to the mean of the h-th subgroup, \(\bar{y}_h \).

Unlike the probability of transvariation, which takes the number of overlapping differences into account, the intensity of transvariation is based on their size, that is on the quantities \(\vert y_{ji} -y_{hr} \vert \) for \(y_{ji}~<~y_{hr}\), \(\bar{y}_j \ge \bar{y}_h \), \(i=1\),..., \(n_{j}\) and \(r=1\),..., \(n_{h}\).

Let \(T_{jh}\) be the sum of overlapping differences between subgroups j and h: \(T_{jh} =\sum \nolimits _{i=1}^{n_j } {\sum \nolimits _{r=1}^{n_h } {\vert y_{ji} -y_{hr} \vert } } ,\) for \(y_{ji}~<~y_{hr}\) and \(\bar{y}_j \ge \bar{y}_h \).

It is immediate to notice that \(T_{jh}\) increases as the difference \((\bar{y}_j -\bar{y}_h )\) decreases. Furthermore, for \(\bar{y}_j =\bar{y}_h \), \(T_{jh}\) reaches its maximum, that is

$$\begin{aligned} { }_{\max }T_{jh} =\frac{1}{2}\sum \limits _{i=1}^{n_j } {\sum \limits _{r=1}^{n_h } {\vert y_{ji} -y_{hr} \vert } } \quad . \end{aligned}$$

The intensity of transvariation is obtained as the ratio between the sum of overlapping differences \(\vert y_{ji} -y_{hr} \vert \) and its maximum; that is:

$$\begin{aligned} i(t_{jh} )={2T_{jh} } / {\sum \limits _{i=1}^{n_j } {\sum \limits _{r=1}^{n_h } {\vert y_{ji} -y_{hr} \vert } } }. \end{aligned}$$
(2)

Intensity of transvariation ranges between 0, when no difference is overlapping, and 1, when the two subgroups completely overlap.
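Expression (2) can be sketched in the same way. The function name, the swap of the subgroups by mean and the convention of returning 0 when all values coincide are assumptions made for illustration.

```python
import numpy as np

def intensity_of_transvariation(y_j, y_h):
    """i(t_jh) as in expression (2), using the mean to order the subgroups."""
    y_j, y_h = np.asarray(y_j, float), np.asarray(y_h, float)
    # Assumption: swap the subgroups so that the first has the larger mean.
    if y_j.mean() < y_h.mean():
        y_j, y_h = y_h, y_j
    diff = y_j[:, None] - y_h[None, :]
    total = np.abs(diff).sum()
    if total == 0:                 # all values identical: nothing to compare
        return 0.0
    t_jh = -diff[diff < 0].sum()   # sum of the overlapping differences, T_jh
    return 2 * t_jh / total
```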

By jointly using intensity and probability of transvariation it is possible to obtain a wide information set about the relevance and the extent of the overlapping component.

3 Overlapping and the Gini index decomposition

The role of the overlapping component is analyzed within the framework of the Gini index which, in a population disaggregated into k subgroups, can be expressed as

$$\begin{aligned} G=\frac{1}{2n^2\bar{y}}\sum \limits _{j=1}^k {\sum \limits _{h=1}^k {\sum \limits _{i=1}^{n_j } {\sum \limits _{r=1}^{n_h } {\vert y_{ji} -y_{hr} \vert } } } } \end{aligned}$$
(3)

where \(\bar{y}\) is the arithmetic mean of y in the overall population.
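Since the four sums in (3) run over every pair of units in the population, the index can be computed on the pooled sample. The following is a small sketch of that computation; the function name is hypothetical.

```python
import numpy as np

def gini(y):
    """Gini index as in expression (3), computed on the pooled values y."""
    y = np.asarray(y, float)
    n = y.size
    # Sum of |y_a - y_b| over all ordered pairs of units.
    abs_diff_sum = np.abs(y[:, None] - y[None, :]).sum()
    return abs_diff_sum / (2 * n**2 * y.mean())
```

For the values 1, 2, 3, 4 the sum of absolute differences over ordered pairs is 20, so that G = 20/(2 · 16 · 2.5) = 0.25.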

In the extant literature on the Gini index decomposition, the component of inequality within subgroups is generally measured as

$$\begin{aligned} G_w =\sum \limits _{j=1}^k {G_{jj} p_j s_j } \end{aligned}$$
(4)

where \(G_{jj}\) is the Gini index in the j-th subgroup, \(p_{j} = n_{j}/n\) is the share of subgroup j in the overall population and \(s_j =(\sum \nolimits _{i=1}^{n_j } {y_i n_i } /\sum \nolimits _{i=1}^n {y_i n_i } )\) is the share of subgroup j in the overall y.
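A possible sketch of expression (4) is given below. It reads \(s_j\) as the share of subgroup j in the total of y, which is our interpretation of the definition above; function names are illustrative.

```python
import numpy as np

def gini(y):
    """Gini index from all pairwise absolute differences."""
    y = np.asarray(y, float)
    return np.abs(y[:, None] - y[None, :]).sum() / (2 * y.size**2 * y.mean())

def gini_within(groups):
    """G_w as in expression (4): sum over subgroups of G_jj * p_j * s_j."""
    groups = [np.asarray(g, float) for g in groups]
    n = sum(g.size for g in groups)
    total = sum(g.sum() for g in groups)
    # p_j = n_j / n (population share), s_j = share of subgroup j in total y
    return sum(gini(g) * (g.size / n) * (g.sum() / total) for g in groups)
```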

The measurement of inequality within subgroups based on (4) is not exempt from criticism [12]. However, in the following the topic of inequality within subgroups will not be addressed. Instead, we will focus on the problem which mainly motivates the controversial and articulate debate on the Gini index decomposition: the measurement of the inequality between subgroups.

3.1 Bhattacharya and Mahalanobis’s Gini index decomposition

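Bhattacharya and Mahalanobis’s [1] seminal proposal for the measurement of the inequality between subgroups, \(G_b \), given by

$$\begin{aligned} { }_{BM}G_b =\frac{1}{2\bar{y}}\sum \limits _{j=1}^k {\sum \limits _{h=1}^k {\vert \bar{y}_j -\bar{y}_h \vert p_j p_h } } \end{aligned}$$

starts a tradition in which differences between subgroups are measured on the basis of the difference between subgroup means.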

Following the contribution by Bhattacharya and Mahalanobis, many authors (see [15], and references therein) suggested further measures for evaluating the inequality between subgroups. In all cases, however, the core of each expression is still represented by the difference between subgroup means, as in Bhattacharya and Mahalanobis.

This approach to the measurement of \(G_b \) does not present particular disadvantages when the subgroups are not overlapping, but, in the general case of overlapping units, it leads to the presence of a third term, \(_{BM}G_{t}\), which acts as a “residual” in the Gini index decomposition:

$$\begin{aligned} G=G_{w}+_{BM}G_{b}+_{BM}G_{t} \end{aligned}$$
(5)

The minimum value of \(_{BM}G_{b}\) is 0 and it occurs when \(\bar{y}_j =\bar{y} \quad {\forall j}\). The component \(_{BM}G_{t}\) ranges between 0, when the subgroups are not overlapping, and \((G-G_w )\), when \(\bar{y}_j =\bar{y} \quad {\forall j}\) and \(_{BM}G_{b}\) = 0.
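Decomposition (5) can be illustrated with a short sketch. It assumes the usual mean-based form of the between term, that is, the sum of \(\vert \bar{y}_j -\bar{y}_h \vert p_j p_h\) over all pairs of subgroups divided by \(2\bar{y}\), and obtains \(_{BM}G_{t}\) as the residual; all function names are hypothetical.

```python
import numpy as np

def gini(y):
    """Gini index from all pairwise absolute differences."""
    y = np.asarray(y, float)
    return np.abs(y[:, None] - y[None, :]).sum() / (2 * y.size**2 * y.mean())

def bm_decomposition(groups):
    """Return (G, G_w, G_b, G_t) with G_t obtained as the residual in (5)."""
    groups = [np.asarray(g, float) for g in groups]
    pooled = np.concatenate(groups)
    n, mean = pooled.size, pooled.mean()
    p = np.array([g.size / n for g in groups])   # population shares
    m = np.array([g.mean() for g in groups])     # subgroup means
    s = p * m / mean                             # shares in total y
    g_total = gini(pooled)
    g_w = sum(gini(g) * pj * sj for g, pj, sj in zip(groups, p, s))
    # Assumed mean-based between term: differences between subgroup means.
    g_b = (np.abs(m[:, None] - m[None, :]) * np.outer(p, p)).sum() / (2 * mean)
    return g_total, g_w, g_b, g_total - g_w - g_b
```

With the two subgroups {1, 4} and {2, 3}, which share the same mean, the between term vanishes and the residual equals \(G-G_w = 0.15\); with the disjoint subgroups {1, 2} and {3, 4} the residual is 0.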

By using (5), Shorrocks [19] classifies the Gini index as a non-additively decomposable measure, thus shifting the preferences of many researchers toward generalized entropy indexes, which are additively decomposable.

3.2 Yitzhaki and Lerman’s Gini index decomposition

The topic of overlapping subgroups in the Gini index decomposition is addressed by Yitzhaki and Lerman [21], who contributed extensively to the study of the Gini index and its decomposition. In 1991 they introduced a specific overlapping index, \(O_{jh}\), which measures the degree to which the distribution of the j-th subgroup is included in the range of the distribution of the h-th subgroup.

\(O_{jh}\) is defined as

$$\begin{aligned} O_{jh} =cov_h (y,F_j (y))/cov_h (y,F_h (y)) \end{aligned}$$

where \(F_{j}(y)\) is the cumulative distribution of subgroup j, which, in the sample, is estimated by the rank of the observation. The denominator of \(O_{jh}\) is the covariance between the incomes of subgroup h and their ranks within subgroup h, while the numerator of \(O_{jh}\) is the covariance between the incomes of subgroup h and the ranks they would have as members of subgroup j.

Following Yitzhaki and Lerman [21] and Yitzhaki [20], the inequality between subgroups is measured as

$$\begin{aligned} { }_{YL}G_b =2cov(\bar{y}_j ,\bar{F}_j )/\bar{y} \end{aligned}$$

where \(\bar{F}_j =\bar{R}_j /n\) and \(\bar{R}_j \) is the average rank of group j in the overall population.

In summary, the Gini index decomposition proposed by Yitzhaki and Lerman is

$$\begin{aligned} G=G_w +{ }_{YL}G_b +\sum \limits _{h=1}^k {s_h G_h } \sum \limits _{j=1,j\ne h}^k {p_j O_{jh} } =G_w +{ }_{YL}G_b +{ }_{YL}G_t \end{aligned}$$
(6)

The minimum value of \(_{YL}G_{b}\) is 0 and it occurs when \(\bar{y}_j =\bar{y} \quad {\forall j}\); the component \(_{YL}G_{t}\) ranges between 0, when there is no overlapping, and \((G-G_w )\), when \(\bar{y}_j =\bar{y} \quad {\forall j} \) and \(_{YL}G_{b}\) = 0.
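The three terms of (6) can be sketched as follows. The sketch estimates each cumulative distribution with mid-ranks (ties counted as one half) and uses covariances with divisor equal to the number of units; both are our assumptions, chosen so that the three terms add up to the index in (3). Names are illustrative and incomes are assumed positive.

```python
import numpy as np

def mid_cdf(sample, x):
    """Empirical distribution of `sample` at x, ties counted as one half."""
    srt = np.sort(sample)
    left = np.searchsorted(srt, x, side="left")
    right = np.searchsorted(srt, x, side="right")
    return (left + right) / (2 * srt.size)

def cov(a, b):
    """Covariance with divisor n (not n - 1)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

def yl_decomposition(groups):
    """Return (G_w, G_b, G_t) following expression (6)."""
    groups = [np.asarray(g, float) for g in groups]
    pooled = np.concatenate(groups)
    n, mean, k = pooled.size, pooled.mean(), len(groups)
    p = np.array([g.size / n for g in groups])
    m = np.array([g.mean() for g in groups])
    s = p * m / mean
    # Subgroup Gini indexes written as 2 cov_h(y, F_h(y)) / mean_h.
    own_cov = np.array([cov(g, mid_cdf(g, g)) for g in groups])
    g_h = 2 * own_cov / m
    g_w = np.sum(g_h * p * s)
    # Between term: weighted covariance of subgroup means and mean ranks.
    f_bar = np.array([mid_cdf(pooled, g).mean() for g in groups])
    g_b = 2 * np.sum(p * (m - mean) * (f_bar - np.sum(p * f_bar))) / mean
    # Overlapping term: sum_h s_h G_h sum_{j != h} p_j O_jh.
    g_t = 0.0
    for h in range(k):
        for j in range(k):
            if j != h and own_cov[h] > 0:
                o_jh = cov(groups[h], mid_cdf(groups[j], groups[h])) / own_cov[h]
                g_t += s[h] * g_h[h] * p[j] * o_jh
    return g_w, g_b, g_t
```

For the subgroups {1, 4} and {2, 3} the between term is 0 and the overlapping term is 0.15, so the three terms add up to G = 0.25.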

The approach by Yitzhaki and Lerman is further developed by Frick et al. [11], who obtained new results on the Gini index decomposition and overlapping measurement by providing an interesting interpretation of overlapping in terms of the inverse of stratification.

3.3 Dagum’s Gini index decomposition

A second contribution to the analysis of overlapping subgroups in the Gini index decomposition is proposed by Dagum [8, 9], who extends a previous study by Mehran [18]. In Dagum’s proposal, both the inequality between subgroups and the contribution of overlapping units can be evaluated on the basis of two quantities: the Gini index between subgroups j and h, \(G_{jh}\), and the economic relative distance \(D_{jh}\). First, starting from the Gini index for the subgroup j, \(G_{jj}\),

$$\begin{aligned} G_{jj} =\frac{1}{2n_j^2 \bar{y}_j }\sum \limits _{i=1}^{n_j } {\sum \limits _{r=1}^{n_j } {\vert y_{ji} -y_{jr} \vert } } , \end{aligned}$$

the Gini index between subgroup j and h, \(G_{jh}\), can be simply expressed as

$$\begin{aligned} G_{jh} =\frac{1}{n_j n_h (\bar{y}_j +\bar{y}_h )}\sum \limits _{i=1}^{n_j } {\sum \limits _{r=1}^{n_h } {\vert y_{ji} -y_{hr} \vert } } . \end{aligned}$$

Second, the economic relative distance \(D_{jh}\) [5, 7] is

$$\begin{aligned} D_{jh} =(d_{jh} -p_{jh} )/(d_{jh} +p_{jh} ) \end{aligned}$$

where \(d_{jh}\) and \(p_{jh}\) are, respectively, the gross economic affluence and the first-order moment of transvariation between the j-th and the h-th subgroups. Given \(\bar{y}_j \ge \bar{y}_h \), the gross economic affluence between the j-th and the h-th subgroups, \(d_{jh}\), is a weighted average of the differences \((y_{ji} -y_{hr} )\) for all \(y_{ji} >y_{hr}\). Following Dagum [5, 7], \(d_{jh}\) can be expressed as

$$\begin{aligned} d_{jh} =E_j (yF_h (y))+E_h (yF_j (y))-E_h (y) \end{aligned}$$

where the subscripts indicate the subgroups j and h, F(y) is the cumulative distribution function and E stands for the mathematical expectation operator. Furthermore, the first-order moment of transvariation between the j-th and the h-th subgroups, \(p_{jh}\), is a weighted average of the differences \((y_{ji} -y_{hr} )\) for all \(y_{ji} <y_{hr} \). Following Dagum [5, 7], \(p_{jh}\) can be expressed as

$$\begin{aligned} p_{jh} =E_j (yF_h (y))+E_h (yF_j (y))-E_j (y). \end{aligned}$$

In this framework, inequality between subgroups is measured as

$$\begin{aligned} { }_DG_b =\sum \limits _{j=1}^k {\sum \limits _{h=1,j\ne h}^k {G_{jh} D_{jh} p_j s_h } } \end{aligned}$$

while the contribution to total inequality given by overlapping units is evaluated by means of

$$\begin{aligned} { }_DG_t =\sum \limits _{j=1}^k {\sum \limits _{h=1,j\ne h}^k {G_{jh} (1-D_{jh} )p_j s_h } }. \end{aligned}$$

When \(\bar{y}_j =\bar{y} \quad {\forall j} \), the differences \((y_{ji} -y_{hr} )\) for all \(y_{ji} >y_{hr} \) equal the differences \((y_{ji} -y_{hr} )\) for all \(y_{ji} <y_{hr} \), that is, when \(\bar{y}_j =\bar{y}\quad {\forall j} \), the component \(_{D}G_{b}\) equals the component \(_{D}G_{t}\). The minimum value of \(_{D}G_{b}\) is then (\(G -G_{w})\)/2 and the component \(_{D}G_{t}\) ranges between 0 and (\(G -G_{w})\)/2.

On the whole, Dagum’s Gini index decomposition is given by

$$\begin{aligned} G=G_w +{ }_DG_b +{ }_DG_t \end{aligned}$$
(7)

It is straightforward to note that both \(G_{jh}\) and \(D_{jh}\) involve a heavy computational effort. In order to overcome this problem, a simplified version of Dagum’s contribution is developed by Costa [2, 3] as

$$\begin{aligned} { }_DG_b ={ }_DG_b^*+0.5(G-G_w -{ }_DG_b^*) \end{aligned}$$

and

$$\begin{aligned} { }_DG_t =0.5(G-G_w -{ }_DG_b^*) \end{aligned}$$

where

$$\begin{aligned} { }_DG_b^*=\sum \limits _{j=1}^{k-1} {\sum \limits _{h=j+1}^k {\frac{p_{hj}^*-s_{hj}^*}{p_{hj}^*s_{jh}^*+p_{jh}^*s_{hj}^*}(p_j s_h +p_h s_j )} } , \end{aligned}$$
$$\begin{aligned} p_{hj}^*=p_h /(p_h +p_j ), \end{aligned}$$
$$\begin{aligned} s_{hj}^*=s_h /(s_h +s_j ). \end{aligned}$$

It is also worth noting how the intuition behind Dagum’s Gini index decomposition is both extremely simple and appealing: each difference \(\vert y_{ji} -y_{hr} \vert \) in expression (3) is attributed to inequality within subgroups for \(j=h\), to inequality between subgroups for \(j\ne h\), \(\bar{y}_j \ge \bar{y}_h \) and \(y_{ji} \ge y_{hr} \), and to transvariation for \(j\ne h\), \(\bar{y}_j \ge \bar{y}_h \) and \(y_{ji} <y_{hr} \).
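The attribution rule just described can be sketched directly on the pairwise differences of expression (3). The code below is only an illustration of that rule, not of the formulas based on \(G_{jh}\) and \(D_{jh}\); the function name and the tie-breaking by subgroup position when two means coincide are assumptions.

```python
import numpy as np

def pairwise_attribution(groups):
    """Split the differences in (3) into within, between and transvariation."""
    groups = [np.asarray(g, float) for g in groups]
    pooled = np.concatenate(groups)
    scale = 2 * pooled.size**2 * pooled.mean()
    within = between = transv = 0.0
    for j, gj in enumerate(groups):
        for h, gh in enumerate(groups):
            diff = gj[:, None] - gh[None, :]
            if j == h:
                within += np.abs(diff).sum()
                continue
            # Orient each pair so that the richer subgroup (higher mean)
            # comes first; equal means are ordered by position (assumption).
            if gj.mean() < gh.mean() or (gj.mean() == gh.mean() and j > h):
                diff = -diff
            between += diff[diff >= 0].sum()   # richer unit in richer group
            transv += -diff[diff < 0].sum()    # overlapping differences
    return within / scale, between / scale, transv / scale
```

For the subgroups {1, 4} and {2, 3}, which have equal means, the rule returns 0.10 for the within term and 0.075 for each of the other two terms, that is, \((G-G_{w})/2\) each.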

3.4 The comparison among Gini index decompositions

The main difference among Gini index decompositions lies in the measurement of the inequality between subgroups, \(G_{b}\), and it can be illustrated by analyzing the case of perfectly overlapping subgroups.

Following Bhattacharya and Mahalanobis [1], and Yitzhaki and Lerman [21], complete overlapping implies a null inequality between subgroups, \(G_{b}=0\), while all differences among subgroup distributions are evaluated by means of the transvariation component,

\(G_{t}=G-G_{w}\).

Conversely, within Dagum’s approach, complete overlapping leads to assigning (\(G-G_{w})\) equally to the inequality between subgroups \(G_{b}\) and to the transvariation component \(G_{t}\):

\(G_{t}=G_{b}=(G-G_{w})/2\).

It is possible to identify a direct relationship between the decompositions introduced by Dagum and by Bhattacharya and Mahalanobis. The link between the two proposals is given by the measurement of the inequality between subgroups: in particular, \(_{BM}G_{b}\) is obtained as the sum of all differences \((y_{ji} - y_{hr})\), \(\forall j, h, i, r\), with \(\bar{y}_j >\bar{y}_h \), while \(_{D}G_{b}\) corresponds to the sum of all differences \(\vert y_{ji} - y_{hr} \vert \), \(\forall j, h, i, r\), with \(\bar{y}_j >\bar{y}_h \). If all differences \((y_{ji} - y_{hr})\) are non-negative, that is, if there is no transvariation, \(_{BM}G_{b} = {_{D}G_{b}}\), while, in the case of overlapping subgroups, \(_{BM}G_{b} < {_{D}G_{b}}\) and \((_DG_{b} - {_{BM}}G_{b})\) is given by the negative differences \((y_{ji} - y_{hr})\), that is by \(_{D}G_{t}\). It follows that \(_{D}G_{b}= {_{BM}G_{b}}+{_{D}G_{t}}\), which, since (5) and (7) decompose the same \(G\) with the same \(G_{w}\), implies \(_{D}G_{t} = 0.5\, _{BM}G_{t}\).
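The last implication can be spelled out by equating decompositions (5) and (7), which share the same \(G\) and \(G_{w}\), and substituting the relationship between the two between-subgroup terms:

$$\begin{aligned} G-G_w&= { }_{BM}G_b +{ }_{BM}G_t ={ }_DG_b +{ }_DG_t \\&= ({ }_{BM}G_b +{ }_DG_t )+{ }_DG_t \\ \Rightarrow \quad { }_{BM}G_t&= 2\,{ }_DG_t . \end{aligned}$$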

Given the characteristics of the different decompositions with respect to the overlapping component measurement, the approaches by Bhattacharya and Mahalanobis [1] and by Yitzhaki and Lerman [21] could be more useful in an exploratory context, where we need to assess the existence of inequality factors. In contrast, Dagum’s [8] proposal could be more adequate in a confirmatory framework, where the inequality factors are already known but it is necessary to assess their contribution to total inequality.

In order to evaluate the behaviour of the various Gini index decompositions with respect to the presence of overlapping we propose a two-step procedure. First, we normalize the transvariation components \(_{BM}G_{t}\), \(_{YL}G_{t}\) and \(_{D}G_{t}\) by dividing each quantity by its maximum, thus obtaining

$$\begin{aligned} { }_{BM}G_{t}^{*}&= { }_{BM}G_{t}/(G - G_{w}),\\ { }_{YL}G_{t}^{*}&= { }_{YL}G_{t}/(G - G_{w}),\\ { }_{D}G_{t}^{*}&= 2\, { }_{D}G_{t}/(G - G_{w}). \end{aligned}$$

Second, we compare the normalized transvariation components to the probability and the intensity of transvariation by calculating the absolute errors \(e_{p(t)}\) and \(e_{i(t)}\) where

$$\begin{aligned} e_{p(t)} = \vert G_{t}^{*} - p(t) \vert \end{aligned}$$
(8)

and

$$\begin{aligned} e_{i(t)} = \vert G_{t}^{*} - i(t) \vert \end{aligned}$$
(9)

By means of expressions (8) and (9) we are able to evaluate the goodness of fit of the various decompositions with respect to the overlapping component. High values of \(e_{p(t)}\) and \(e_{i(t)}\) indicate the existence of a relevant distance between \(G_{t}^{*}\) and the measure of transvariation, thus suggesting that the decomposition fails to correctly evaluate the overlapping component. In contrast, low values of \(e_{p(t)}\) and \(e_{i(t)}\) point towards a good performance of the decomposition in the measurement of the overlapping component.

Within the analysis of (8) and (9), it is also relevant to observe how the result \(_{D}G_{t}= 0.5 _{BM}G_{t}\) implies \(_{D}G_{t}^{*}\) \(=\) \(_{BM}G_{t}^{*}\) and, therefore, \(_{D}e_{p(t)} = \,_{BM}e_{p(t)}\) and \(_{D}e_{i(t)} = \, _{BM}e_{i(t)}\).

In the following we develop a simulation study which provides m replications of the Gini index decomposition. In this case, starting from \(e_{p(t)}\) and \(e_{i(t)}\), it is possible to obtain the mean absolute errors \(E_{p(t)}\) and \(E_{i(t)}\) as

$$\begin{aligned}&\displaystyle E_{p(t)} =\frac{1}{m}\sum \limits _{j=1}^m {\vert G_{tj}^*-p(t)_j \vert }\quad \text {and} \end{aligned}$$
(10)
$$\begin{aligned}&\displaystyle E_{i(t)} =\frac{1}{m}\sum \limits _{j=1}^m {\vert G_{tj}^*-i(t)_j \vert }. \end{aligned}$$
(11)
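The normalization of the transvariation components and the mean absolute errors (10) and (11) amount to a few array operations, sketched below. The function names and the `maximum_share` argument (1 for the Bhattacharya-Mahalanobis and Yitzhaki-Lerman components, 0.5 for Dagum’s) are illustrative assumptions.

```python
import numpy as np

def normalized_transvariation(g, g_w, g_t, maximum_share=1.0):
    """G_t^* = G_t divided by its maximum, maximum_share * (G - G_w)."""
    g, g_w, g_t = (np.asarray(x, float) for x in (g, g_w, g_t))
    return g_t / (maximum_share * (g - g_w))

def mean_absolute_error(g_t_star, reference):
    """Mean over the m replications of |G_t^* - reference|, where the
    reference is either p(t) or i(t) for each replication."""
    g_t_star = np.asarray(g_t_star, float)
    reference = np.asarray(reference, float)
    return np.mean(np.abs(g_t_star - reference))
```

For instance, with \(G=0.25\) and \(G_{w}=0.10\), a transvariation component of 0.15 with maximum share 1 and one of 0.075 with maximum share 0.5 both give a normalized value of 1.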

Furthermore, the comparison of the Gini index decompositions is performed also on the basis of the ratios \(_{BM}G_{t}/G\), \(_{YL}G_{t}/G\), and \(_{D}G_{t}/G\), which are analyzed with respect to the ratio \(G_{w}/G\), and to the probability and the intensity of transvariation.

4 The simulation study

The aim of the Monte Carlo study developed in this section is to evaluate and compare the alternative Gini index decompositions with respect to the presence of transvariation.

Both probability and intensity of transvariation, together with three different Gini index decompositions (expressions (5) by Bhattacharya and Mahalanobis, (6) by Yitzhaki and Lerman and (7) by Dagum), are computed on simulated samples, which are randomly extracted from beta or gamma distributions in order to cover a wide range of different situations.

We consider the cases of \(k = 2, 3, 4\) for the number of subgroups and \(n = 100, 500, 1000, 5000\) and 10,000 for the sample size. Subgroup’s size \(n_{j}, j = 1, {\ldots }, k\), is randomly selected with the constraints \(\sum \nolimits _{j=1}^k {n_j =n} \) and \(n_j \ge 50\). Only for sample size \(n=100\), the minimum value of \(n_{j}\) is set to 10.

For each subgroup, the beta distribution parameters \(B(a,b)\) are randomly selected with \(0.5 \le a \le 2\) and \(0.5 \le b \le 4\), while the gamma distribution parameters \(\Gamma (c,d)\) are randomly selected with \(0.5 \le c \le 10\) and \(0.5 \le d \le 10\).

For each combination of k and n, 20,000 samples are randomly generated, 50 % from the beta distribution and 50 % from the gamma distribution, for a total of \(m=300{,}000\) samples.
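One replication of this design could be generated along the following lines. Several details are not specified in the text and are assumed here: subgroup sizes are drawn by giving each subgroup the minimum size and splitting the remaining units with a multinomial draw, distribution parameters are drawn uniformly within the stated bounds, and \(\Gamma (c,d)\) is read as shape c and scale d. The function name is hypothetical.

```python
import numpy as np

def draw_sample(n, k, family, rng, min_size=50):
    """Draw k subgroups with sizes summing to n from beta or gamma laws."""
    # Assumed scheme: minimum size for everyone, random split of the rest.
    extra = rng.multinomial(n - k * min_size, np.full(k, 1 / k))
    sizes = min_size + extra
    groups = []
    for n_j in sizes:
        if family == "beta":
            a, b = rng.uniform(0.5, 2), rng.uniform(0.5, 4)
            groups.append(rng.beta(a, b, size=n_j))
        else:
            # Assumption: c is the shape and d the scale of the gamma law.
            c, d = rng.uniform(0.5, 10), rng.uniform(0.5, 10)
            groups.append(rng.gamma(shape=c, scale=d, size=n_j))
    return groups

rng = np.random.default_rng(0)
sample = draw_sample(n=100, k=3, family="beta", rng=rng, min_size=10)
```

The last two lines show a call for the smallest sample size, where the minimum subgroup size is 10.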

Table 1 reports the frequency distribution of the probability of transvariation by number k of subgroups. Each row illustrates the results related to 50,000 samples. For example, the first row shows that, among the 50,000 samples generated from the beta distribution and divided into \(k=2\) subgroups, the probability of transvariation is lower than 0.2 in only 5 % of cases, while it is greater than 0.8 in 27 % of cases. Results are pooled over the 5 sample sizes used in the simulation study (n = 100, 500, 1000, 5000 and 10,000), since different values of n lead to the same classification of the simulated samples by both probability and intensity of transvariation.

Table 1 Frequency distribution of the probability of transvariation in the simulated samples by number k of subgroups
Table 2 Frequency distribution of the intensity of transvariation in the simulated samples by number k of subgroups

The simulated samples extracted from the gamma distribution are concentrated on the lower values of the probability of transvariation, while in the simulated samples extracted from the beta distribution medium and high values of the probability of transvariation are more frequent. Furthermore, as expected, by increasing the number of subgroups k, frequencies shift towards higher levels of transvariation.

The frequency distribution of the intensity of transvariation in the simulated samples by number k of subgroups is reported in Table 2, which shows a pattern only slightly different from Table 1.

As expected, probability and intensity of transvariation have a similar effect on the distribution of the simulated samples.

Results related to the Gini index decompositions are summarized in Table 3 for the probability of transvariation and in Table 4 for the intensity of transvariation by means of the ratios \(G_{w}/G\), \(_{BM}G_{t}/G\), \(_{YL}G_{t}/G\) and \(_{D}G_{t}/G\). Each row refers to 100,000 samples divided into k subgroups.

Table 3 Decomposition by Bhattacharya and Mahalanobis, Yitzhaki and Lerman, Dagum and by probability of transvariation

The ratio \(G_{w}/G\) evaluates the weight of the inequality within subgroups in total inequality: for example, for \(k=2\), \(G_{w}/G\) starts from 40 % when overlapping is weak, and it increases to over 60 % for high levels of overlapping. The number k of subgroups strongly influences \(G_{w}/G\), which decreases as k increases.

The ratios \(_{BM}G_{t}/G\), \(_{YL}G_{t}/G\) and \(_{D}G_{t}/G\) illustrate the dynamics of the transvariation component for increasing levels of overlapping. In the first column of Table 3, when overlapping is absent or weak, all decompositions lead to the same results, with \(_{BM}G_{t} \approx { }_{YL}G_{t} \approx { }_{D}G_{t} \approx 0\). The main difference between the decompositions can be clearly observed in the last column of Table 3: when overlapping is complete (or almost complete), we have, for Bhattacharya-Mahalanobis and Yitzhaki-Lerman, \(G_{w}/G + { }_{BM}G_{t}/G \approx G_{w}/G + { }_{YL}G_{t}/G \approx 1\), while, for Dagum, \(G_{w}/G + 2\,{ }_{D}G_{t}/G \approx 1\) holds. From Table 3, by comparing \(_{BM}G_{t}/G\) and \(_{D}G_{t}/G\), it is also possible to find the result \(_{D}G_{t} = 0.5\, _{BM}G_{t}\) derived in Sect. 3.4. Furthermore, it can be observed that \(_{BM}G_{t}/G < { }_{YL}G_{t}/G\), and, given \(_{D}G_{t} = 0.5\, _{BM}G_{t}\), we have \(_{D}G_{t}/G < { }_{BM}G_{t}/G < { }_{YL}G_{t}/G\).

Table 4 Decomposition by Bhattacharya and Mahalanobis, Yitzhaki and Lerman, Dagum and by intensity of transvariation

The analysis of the ratios \(G_{w}/G\), \(_{BM}G_{t}/G\), \(_{YL}G_{t}/G\) and \(_{D}G_{t}/G\) by intensity of transvariation (Table 4) leads to different values but identical conclusions with respect to the analysis based on the probability of transvariation.

Furthermore, for both probability and intensity of transvariation, sample size n does not influence the ratios \(G_{w}/G, _{BM}G_{t}/G, _{YL}G_{t}/G\) and \(_{D}G_{t}/G\).

On the whole, within Dagum’s approach the overlapping component plays a minor role both with respect to the other inequality components and in comparison to overlapping in the other decompositions. On the contrary, within Yitzhaki and Lerman’s decomposition, the overlapping component becomes a major source of total inequality, while in Bhattacharya and Mahalanobis’s approach it plays an intermediate role.

The comparison among alternative Gini index decompositions can be performed by means of the mean absolute errors \(E_{p(t)}\) and \(E_{i(t)}\), calculated as in (10) and (11), which are reported in Tables 5 and 6 for probability and intensity of transvariation, respectively. Each row refers to 100,000 samples divided into k subgroups.

Table 5 Results of the simulation study: mean absolute error \(E_{p(t)}\) by probability of transvariation p(t) and number of subgroups k
Table 6 Results of the simulation study: mean absolute error \(E_{i(t)}\) by intensity of transvariation i(t) and number of subgroups k

The first and the last columns of Table 5 illustrate the results related to the two extreme situations: absence of overlapping and complete overlapping, respectively. In both cases we can observe low values of \(E_{p(t)}\), thus indicating a good performance of the different decompositions in the overlapping component measurement.

Furthermore, by comparing Bhattacharya and Mahalanobis’s decomposition to Dagum’s approach, we find the result \(_{D}e_{p(t)} = \,_{BM}e_{p(t)}\) demonstrated in Sect. 3.4.

Finally, it is interesting to note how, by referring to probability of transvariation, Yitzhaki and Lerman’s approach leads to the lowest values of the mean absolute error.

The analysis of the mean absolute error by intensity of transvariation (Table 6) confirms the indications of \(E_{p(t)}\) related to the two extreme situations (the first and last columns still show very low values of \(E_{i(t)})\) and to the comparison between Bhattacharya-Mahalanobis’s and Dagum’s approaches (\(_{D}e_{i(t)} =\, _{BM}e_{i(t)})\).

However, in Table 6 it is also possible to observe a relevant difference with respect to Table 5: Yitzhaki and Lerman’s approach shows the highest values of \(E_{i(t)}\), thus inverting the previous result based on \(E_{p(t)}\).

Furthermore, the Gini index decompositions performed on the simulated data also indicate that samples extracted from the beta distribution and samples extracted from the gamma distribution show a similar pattern with respect to the role of the overlapping component. Given the strong diversity among sample distributions, this result lends a high level of robustness to our study.

Finally, a last result refers to the number of observations n, which does not seem to influence the overlapping component within the Gini index decomposition: sample sizes from 100 to 10,000 lead to the same pattern.

5 Concluding remarks

This paper developed a simulation study aimed at analyzing and comparing different Gini index decompositions. Our focus is on the role of the overlapping component, evaluated by means of both probability and intensity of transvariation.

The analysis of the characteristics of the different decompositions with respect to the overlapping component measurement suggests that the approaches by Bhattacharya–Mahalanobis and Yitzhaki–Lerman could be more useful in an exploratory context, where we need to assess the existence of inequality factors, while Dagum’s proposal could be more adequate in a confirmatory framework, where the inequality factors are already known, but it is necessary to assess their contribution to total inequality.

Results of the simulation study suggest that the number of observations does not influence the overlapping component, while the number of subgroups plays a key role. We also provide evidence about the role of the inequality within subgroups in total inequality for increasing levels of overlapping and compare different methods for the overlapping component measurement.

On the basis of the probability of transvariation, that is by considering the frequency of overlapping, Yitzhaki and Lerman’s proposal leads to the lowest mean absolute error, thus indicating the best performance in the overlapping component measurement. On the contrary, by referring to the intensity of transvariation, that is by considering the extent of overlapping, Dagum’s approach yields better results in the measurement of \(G_{t}\).

Finally, the analysis of overlapping suggests appealing results also in the context of empirical studies, where it can be extremely helpful in assessing the “poverty” degree of the different units and in investigating the inequality structure.