1 Introduction

Decomposability of inequality measures into contributions of population subgroups and contributions of sources is a desirable property for studies of the status and trends of economic inequality in populations. Several types of subgroup decomposition of the Gini inequality index have been proposed so far. However, these decompositions have disadvantages such as inconsistency and impracticality. In contrast, the new type of decomposition presented in this paper has good properties. It is notable that the new decomposition satisfies the completely identical distribution (CID) condition, whereby the between-group inequality is null if and only if the distribution within each subgroup is identical to that of all the others. This is in striking contrast to the well-known subgroup decomposition of the Theil index or the generalized entropy measures, which satisfies the condition whereby the between-group inequality is null if the mean within each subgroup equals that of all the others. It should also be noted that the new decomposition can be generalized to multivariate Gini indices while essentially maintaining its properties, indicating its suitability as a decomposition of the Gini index.

Generalization of the Gini index to multivariate settings is a relatively new research issue, although the Gini index has been the most popular inequality measure for many years. Koshevoy and Mosler [11] proposed two types of multivariate Gini index, the distance-Gini index and the volume-Gini index. The former was formulated by generalizing the univariate relative mean difference. The latter is a modification of the multivariate index proposed by Oja [14], which can be formulated by generalizing the Lorenz curve. Koshevoy and Mosler [11] showed that their indices are decomposable into subgroups in a similar way to the two-term decomposition of the ordinary univariate Gini index [see (14) in the next section] proposed by several researchers, including Rao [16] and Dagum [4]. However, their decomposition has disadvantages in terms of practicality and consistency. The new decomposition in this paper can easily be generalized to the distance-Gini index based on studies of the multivariate Cramér test [2]. The generalization can also be achieved for the volume-Gini index based on the Brunn–Minkowski inequality or Minkowski's first inequality concerning the mixed volume, with some modifications of the index definition. The CID condition needs to be loosened somewhat in the latter case. The source decomposition of Rao [16] can also be generalized to both multivariate indices. It is notable that interaction terms appear among sources of different attributes in the source decomposition of the modified volume-Gini index.

The paper is organized as follows. The next section is devoted to subgroup decomposition of the usual univariate Gini index. Section 2.1 introduces the new type of subgroup decomposition, which is extended to a multilevel decomposition in section 2.2 and compared with previously proposed types of subgroup decomposition in section 2.3. In section 2.4, the new decomposition is applied to Japanese household income data, and the results for age-group decomposition and regional decomposition are presented. Section 3 is devoted to subgroup and source decomposition of the multivariate Gini indices. The new subgroup decomposition is generalized to the distance-Gini index in section 3.1, and to the volume-Gini index in section 3.2, with modifications of the index definition. Source decomposition of both indices is introduced in section 3.3, followed by applications of the subgroup decomposition to Japanese household income and expenditure data in section 3.4. Section 4 concludes, with some remarks concerning multivariate inequality measures.

2 Subgroup decomposition of the Gini index

2.1 New type of subgroup decomposition

Let F(y) represent the cumulative distribution function of a nonnegative random variable Y such as income, with a finite positive expectation μ. The Gini mean difference M(F) can be presented in several ways, as follows:

$$ M(F) = \frac{1}{2}\iint \left| x - y \right| \mathrm{d}F(x)\,\mathrm{d}F(y) = \int F(y)\bigl(1 - F(y)\bigr)\,\mathrm{d}y = \int (y - \mu)\,\mathrm{d}\bigl((F - 1)F\bigr) = 2\int (y - \mu)\Bigl(F(y) - \frac{1}{2}\Bigr)\mathrm{d}F(y) $$
(1)

In the literature, double M(F) is often called the Gini mean difference; in this paper, however, M(F) itself is defined as the Gini mean difference. Among the four equivalent representations in Eq. 1, the first is the original expression of the Gini mean difference, and the fourth expresses it in terms of the covariance between the variable Y and its rank F(Y) (see [13]). Strictly speaking, the fourth representation does not hold for non-continuous distributions. The second representation expresses M(F) as the co-moment between the rank function F(y) and its reverse rank function 1−F(y). Since the integrand of the second representation, \( F(y)\bigl(1 - F(y)\bigr) \), equals the expected variance of the binary variable “whether the random variable Y takes a value less than or equal to y”, the second representation is also interpretable as the total of the expected variance of this binary variable over the values of y. The second representation can be proved using Lemma 2.1 of Baringhaus and Franz [2]. The third and fourth representations are derived from the second using integration by parts. As for the Gini relative mean difference, in other words the Gini inequality index \( R(F) = M(F)/\mu \), the corresponding representations are obtained by dividing Eq. 1 by μ.
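As a numerical illustration of these representations, the following minimal sketch computes the Gini mean difference from a simulated sample both by the first (pairwise) form and, approximately, by the rank-based fourth form of Eq. 1. The lognormal sample, the function names and the mid-rank convention are illustrative assumptions rather than anything taken from the paper, and the rank-based form is only approximate for a finite (hence discrete) sample.

```python
import numpy as np

def gini_mean_difference(y):
    """M(F) by the first form of Eq. 1: half the mean absolute pairwise difference."""
    y = np.asarray(y, dtype=float)
    return 0.5 * np.abs(y[:, None] - y[None, :]).mean()

def gini_md_rank_form(y):
    """Fourth form of Eq. 1, 2 * Cov(Y, F(Y)); exact only for continuous F, so this
    mid-rank version is an approximation for a finite sample."""
    y = np.asarray(y, dtype=float)
    ranks = (np.argsort(np.argsort(y)) + 0.5) / len(y)   # mid-rank approximation of F(Y)
    return 2.0 * np.mean((y - y.mean()) * (ranks - 0.5))

rng = np.random.default_rng(0)
income = rng.lognormal(mean=0.0, sigma=0.6, size=2000)   # illustrative sample
M = gini_mean_difference(income)
print(M, gini_md_rank_form(income), M / income.mean())   # M(F), its rank-form approximation, R(F)
```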

Assume that the population consists of groups 1,2,...,n. Let \( F_i(y) \), \( \mu_i \), and \( p_i \) represent the cumulative distribution function, the expected value and the population share of group i, respectively. Note that \( F(y) = \sum p_i F_i(y) \). Then, using the second representation in Eq. 1, the Gini mean difference M(F) and the Gini index R(F) can be decomposed by subgroup, respectively, as follows:

$$ M(F) = \sum p_i M(F_i) + \sum p_i \int \bigl(F_i(y) - F(y)\bigr)^2 \mathrm{d}y $$
(2)
$$ R(F) = \sum p_i \frac{\mu_i}{\mu} R(F_i) + \sum p_i \frac{1}{\mu} \int \bigl(F_i(y) - F(y)\bigr)^2 \mathrm{d}y. $$
(3)

The proof is by direct calculation. The first term on the right-hand side of Eq. 3 corresponds to the within-group inequality, and the second term corresponds to the between-group inequality. The contribution of each group to the between-group inequality can be naturally defined as follows:

$$ p_i \frac{1}{\mu}\int \bigl(F_i(y) - F(y)\bigr)^2 \mathrm{d}y = p_i \frac{1}{\mu}\,cv(F_i, F) \;(\geqslant 0). $$
(4)

\( cv(G, F) := \int \bigl(G(y) - F(y)\bigr)^2 \mathrm{d}y \) satisfies the following equality [2], which is useful for generalizing the decompositions 2 and 3 to multivariate settings, as shown in the next section:

$$ cv(G, F) = \iint \left| x - y \right| \mathrm{d}G(x)\,\mathrm{d}F(y) - \frac{1}{2}\iint \left| x - y \right| \mathrm{d}G(x)\,\mathrm{d}G(y) - \frac{1}{2}\iint \left| x - y \right| \mathrm{d}F(x)\,\mathrm{d}F(y). $$
(5)

Equality 5 forms the basis of the Cramér two-sample test and its generalization to multivariate settings [2]. In this connection, I call the between-group inequality, the second term on the right-hand side of Eq. 3, the Cramér coefficient of variation among groups 1,...,n.
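The following sketch illustrates decomposition 3 on simulated data, computing the between-group term through the pairwise representation of cv(G, F) in Eq. 5; the within-group and between-group terms then add up to the overall Gini index. The simulated groups and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def K(x, y):
    """K(G, F) = E|X - Y| for samples x ~ G and y ~ F (the double integrals in Eq. 5)."""
    return np.abs(np.asarray(x, float)[:, None] - np.asarray(y, float)[None, :]).mean()

def cv(x, y):
    """Cramer coefficient of variation cv(G, F) via Eq. 5."""
    return K(x, y) - 0.5 * K(x, x) - 0.5 * K(y, y)

def gini(y):
    """Gini index R(F) = M(F) / mu."""
    y = np.asarray(y, float)
    return 0.5 * np.abs(y[:, None] - y[None, :]).mean() / y.mean()

rng = np.random.default_rng(1)
groups = [rng.lognormal(0.0, s, size=n) for s, n in [(0.4, 800), (0.7, 1200), (0.5, 500)]]
pooled = np.concatenate(groups)
p = np.array([len(g) for g in groups]) / len(pooled)     # population shares p_i
mu = pooled.mean()

within  = sum(pi * g.mean() / mu * gini(g) for pi, g in zip(p, groups))
between = sum(pi / mu * cv(g, pooled) for pi, g in zip(p, groups))
print(gini(pooled), within + between)   # Eq. 3: the two values agree up to floating-point error
```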

It may be felt that the second term should not be regarded as the between-group inequality because its functional form appears different from that of the Gini index. However, the second term has essentially the same form as the Gini index, because the Gini index satisfies the following equality, which can be regarded as a special case of Eq. 3 with null within-group inequality:

$$ R(F) = \int \mathrm{d}F(x)\,\frac{1}{\mu}\int \bigl(I_{[x,\infty)}(y) - F(y)\bigr)^2 \mathrm{d}y = \frac{1}{\mu}\int cv\bigl(I_{[x,\infty)}, F\bigr)\,\mathrm{d}F(x), $$
(6)

where \( I_{[x,\infty)}(y) = 1 \) if \( y \geqslant x \), and 0 if \( y < x \). Equality 6 expresses the notion that the Gini index is identical to the Cramér coefficient of variation when each population unit forms an individual group. Note that the indicator function \( I_{[x,\infty)} \) in Eq. 6 corresponds to the one-point distribution function of a random variable that takes the value x almost surely. For the proof, note that \( \int \bigl(I_{[x,\infty)}(y) - F(y)\bigr)^2 \mathrm{d}F(x) = F(y)\bigl(1 - F(y)\bigr) \) and integrate over y.

The following decompositions are also true:

$$ M(F) = \sum p_i M(F_i) + \sum_{i < j} p_i p_j\, cv(F_i, F_j) $$
(7)
$$ R(F) = \sum p_i \frac{\mu_i}{\mu} R(F_i) + \sum_{i < j} p_i p_j \frac{1}{\mu}\, cv(F_i, F_j). $$
(8)

These decompositions can be proved by direct calculation. Decomposition 8 attributes the between-group inequality to the relative mean squared difference of distribution functions between each pair of groups. From Eqs. 5 and 1, the following equality is obtained for cv(F i ,F):

$$ cv(F_i, F) = 4\left( M\Bigl(\frac{1}{2}F_i + \frac{1}{2}F\Bigr) - \frac{1}{2}M(F_i) - \frac{1}{2}M(F) \right). $$
(9)

Thus, cv(F_i, F) is four times the surplus of dispersion, in terms of the Gini mean difference, arising from a 50:50 merger of group i and the overall population. Assuming an ε to 1−ε merger ratio, where ε is a small positive number, the following representation of cv(F_i, F) is also obtained by substituting ε, 1−ε, F_i, F and \( \varepsilon F_i + (1 - \varepsilon)F \) for p_1, p_2, F_1, F_2 and F, respectively, in Eq. 7:

$$ cv(F_i, F) = \frac{ M\bigl(\varepsilon F_i + (1 - \varepsilon)F\bigr) - \varepsilon M(F_i) - (1 - \varepsilon)M(F) }{\varepsilon} + o(\varepsilon). $$
(10)

Thus, cv(F_i, F) equals the surplus of the dispersion relative to the merger ratio when a merger with an infinitesimally small share of group i takes place.
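A small numerical check of the merger interpretation in Eq. 9 is sketched below: the 50:50 merger of a group with the overall population is represented by a weighted sample, and four times its dispersion surplus is compared with cv(F_i, F) computed from Eq. 5. The simulated samples and helper functions are illustrative assumptions.

```python
import numpy as np

def weighted_gmd(y, w):
    """Weighted Gini mean difference M = 0.5 * sum_ij w_i w_j |y_i - y_j|."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    return 0.5 * (w[:, None] * w[None, :] * np.abs(y[:, None] - y[None, :])).sum()

def cv(x, y):
    """cv(G, F) from Eq. 5, with equal weights within each sample."""
    K = lambda a, b: np.abs(np.asarray(a, float)[:, None] - np.asarray(b, float)[None, :]).mean()
    return K(x, y) - 0.5 * K(x, x) - 0.5 * K(y, y)

rng = np.random.default_rng(2)
g = rng.lognormal(0.2, 0.5, size=600)    # subgroup i (illustrative)
f = rng.lognormal(0.0, 0.7, size=1500)   # overall population (illustrative)

# 50:50 merger of group i and the overall population, represented by weights
y_mix = np.concatenate([g, f])
w_mix = np.concatenate([np.full(len(g), 0.5 / len(g)), np.full(len(f), 0.5 / len(f))])

surplus = 4 * (weighted_gmd(y_mix, w_mix)
               - 0.5 * weighted_gmd(g, np.full(len(g), 1 / len(g)))
               - 0.5 * weighted_gmd(f, np.full(len(f), 1 / len(f))))
print(surplus, cv(g, f))   # Eq. 9: the two quantities coincide
```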

Obviously, the between-group inequality in decompositions 3 and 8 is null if and only if the distribution within each group is identical to that of all other groups. I call this condition the completely identical distribution (CID) condition. For this reason, the decomposition is quite different from that of the generalized entropy measures of inequality, in which the between-group inequality is null if and only if the mean within each group equals that of all the others. Bhattacharya and Mahalanobis [3] mentioned that, intuitively, it is reasonable to lay down that the between-group component should not change if the group distributions F_i are changed while the μ_i are kept fixed. However, Dagum [4] was opposed to taking the income means of subpopulations as their representative values for estimating the between-subpopulation inequality, because income distributions depart significantly from normality. I believe that the new decomposition 3 favors Dagum's view, although he pursued a different approach that added an extra component besides the between-group component of Bhattacharya and Mahalanobis, as shown in section 2.3.

2.2 Extension to multilevel decomposition

Let \( F_{ij}(y) \), \( \mu_{ij} \), and \( p_{ij} \) represent the cumulative distribution function, the expected value and the share of subgroup j within group i, respectively. Noting that \( F_i(y) = \sum_j p_{ij} F_{ij}(y) \) and \( \mu_i = \sum_j p_{ij}\mu_{ij} \), the two-level decomposition of the Gini index R(F) is derived by further decomposing the within-group Gini index R(F_i) in Eq. 3 as follows:

$$ R(F) = \sum_{i,j} p_i \frac{\mu_i}{\mu}\, p_{ij} \frac{\mu_{ij}}{\mu_i} R(F_{ij}) + \sum_{i,j} p_i \frac{\mu_i}{\mu}\, p_{ij} \frac{1}{\mu_i}\int \bigl(F_{ij}(y) - F_i(y)\bigr)^2 \mathrm{d}y + \sum_i p_i \frac{1}{\mu}\int \bigl(F_i(y) - F(y)\bigr)^2 \mathrm{d}y. $$
(11)

The second term on the right-hand side of Eq. 11 corresponds to the sum of the between-subgroup inequalities within groups. One of the advantages of decomposition 3 is that the between-group inequality is consistently defined with hierarchical grouping systems, since the between-subgroup inequality in the overall population equals the sum of the between-subgroup inequalities within groups and the between-group inequality in the overall population, i.e. the following equality is true for each group:

$$ \sum_j p_{ij} \frac{1}{\mu}\int \bigl(F_{ij}(y) - F(y)\bigr)^2 \mathrm{d}y = \sum_j \frac{\mu_i}{\mu}\, p_{ij} \frac{1}{\mu_i}\int \bigl(F_{ij}(y) - F_i(y)\bigr)^2 \mathrm{d}y + \frac{1}{\mu}\int \bigl(F_i(y) - F(y)\bigr)^2 \mathrm{d}y. $$
(12)

Obviously, decomposition 11 can be extended further to decompositions with more levels while retaining this consistency.
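The consistency property of Eq. 12 can be checked numerically as sketched below, again using the energy-distance identity of Eq. 5 to evaluate the squared-difference integrals; the two-subgroup example and the function names are illustrative assumptions.

```python
import numpy as np

def cv(x, y):
    """cv(G, F) = integral of (G - F)^2 dy, via the energy-distance identity Eq. 5."""
    K = lambda a, b: np.abs(np.asarray(a, float)[:, None] - np.asarray(b, float)[None, :]).mean()
    return K(x, y) - 0.5 * K(x, x) - 0.5 * K(y, y)

rng = np.random.default_rng(3)
# group i split into two subgroups j = 1, 2; another group makes up the rest of the population
sub = [rng.lognormal(0.1, 0.5, 400), rng.lognormal(0.4, 0.6, 600)]
other = rng.lognormal(0.0, 0.8, 1000)

group_i = np.concatenate(sub)
overall = np.concatenate([group_i, other])
mu, mu_i = overall.mean(), group_i.mean()
p_ij = np.array([len(s) for s in sub]) / len(group_i)    # subgroup shares within group i

lhs = sum(p * cv(s, overall) / mu for p, s in zip(p_ij, sub))
rhs = (sum((mu_i / mu) * p * cv(s, group_i) / mu_i for p, s in zip(p_ij, sub))
       + cv(group_i, overall) / mu)
print(lhs, rhs)   # Eq. 12: both sides agree up to floating-point error
```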

2.3 Comparison with other types of subgroup decomposition

Several researchers, such as Pyatt [15] and Dagum [4], proposed the following three-term decomposition:

$$ R(F) = \sum_i p_i^2 \frac{\mu_i}{\mu} R(F_i) + \frac{2}{\mu}\sum_{i < j} p_i p_j D_{ij} + \frac{1}{\mu}\sum_{i < j} p_i p_j \left| \mu_i - \mu_j \right|, $$
(13)

where \( D_{ij} = \int \mathrm{d}F_j(x)\int_0^x (x - y)\,\mathrm{d}F_i(y) \) if \( \mu_i > \mu_j \), or \( D_{ij} = \int \mathrm{d}F_i(x)\int_0^x (x - y)\,\mathrm{d}F_j(y) \) if \( \mu_i < \mu_j \). The first term can be regarded as the contribution of the within-group inequality, since it is a weighted sum of the within-group inequality values. The third term on the right-hand side of (13) equals the between-group inequality defined by Bhattacharya and Mahalanobis [3]. The second term is regarded as the contribution of the trans-variation intensity, which measures the degree of overlap between the within-group distributions. This three-term decomposition is less satisfactory because it is inconsistent with multilevel groupings, and the weights assigned to the subgroups in the first term do not sum to one.
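For comparison with the new decomposition, the following sketch evaluates the three terms of Eq. 13 on simulated samples, implementing D_ij exactly as defined below the equation; the simulated groups and helper names are illustrative assumptions.

```python
import numpy as np

def gini(y):
    y = np.asarray(y, float)
    return 0.5 * np.abs(y[:, None] - y[None, :]).mean() / y.mean()

def D(gi, gj):
    """D_ij of Eq. 13: E[(X - Y)^+] with X drawn from the group with the smaller mean
    and Y from the group with the larger mean, as defined below Eq. 13."""
    lo, hi = (gj, gi) if gi.mean() > gj.mean() else (gi, gj)
    return np.maximum(lo[:, None] - hi[None, :], 0.0).mean()

rng = np.random.default_rng(4)
groups = [rng.lognormal(0.0, 0.5, 700), rng.lognormal(0.3, 0.6, 900), rng.lognormal(0.6, 0.4, 400)]
pooled = np.concatenate(groups)
p = np.array([len(g) for g in groups]) / len(pooled)
mu = pooled.mean()

within = sum(p[i] ** 2 * groups[i].mean() / mu * gini(groups[i]) for i in range(3))
trans  = (2 / mu) * sum(p[i] * p[j] * D(groups[i], groups[j])
                        for i in range(3) for j in range(i + 1, 3))
betw   = (1 / mu) * sum(p[i] * p[j] * abs(groups[i].mean() - groups[j].mean())
                        for i in range(3) for j in range(i + 1, 3))
print(gini(pooled), within + trans + betw)   # Eq. 13: the three terms add up to R(F)
```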

The sum of the second and third terms is called the gross between-group Gini index, which, with the first term, comprises the following two-term decomposition:

$$ R(F) = \sum_i p_i^2 \frac{\mu_i}{\mu} R(F_i) + \frac{1}{\mu}\sum_{i < j} p_i p_j \iint \left| x - y \right| \mathrm{d}F_i(x)\,\mathrm{d}F_j(y). $$
(14)

Although the two-term decomposition 14 can, in a sense, be extended consistently to a multilevel decomposition, the gross between-group Gini index cannot be regarded as the between-group inequality, because it does not take its minimum constant value (usually normalized to zero) when the within-group distributions or means are identical to each other.

Making use of the fourth representation of the Gini mean difference in Eq. 1, Yitzhaki and Lerman [24] proposed a different type of three-term decomposition, which yields a between-group inequality that is much closer to the counterpart of the new decomposition 3, as shown in section 2.4.

$$ R(F) = \sum p_i \frac{\mu_i}{\mu} R(F_i) + \sum p_i \frac{\mu_i}{\mu} R(F_i)\bigl(O_{0i} - 1\bigr) + \frac{2}{\mu}\sum p_i \bigl(\mu_i - \mu\bigr)\Bigl(G_i - \frac{1}{2}\Bigr), $$
(15)

where \( O_{0i} = \dfrac{\int (y - \mu_i)F(y)\,\mathrm{d}F_i(y)}{\int (y - \mu_i)F_i(y)\,\mathrm{d}F_i(y)} \) and \( G_i = \int F(y)\,\mathrm{d}F_i(y) \). In Eq. 15, \( O_{0i} \) measures the degree to which the overall distribution is included in the range of the within-group distribution i, and \( G_i \) is the expected rank of observations belonging to group i when they are ranked according to the ranking of the overall population. The third term on the right-hand side of Eq. 15 is the covariance between the within-group means and the average ranks of the respective groups. Thus, the third term can be regarded as the between-group inequality, which vanishes if the within-group mean μ_i equals that of all the others, or if the average rank G_i of each group equals that of all the others. Although their decomposition has the disadvantages that the between-group inequality may take a negative value and that it is inconsistent with multilevel groupings, it notably takes a step towards the new decomposition presented in this paper, in that the between-group inequality is defined using more than a single type of aggregate.

The contribution of each group to the between-group inequality in decomposition 3 shows the following relation to the components in decomposition 15 if F i is continuous for any group:

$$ p_i \frac{1}{\mu}\int \bigl(F_i - F\bigr)^2 \mathrm{d}y = p_i \frac{\mu_i}{\mu} R(F_i)\bigl(O_{0i} - 1\bigr) + 2 p_i \frac{1}{\mu}\bigl(\mu_i - \mu\bigr)\Bigl(G_i - \frac{1}{2}\Bigr) + p_i R(F)\bigl(O_{i0} - 1\bigr), $$
(16)

where \( O_{i0} = \dfrac{\int (y - \mu)F_i(y)\,\mathrm{d}F(y)}{\int (y - \mu)F(y)\,\mathrm{d}F(y)} \). The proof is given in the Appendix. Note that the third term on the right-hand side of Eq. 16 vanishes on summation over the groups.

2.4 Applications to Japanese family income data

2.4.1 Decomposition of income inequality into age groups for household heads

The most recent survey results drew attention to a sharp rise in income inequality among the young generation in Japan, although the overall inequality has not risen notably once age effects are excluded. Table 1 shows trends in income inequality within each age group for household heads, measured by the Gini index and the squared coefficient of variation (SCV). These indices measure the annual income inequality among households with two or more members. The estimates of the indices are derived from the National Survey of Family Income and Expenditures, a large-scale family budget survey of approximately 50,000 households conducted every 5 years by the Statistics Bureau, Ministry of Internal Affairs and Communications. The Gini indices are estimated from two-way tables of income class by age group of the household head, using the composite Simpson's rule for approximating Lorenz domains in a similar manner to the official Gini estimates. The SCVs are taken from existing statistical tables, so those estimates are effectively calculated from the micro data. There are 10 income classes, ranging from less than 2 million yen to 15 million yen or more. Such Gini estimates are empirically considered to be good approximations to those estimated from the micro data.

Table 1 Income inequality by age group for household head

As shown in Table 2, decomposition into age groups for household heads reveals that the youngest group, with a household head under 30 years old, did not contribute to the slight increase in overall inequality between 1999 and 2004, despite the sharp increase in the within-group inequality. The increase in the relative income of the youngest group rather contributed to the decrease in the between-group inequality, which canceled out the positive contribution of the within-group inequality. Thus, the slight increase in overall inequality between 1999 and 2004 should be attributed to the contributions of other age groups, which did not draw much attention.

Table 2 Decomposition of income inequality into age groups for household heads*

The between-group inequality and the contribution of each age group measured by the Gini decomposition of Yitzhaki and Lerman [24] are approximately twice their counterparts in the new decomposition, with the same sign.

2.4.2 Regional inequality in income distribution

It has recently been speculated that the between-region inequality, in particular the gap between the metropolitan areas including Greater Tokyo and other areas, is increasing, as well as the between-household inequality. However, the actual trend may differ somewhat from this speculation according to the results derived from the new decomposition. Table 3 shows the recent trends in regional income inequality in Japan, measured by three types of the Gini decomposition and decomposition of the generalized entropy measures. The single-parameter entropy family is defined as follows:

$$ E_c(F) = \int \varphi_c\bigl(y/\mu\bigr)\,\mathrm{d}F(y), $$
(17)

where \( \varphi_c(y/\mu) = \bigl((y/\mu)^c - 1\bigr)/\bigl(c(c - 1)\bigr) \) if \( c \neq 0, 1 \), \( \varphi_1(y/\mu) = (y/\mu)\log(y/\mu) \), and \( \varphi_0(y/\mu) = \log(\mu/y) \). The inequality indices drawn from the entropy family 17 satisfy the following subgroup decomposition:

$$ E_c(F) = \sum p_i \left(\frac{\mu_i}{\mu}\right)^{c} E_c(F_i) + \sum p_i\, \varphi_c\!\left(\frac{\mu_i}{\mu}\right). $$
(18)

The first and second terms on the right-hand side of Eq. 18 correspond to the within- and between-group inequality, respectively. In Table 3, the between-region inequality indices in the cases c = 0, 1, 2 are denoted by E_0, E_1 and E_2, respectively. E_2 is equivalent to the SCV.
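The subgroup decomposition 18 of the generalized entropy family can be reproduced numerically as sketched below; the simulated groups, the parameter values c = 0, 1, 2 and the function names are illustrative assumptions.

```python
import numpy as np

def phi(r, c):
    """phi_c of Eq. 17, evaluated at a ratio r = y/mu or mu_i/mu."""
    if c == 0:
        return np.log(1.0 / r)
    if c == 1:
        return r * np.log(r)
    return (r ** c - 1.0) / (c * (c - 1.0))

def entropy_index(y, c):
    """Generalized entropy measure E_c(F) of Eq. 17 for an empirical sample."""
    y = np.asarray(y, float)
    return np.mean(phi(y / y.mean(), c))

rng = np.random.default_rng(5)
groups = [rng.lognormal(0.0, 0.5, 800), rng.lognormal(0.4, 0.7, 1200)]
pooled = np.concatenate(groups)
p = np.array([len(g) for g in groups]) / len(pooled)
mu = pooled.mean()

for c in (0, 1, 2):
    within  = sum(pi * (g.mean() / mu) ** c * entropy_index(g, c) for pi, g in zip(p, groups))
    between = sum(pi * phi(g.mean() / mu, c) for pi, g in zip(p, groups))
    print(c, entropy_index(pooled, c), within + between)   # Eq. 18: the decomposition is exact
```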

Table 3 Regional income inequality

A two-level regional grouping is used for the calculation of regional inequality. The whole country consists of 47 prefectures, which were subdivided into approximately 3,000 municipalities (cities, wards, towns and villages) before many municipality mergers took place in 2005. The municipalities in each prefecture were grouped for the National Survey of Family Income and Expenditures, based on the more detailed municipality grouping used for the Establishment and Enterprise Census. Each municipality group consists of neighboring municipalities within the same prefecture. These groups were determined by taking into consideration spheres of residential life and economic relations among municipalities. There are 274 municipality groups in total. As in the application in section 2.4.1, the Gini indices and their breakdowns by the new decomposition are estimated from two-way tables of income class by region, using the composite Simpson's rule for approximating Lorenz domains.

The Gini decomposition of Bhattacharya and Mahalanobis [3] and the decomposition of the generalized entropy measures indicate an increase in the between-prefecture inequality between 1999 and 2004, as well as an increase in the between-municipality-group inequality. In contrast, the new Gini decomposition indicates a continued downtrend in the between-prefecture inequality, despite an upturn in the between-municipality-group inequality. The Gini decomposition of Yitzhaki and Lerman shows little change in the between-prefecture inequality over the same period. Thus, the new decomposition and the decomposition of Yitzhaki and Lerman imply that regional inequality within prefectures, rather than the gaps among prefectures, should be the issue. If greater importance is attached to consistency with the measurement of between-household inequality, which in Japan is usually based on the Gini index, the implication derived from the new Gini decomposition should be noted, and it deserves further investigation.

Shorrocks and Wan [19] pointed out that the Gini decomposition of Bhattacharya and Mahalanobis produces considerably greater shares for the between-group inequality in the overall inequality than the decompositions of other indices. However, the new Gini decomposition produces slightly smaller shares for the between-group inequality than the decompositions of other indices, as shown in Table 4. The Gini decomposition of Yitzhaki and Lerman produces slightly greater shares. As in the decomposition into age groups for household heads shown in section 2.4.1, the between-prefecture inequality derived from the Yitzhaki and Lerman decomposition is approximately twice that of the new decomposition. However, for the between-municipality-group inequality, the relative difference is less than double. It seems intuitive to suppose that the more minutely the population is subdivided, the smaller the relative difference becomes.

Table 4 Ratio of regional income inequality to the overall income inequality (1999)

3 Decomposition of the multivariate Gini index

3.1 Subgroup decomposition of the distance Gini index

In this section, the new type of subgroup decomposition for the Gini index is generalized to multivariate Gini indices. First, the corresponding decomposition of the distance-Gini index, one of the multivariate Gini indices proposed by Koshevoy and Mosler [11], is introduced, applying the results of Baringhaus and Franz [2] concerning the multivariate Cramér test.

Let x={x k } and y={y k } be d-dimensional vectors, and let F(y) represent the distribution function of a d-variate random variable Y on the orthant \( R^{d}_{ + } \) with a finite positive expectation vector μ={μ k }. Koshevoy and Mosler [11] defined the distance-Gini mean difference M D(F) as follows:

$$ M_{\mathrm{D}}(F) = \frac{1}{2d}\iint \left\| x - y \right\| \mathrm{d}F(x)\,\mathrm{d}F(y), \quad \text{where } \left\| x - y \right\| = \sqrt{\sum_{k=1}^{d} (x_k - y_k)^2}. $$
(19)

Let x/μ denote the vector \( \{x_k/\mu_k\} \), and let \( \widetilde{F}(y) \) be the distribution function of the random variable Y/μ. Then Koshevoy and Mosler [11] defined the distance-Gini relative mean difference (the distance-Gini index) \( R_{\mathrm{D}}(F) \) as \( M_{\mathrm{D}}(\widetilde{F}) \). In the univariate case (d = 1), the distance-Gini index is identical to the ordinary Gini index.

The Euclidean norm can be represented as follows (e.g. [9]; the proof is also given in a more general form in the Appendix):

$$ \left\| x \right\| = C_d \int_{S^{d-1}} \left| a \cdot x \right| \mathrm{d}\upsilon(a), $$
(20)

where υ is the uniform distribution on the unit sphere \( S^{d-1} = \left\{ a \in R^d \mid \left\| a \right\| = 1 \right\} \), and \( C_d = \Gamma\bigl((d+1)/2\bigr)\big/\bigl(2\pi^{(d-1)/2}\bigr) \). Equation 20 allows the distance-Gini mean difference to be represented as follows [12]:

$$ M_{\mathrm{D}}(F) = \frac{C_d}{2d}\int_{S^{d-1}} \mathrm{d}\upsilon(a) \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left| u - v \right| \mathrm{d}F(u,a)\,\mathrm{d}F(v,a) = \frac{C_d}{d}\int_{S^{d-1}} \mathrm{d}\upsilon(a) \int_{-\infty}^{\infty} F(u,a)\bigl(1 - F(u,a)\bigr)\,\mathrm{d}u, $$
(21)

where F(·, a) denotes the distribution function of a·X, the projection of the random variable X on the line spanned by the vector a. The corresponding representation for the distance-Gini index is obtained by substituting \( \widetilde{F}(\cdot, a) \) for F(·, a), where \( \widetilde{F}(\cdot, a) \) denotes the distribution function of a·X/μ. Similar to the second representation of the Gini mean difference in Eq. 1, the second representation of \( M_{\mathrm{D}}(F) \) in Eq. 21 can be interpreted as the total of the expected variance of the binary variable “whether or not a·X ≤ u occurs” over the level u and the projection direction a.
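For empirical data, the distance-Gini mean difference of Eq. 19 and the distance-Gini index can be computed directly from pairwise Euclidean distances, as in the following sketch; the bivariate lognormal sample standing in for income and expenditure and the function names are illustrative assumptions.

```python
import numpy as np

def distance_gini_md(X):
    """Distance-Gini mean difference M_D(F) = (1/2d) E||X - Y|| of Eq. 19 for a d-variate sample."""
    X = np.asarray(X, float)
    n, d = X.shape
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).mean() / (2 * d)

def distance_gini_index(X):
    """R_D(F): the distance-Gini mean difference of the componentwise mean-scaled data X / mu."""
    X = np.asarray(X, float)
    return distance_gini_md(X / X.mean(axis=0))

rng = np.random.default_rng(6)
X = rng.lognormal(mean=[0.0, 0.5], sigma=0.5, size=(1500, 2))   # e.g. income and expenditure (illustrative)
print(distance_gini_md(X), distance_gini_index(X))
```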

Koshevoy and Mosler [12] mentioned that the distance-Gini index is decomposable into subgroups in a manner similar to the two-term decomposition 14. However, such a decomposition has the same disadvantages as Eq. 14.

An extension of Eq. 5 to multivariate settings allows the (new) subgroup decomposition of the distance-Gini index. Let G(y) represent another d-variate distribution function. Baringhaus and Franz [2, Theorem 2.1] proved the following inequality, where the equality holds if and only if F = G:

$$ \mathrm{cv}_{\mathrm{D}}(G, F) := \frac{1}{d}\left( \iint \left\| x - y \right\| \mathrm{d}G(x)\,\mathrm{d}F(y) - \frac{1}{2}\iint \left\| x - y \right\| \mathrm{d}G(x)\,\mathrm{d}G(y) - \frac{1}{2}\iint \left\| x - y \right\| \mathrm{d}F(x)\,\mathrm{d}F(y) \right) \geqslant 0. $$
(22)

Using Eqs. 4, 5 and 20, cvD(G, F) can be represented as follows:

$$ \mathrm{cv}_{\mathrm{D}}(G, F) = \frac{C_d}{d}\int_{S^{d-1}} \mathrm{d}\upsilon(a) \int_{-\infty}^{\infty} \bigl(G(u,a) - F(u,a)\bigr)^2 \mathrm{d}u. $$
(23)

Assume that the population consists of groups 1,2,...,n. Let \( F_i(y) \), \( \mu_i \), and \( p_i \) represent the d-variate distribution function, the expectation vector and the share of group i in the overall population, respectively. Let \( \widetilde{F}_i(y) \) and \( \breve{F}_i(y) \) be the distribution functions of Y/μ_i and Y/μ within group i, respectively. Then the distance-Gini mean difference \( M_{\mathrm{D}}(F) \) and the distance-Gini index \( R_{\mathrm{D}}(F) \) can be decomposed as follows:

$$ M_{\mathrm{D}}(F) = \sum p_i M_{\mathrm{D}}(F_i) + \sum p_i\, \mathrm{cv}_{\mathrm{D}}(F_i, F) $$
(24)
$$ R_{\mathrm{D}}(F) = M_{\mathrm{D}}(\widetilde{F}) = \sum p_i\, r_{\mathrm{D}}(F_i)\, R_{\mathrm{D}}(F_i) + \sum p_i\, \mathrm{cv}_{\mathrm{D}}\bigl(\breve{F}_i, \widetilde{F}\bigr), $$
(25)

where \( r_{\mathrm{D}}(F_i) = 0 \) if \( R_{\mathrm{D}}(F_i) = 0 \), and \( r_{\mathrm{D}}(F_i) = M_{\mathrm{D}}(\breve{F}_i)/M_{\mathrm{D}}(\widetilde{F}_i) \) otherwise. The proof is given in the Appendix. In Eq. 25, \( r_{\mathrm{D}}(F_i) \) corresponds to the average relative level of group i. If μ_i = μ, then \( r_{\mathrm{D}}(F_i) = 1 \). However, if μ_i ≠ μ, then \( r_{\mathrm{D}}(F_i) \) depends on the distribution F_i, unlike in the univariate case. The second term on the right-hand side of Eq. 25 corresponds to the between-group inequality. It is null if and only if F_i = F for every group. That is, subgroup decomposition 25 satisfies the CID condition.
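The following sketch checks the mean-difference version of this decomposition, Eq. 24, on simulated bivariate data, with cv_D evaluated from Eq. 22; the simulated groups and function names are illustrative assumptions.

```python
import numpy as np

def K(X, Y):
    """E||X - Y|| between samples X ~ G and Y ~ F."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    diff = X[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).mean()

def M_D(X):
    """Distance-Gini mean difference (Eq. 19)."""
    return K(X, X) / (2 * X.shape[1])

def cv_D(G, F):
    """Multivariate Cramer coefficient of variation cv_D(G, F) of Eq. 22."""
    return (K(G, F) - 0.5 * K(G, G) - 0.5 * K(F, F)) / F.shape[1]

rng = np.random.default_rng(7)
groups = [rng.lognormal([0.0, 0.2], 0.5, size=(600, 2)),
          rng.lognormal([0.3, 0.0], 0.7, size=(900, 2))]
pooled = np.concatenate(groups)
p = np.array([len(g) for g in groups]) / len(pooled)

lhs = M_D(pooled)
rhs = (sum(pi * M_D(g) for pi, g in zip(p, groups))
       + sum(pi * cv_D(g, pooled) for pi, g in zip(p, groups)))
print(lhs, rhs)   # Eq. 24: both sides agree up to floating-point error
```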

The distance-Gini mean difference and the distance-Gini index also have decompositions that correspond to Eqs. 7 and 8, respectively. The decomposition 8 is extended as follows:

$$ R_{\mathrm{D}}(F) = \sum p_i\, r_{\mathrm{D}}(F_i)\, R_{\mathrm{D}}(F_i) + \sum_{i < j} p_i p_j\, \mathrm{cv}_{\mathrm{D}}\bigl(\breve{F}_i, \breve{F}_j\bigr). $$
(26)

Decomposition 26 can be proved in a manner similar to the proof of Eq. 25. \( \mathrm{cv}_{\mathrm{D}}(F_i, F) \) satisfies Eq. 27, which corresponds to Eq. 10:

$$ \mathrm{cv}_{\mathrm{D}}(F_i, F) = \frac{ M_{\mathrm{D}}\bigl(\varepsilon F_i + (1 - \varepsilon)F\bigr) - \varepsilon M_{\mathrm{D}}(F_i) - (1 - \varepsilon)M_{\mathrm{D}}(F) }{\varepsilon} + o(\varepsilon). $$
(27)

3.2 Subgroup decomposition of the modified volume-Gini index

3.2.1 Modified Torgersen index

Several types of multivariate Gini index proposed in the past can be defined based on the generalized Lorenz domain. For the introduction of such indices, let γ={γ i } be a non-negative d(≥2)-dimensional constant vector, and define the quantity M T(F|γ) as follows:

$$ M_{\mathrm{T}}(F \mid \gamma) = \frac{1}{d!\bigl(1 + \sum \gamma_i\bigr)} \int \cdots \int \left| \det\bigl(y_1 - \gamma, \cdots, y_d - \gamma\bigr) \right| \mathrm{d}F(y_1)\cdots \mathrm{d}F(y_d). $$
(28)

Thus, a type of multivariate Gini index \( R_{\mathrm{T}}(F \mid \gamma) \) is defined as \( M_{\mathrm{T}}(\widetilde{F} \mid \gamma) \). \( R_{\mathrm{T}}(F \mid \gamma) \) ranges from zero to unity, as proved in the Appendix. Since \( R_{\mathrm{T}}(F \mid \gamma) \) equals the multivariate Gini index of Torgersen [21] if γ = 0, \( R_{\mathrm{T}}(F \mid \gamma) \) is called the modified Torgersen index. \( M_{\mathrm{T}}(F \mid \gamma) \), which corresponds to the Gini mean difference, is hereafter called the modified Torgersen mean difference or mean volume. As explained later, \( R_{\mathrm{T}}(F \mid \mathbf{1}) \), where \( \mathbf{1} = \{1,\ldots,1\} \), is preferable to the original Torgersen index \( R_{\mathrm{T}}(F \mid 0) \). Similarly, \( M_{\mathrm{T}}(F \mid \mu) \) is preferable to \( M_{\mathrm{T}}(F \mid 0) \). Oja [14] proposed a different generalization, as follows:

$$ M_{\mathrm{O}}(F) = \sigma_1(F) = \frac{1}{(d+1)!}\int \cdots \int \left| \det\begin{pmatrix} 1 & \cdots & 1 \\ \mathbf{y}_1 & \cdots & \mathbf{y}_{d+1} \end{pmatrix} \right| \mathrm{d}F(\mathbf{y}_1)\cdots\mathrm{d}F(\mathbf{y}_{d+1}) = \frac{1}{d+1}\int \bigl(1 + \sum \gamma_i\bigr) M_{\mathrm{T}}(F \mid \gamma)\,\mathrm{d}F(\gamma). $$
(29)

The Oja index is defined as \( R_{\mathrm{O}}(F) = M_{\mathrm{O}}(\widetilde{F}) \). If d = 1, \( M_{\mathrm{O}}(F) \) is equivalent to the univariate Gini mean difference, and \( R_{\mathrm{O}}(F) \) is equivalent to the ordinary Gini index. \( M_{\mathrm{O}}(F) \) was introduced as a variation of the generalized variance of Wilks [23], which can be presented as follows:

$$ \sigma_2^2(F) = \frac{1}{(d+1)!}\int \cdots \int \left| \det\begin{pmatrix} 1 & \cdots & 1 \\ y_1 & \cdots & y_{d+1} \end{pmatrix} \right|^2 \mathrm{d}F(y_1)\cdots\mathrm{d}F(y_{d+1}) = \frac{1}{d!}\int \cdots \int \left| \det\bigl(y_1 - \boldsymbol{\mu}, \cdots, y_d - \boldsymbol{\mu}\bigr) \right|^2 \mathrm{d}F(y_1)\cdots\mathrm{d}F(y_d) = \det\left( \int (y - \boldsymbol{\mu})(y - \boldsymbol{\mu})^{T}\,\mathrm{d}F(y) \right). $$
(30)

The third representation (rightmost side) in Eq. 30 expresses the notion that the generalized variance equals the determinant of the variance–covariance matrix of distribution F. M O(F) and M T(F| μ) can be regarded as counterparts of the first and second representations of the generalized variance, respectively.
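For a finite sample, the modified Torgersen mean difference of Eq. 28 can be evaluated by averaging absolute determinants over all pairs of observations; the bivariate (d = 2) sketch below, including its sample and function names, is an illustrative assumption rather than the paper's estimation procedure.

```python
import numpy as np
from math import factorial

def torgersen_md(X, gamma):
    """Modified Torgersen mean difference M_T(F|gamma) of Eq. 28 for an empirical
    bivariate sample: the average |det(y1 - gamma, y2 - gamma)| over all ordered pairs,
    divided by d! (1 + sum(gamma))."""
    Xc = np.asarray(X, float) - np.asarray(gamma, float)
    dets = Xc[:, None, 0] * Xc[None, :, 1] - Xc[:, None, 1] * Xc[None, :, 0]
    return np.abs(dets).mean() / (factorial(2) * (1.0 + np.sum(gamma)))

def torgersen_index(X, gamma):
    """R_T(F|gamma) = M_T of the componentwise mean-scaled sample X / mu."""
    X = np.asarray(X, float)
    return torgersen_md(X / X.mean(axis=0), gamma)

rng = np.random.default_rng(8)
X = rng.lognormal([0.0, 0.3], 0.5, size=(1000, 2))   # illustrative bivariate sample
print(torgersen_index(X, gamma=np.zeros(2)),   # original Torgersen index R_T(F|0)
      torgersen_index(X, gamma=np.ones(2)))    # modified index R_T(F|1)
```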

3.2.2 Multidimensional Lorenz domain

The modified Torgersen index and the Oja index can be defined based on the volumes of Lorenz zonoids, a generalization of the Lorenz domain introduced by Koshevoy and Mosler [10]. The relation between the indices and the Lorenz zonoids was studied by Koshevoy and Mosler [11]; this relation is utilized to derive the subgroup decomposition of the indices in this paper. For a measurable function \( \phi: R^d_+ \to [0,1] \), consider the d-dimensional vector \( z(\phi, F \mid \gamma) \), where

$$ z(\phi, F \mid \boldsymbol{\gamma}) = \int (\mathbf{y} - \boldsymbol{\gamma})\,\phi(\mathbf{y})\,\mathrm{d}F(\mathbf{y}). $$
(31)

The following set Z T(F|γ), consisting of all z(φ,F|γ), is called the γ-zonoid of the distribution F.

$$ Z_{\mathrm{T}}(F \mid \gamma) = \left\{ z(\phi, F \mid \gamma) \;\middle|\; \phi: R^d_+ \to [0,1] \right\}. $$
(32)

\( Z_{\mathrm{T}}(\widetilde{F} \mid \gamma) \), the γ-zonoid of \( \widetilde{F} \), the distribution function of Y/μ, is called the γ-Lorenz zonoid of F. \( Z_{\mathrm{O}}(F) \), the lift zonoid of F, is defined in the (d+1)-dimensional space \( [0,1] \times R^d_+ \) as follows:

$$ Z_{\mathrm{O}}(F) = \left\{ \bigl(p(\phi, F),\; z(\phi, F \mid 0)\bigr) \;\middle|\; \phi: R^d_+ \to [0,1] \right\}, \quad \text{where } p(\phi, F) = \int \phi(y)\,\mathrm{d}F(y). $$
(33)

\( Z_{\mathrm{O}}(\widetilde{F}) \) is called the Lorenz zonoid of F. The γ-zonoids and lift zonoids belong to the family of convex bodies, i.e. nonempty, compact and convex subsets of \( R^d \) (regardless of whether they contain interior points). \( Z_{\mathrm{O}}(F) \) and \( Z_{\mathrm{O}}(\widetilde{F}) \) are projected onto \( Z_{\mathrm{T}}(F \mid \gamma) \) and \( Z_{\mathrm{T}}(\widetilde{F} \mid \gamma) \), respectively, by the following linear transformation from \( [0,1] \times R^d_+ \) onto \( R^d_+ \):

$$ {\left( {p,z} \right)} \to z - p\gamma . $$
(34)

In the univariate case (d = 1), the Lorenz zonoid has the shape shown in Fig. 1. The boundary of the univariate Lorenz zonoid consists of the Lorenz curve and the inverse Lorenz curve, which is the Lorenz curve rotated by 180° about the center point (1/2, 1/2).

Fig. 1 Illustration of the Lorenz zonoid in the univariate case

The volumes of \( Z_{\mathrm{T}}(F \mid \gamma) \) and \( Z_{\mathrm{T}}(\widetilde{F} \mid \gamma) \), multiplied by the reciprocal of \( 1 + \sum\gamma_i \), equal the modified Torgersen mean difference \( M_{\mathrm{T}}(F \mid \gamma) \) and the modified Torgersen index \( R_{\mathrm{T}}(F \mid \gamma) \), respectively. Similarly, the volumes of \( Z_{\mathrm{O}}(F) \) and \( Z_{\mathrm{O}}(\widetilde{F}) \) equal the Oja mean difference \( M_{\mathrm{O}}(F) \) and the Oja index \( R_{\mathrm{O}}(F) \), respectively. The relation between the Lorenz zonoid and the Oja index was proved by Koshevoy and Mosler [11, Theorem 5.1]. The relation between the γ-Lorenz zonoid and the modified Torgersen index can be proved along the same lines. In the case of finite-point distributions, it is essentially the relation between the Minkowski sum of line segments and its volume (e.g. [20]). Koshevoy and Mosler generalized this using the existence of a sequence of finite-point distributions that converges weakly to any given distribution.

3.2.3 Subgroup decomposition of the modified Torgersen index

To introduce the subgroup decomposition of the modified Torgersen index, we define the mixed volume of Z T(F|γ) and Z T(G|γ) with d−1 repetitions of Z T(F|γ) as follows:

$$ MV_{d-1}(F, G \mid \gamma) = \frac{1}{d!}\int \cdots \int \left| \det\bigl(y_1 - \gamma, \cdots, y_d - \gamma\bigr) \right| \mathrm{d}F(y_1)\cdots \mathrm{d}F(y_{d-1})\,\mathrm{d}G(y_d). $$
(35)

Definition 35 is equivalent to the following ordinary definition (e.g. [8]):

$$ MV_{d-1}(F, G \mid \gamma) = \lim_{\varepsilon \to +0} \frac{ \mathrm{vol}\bigl(Z_{\mathrm{T}}(F \mid \gamma) + \varepsilon Z_{\mathrm{T}}(G \mid \gamma)\bigr) - \mathrm{vol}\bigl(Z_{\mathrm{T}}(F \mid \gamma)\bigr) }{\varepsilon d}, $$
(36)

where \( Z_{\mathrm{T}}(F \mid \gamma) + \varepsilon Z_{\mathrm{T}}(G \mid \gamma) = \left\{ x + \varepsilon y \mid x \in Z_{\mathrm{T}}(F \mid \gamma),\; y \in Z_{\mathrm{T}}(G \mid \gamma) \right\} \) is the Minkowski sum of \( Z_{\mathrm{T}}(F \mid \gamma) \) and \( \varepsilon Z_{\mathrm{T}}(G \mid \gamma) \), and vol(·) denotes the volume of the γ-zonoid. Note that \( \mathrm{vol}\bigl(Z_{\mathrm{T}}(F \mid \gamma)\bigr) = \bigl(1 + \sum\gamma_i\bigr)M_{\mathrm{T}}(F \mid \gamma) \). According to Minkowski's first inequality concerning the mixed volume (e.g. [8]), the following inequality is true if \( M_{\mathrm{T}}(F \mid \gamma) > 0 \):

$$ cv_{\mathrm{T}}(G, F \mid \gamma) := \frac{ MV_{d-1}(F, G \mid \gamma) }{ \bigl(1 + \sum\gamma_i\bigr) M_{\mathrm{T}}(F \mid \gamma)^{(d-1)/d} } - M_{\mathrm{T}}(G \mid \gamma)^{1/d} \geqslant 0. $$
(37)

The equality holds if and only if \( Z_{\mathrm{T}}(F \mid \gamma) \) and \( Z_{\mathrm{T}}(G \mid \gamma) \) are homothetic, i.e. \( Z_{\mathrm{T}}(F \mid \gamma) = \alpha Z_{\mathrm{T}}(G \mid \gamma) \) for some positive constant α.

Assume that the population consists of groups 1,2,...,n. Let \( F_i(y) \), \( \mu_i \), and \( p_i \) represent the d-variate distribution function, the expectation vector and the share of group i in the overall population, respectively. Let \( \breve{F}_i(y) \) be the distribution function of Y/μ within group i. Then inequality 37 allows the following decompositions by subgroup:

$$ M_{\mathrm{T}}(F \mid \gamma)^{1/d} = \sum p_i M_{\mathrm{T}}(F_i \mid \gamma)^{1/d} + \sum p_i\, cv_{\mathrm{T}}(F_i, F \mid \gamma) $$
(38)
$$ R_{\mathrm{T}}(F \mid \gamma)^{1/d} = \sum p_i\, r_{\mathrm{T}}(F_i \mid \gamma)\, R_{\mathrm{T}}(F_i \mid \gamma)^{1/d} + \sum p_i\, cv_{\mathrm{T}}\bigl(\breve{F}_i, \widetilde{F} \mid \gamma\bigr), $$
(39)

where \( r_{\mathrm{T}}(F_i \mid \gamma) = 0 \) if \( R_{\mathrm{T}}(F \mid \gamma) = 0 \), and \( r_{\mathrm{T}}(F_i \mid \gamma) = M_{\mathrm{T}}(\breve{F}_i \mid \gamma)^{1/d} \big/ R_{\mathrm{T}}(F_i \mid \gamma)^{1/d} \) otherwise, and \( cv_{\mathrm{T}}(G, F \mid \gamma) = 0 \) if \( M_{\mathrm{T}}(F \mid \gamma) = 0 \). Note the following equality for the derivation of decomposition 38:

$$ M_{\mathrm{T}}(F \mid \gamma) = \sum p_i \frac{1}{1 + \sum\gamma_i}\, MV_{d-1}(F, F_i \mid \gamma). $$
(40)

The corresponding equality for the derivation of Eq. 39 is obtained by substituting \( \widetilde{F} \) and \( \breve{F}_i \) for F and F_i, respectively. The second term in decomposition 39 corresponds to the between-group inequality. According to Minkowski's first inequality concerning the mixed volume, the second terms on the right-hand sides of Eqs. 38 and 39 vanish if and only if \( Z_{\mathrm{T}}(F_i \mid \gamma) \) is homothetic to \( Z_{\mathrm{T}}(F \mid \gamma) \) for every group. This characterization holds if \( M_{\mathrm{T}}(F \mid \gamma) > 0 \), i.e. if \( Z_{\mathrm{T}}(F \mid \gamma) \) has interior points. If \( M_{\mathrm{T}}(F \mid \gamma) = 0 \), i.e. \( Z_{\mathrm{T}}(F \mid \gamma) \) lies on some hyperplane, the Brunn–Minkowski inequality (e.g. [8]) asserts that \( Z_{\mathrm{T}}(F_i \mid \gamma) \) lies on the same hyperplane as \( Z_{\mathrm{T}}(F \mid \gamma) \); however, \( Z_{\mathrm{T}}(F_i \mid \gamma) \) need not be homothetic to \( Z_{\mathrm{T}}(F \mid \gamma) \) in this case.

On the assumption that \( R_{\mathrm{T}}(F \mid \gamma) > 0 \), if the between-group inequality is null, the mean within each group \( \mu_i \) equals \( \alpha_i \mu + (1 - \alpha_i)\gamma\mu \) with some homothetic ratio \( \alpha_i > 0 \). Thus, if γ = 1, μ_i equals μ if the between-group inequality is null, while μ_i equals \( \alpha_i\mu \) if γ = 0. On this basis, \( R_{\mathrm{T}}(F \mid \mathbf{1}) \) is preferable to the original Torgersen index \( R_{\mathrm{T}}(F \mid 0) \), whereas \( r_{\mathrm{T}}(F_i \mid \gamma) \) can be expressed in the simpler form \( \bigl(\prod \mu_i/\mu\bigr)^{1/d} \) in the latter case.

It may be reasonable to assert that \( R_{\mathrm{T}}(F \mid \gamma)^{1/d} \) should be used as a multivariate inequality index instead of \( R_{\mathrm{T}}(F \mid \gamma) \), taking the decomposability into consideration. This question is left open in this paper, since the subsequent discussions do not require any specific decision on it, although \( R_{\mathrm{T}}(F \mid \gamma)^{1/d} \) is used in the definition of the modified volume-Gini index.

cvT(F i ,F|γ) can be regarded as the contribution of each group to the between-group mean difference relative to its population share. It has the following representation, which corresponds to Eq. 10 for the univariate Gini mean difference and Eq. 27 for the distance-Gini mean difference:

$$ cv_{\mathrm{T}}(F_i, F \mid \gamma) = \lim_{\varepsilon \to +0} \frac{1}{\varepsilon}\left( M_{\mathrm{T}}\bigl((1 - \varepsilon)F + \varepsilon F_i \mid \gamma\bigr)^{1/d} - (1 - \varepsilon)M_{\mathrm{T}}(F \mid \gamma)^{1/d} - \varepsilon M_{\mathrm{T}}(F_i \mid \gamma)^{1/d} \right). $$
(41)

The proof is given in the Appendix. The representation 41 expresses the notion that cvT(F i ,F|γ) equals the surplus of the dispersion relative to the merger ratio when a merger with an infinitely small ratio of group i takes place.
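The following sketch evaluates the mixed volume of Eq. 35 and cv_T of Eq. 37 for empirical bivariate samples and checks decomposition 38 numerically; the d = 2 restriction, the simulated groups and the function names are illustrative assumptions.

```python
import numpy as np
from math import factorial

def abs_det_mean(A, B, gamma):
    """(1/d!) * average |det(y1 - gamma, y2 - gamma)| with y1 drawn from A and y2 from B (d = 2)."""
    A = np.asarray(A, float) - gamma
    B = np.asarray(B, float) - gamma
    dets = A[:, None, 0] * B[None, :, 1] - A[:, None, 1] * B[None, :, 0]
    return np.abs(dets).mean() / factorial(2)

def M_T(X, gamma):
    """Modified Torgersen mean difference of Eq. 28 (d = 2)."""
    return abs_det_mean(X, X, gamma) / (1.0 + gamma.sum())

def cv_T(G, F, gamma, d=2):
    """cv_T(G, F | gamma) of Eq. 37, with the mixed volume MV_{d-1}(F, G | gamma) of Eq. 35."""
    MV = abs_det_mean(F, G, gamma)                    # d - 1 copies of F and one copy of G
    return MV / ((1.0 + gamma.sum()) * M_T(F, gamma) ** ((d - 1) / d)) - M_T(G, gamma) ** (1 / d)

rng = np.random.default_rng(9)
gamma = np.ones(2)
groups = [rng.lognormal([0.0, 0.2], 0.5, size=(700, 2)),
          rng.lognormal([0.3, 0.0], 0.6, size=(800, 2))]
pooled = np.concatenate(groups)
p = np.array([len(g) for g in groups]) / len(pooled)

lhs = M_T(pooled, gamma) ** 0.5
rhs = (sum(pi * M_T(g, gamma) ** 0.5 for pi, g in zip(p, groups))
       + sum(pi * cv_T(g, pooled, gamma) for pi, g in zip(p, groups)))
print(lhs, rhs)   # decomposition 38 for d = 2: both sides agree
```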

3.2.4 Modified volume-Gini index and its subgroup decomposition

The Oja and the modified Torgersen indices vanish not only when the distribution is egalitarian, i.e. for a one-point distribution, but also when the distribution is concentrated on a hyperplane. Thus, even in the extreme case in which a single population unit monopolizes all income and property, these indices measure zero inequality. To avoid this drawback, Koshevoy and Mosler [11] proposed the volume-Gini mean difference, as follows:

$$ M_{KM}(F) = \frac{1}{2^d - 1}\sum_{s=1}^{d} \sum_{1 \leqslant j_1 < \cdots < j_s \leqslant d} M_{\mathrm{O}}\bigl(F^{j_1 \cdots j_s}\bigr), $$
(42)

where \( F^{j_1 \cdots j_s} \) is the marginal distribution in the space of the sub-coordinate axes \( \{j_1, \cdots, j_s\} \). The volume-Gini index \( R_{KM}(F) \) is defined as \( M_{KM}(\widetilde{F}) \), namely the average of the Oja sub-indices for the distribution F and all its marginal distributions in the spaces of the sub-coordinate axes. Since the Oja sub-index for any univariate marginal distribution (identical to the ordinary Gini index) vanishes if \( R_{KM}(F) \) equals zero, \( R_{KM}(F) \) vanishes if and only if the distribution is egalitarian. Thus, the drawback is overcome. However, further modification seems desirable, considering that the Oja sub-indices for the marginal distributions vary in homothetic degree with respect to the following enlargement with dilation factor λ (> 0) and center at the mean μ:

$$ T_{\lambda,\mu}:\; y \to \lambda(y - \mu) + \mu. $$
(43)

Note that the d-variate Oja index is of homothetic degree d, i.e. \( R_{\mathrm{O}}\bigl(T_{\lambda,\mu}(F)\bigr) = \lambda^d R_{\mathrm{O}}(F) \). For this reason, the greater the dilation, the higher the relative contribution of the higher-dimensional marginal distributions, although the shape of the distribution and the mean remain the same. Furthermore, the decomposability of the volume-Gini index is also questionable. Koshevoy and Mosler [12] showed a type of generalization of the two-term decomposition 14. However, their decomposition has a very complex form, in addition to the same disadvantages as Eq. 14.

Thus, the volume-Gini mean difference should be modified by replacing the Oja mean sub-volumes, except for the univariate cases, with the modified Torgersen mean sub-volumes, power-transformed by the reciprocal of the dimensions, as follows:

$$ M_{V}(F \mid \gamma) = \frac{1}{2^d - 1}\left( \sum_{s=1}^{d} M\bigl(F^{s}\bigr) + \sum_{s=2}^{d} \sum_{1 \leqslant j_1 < \cdots < j_s \leqslant d} M_{\mathrm{T}}\bigl(F^{j_1 \cdots j_s} \mid \gamma\bigr)^{1/s} \right). $$
(44)

Strictly speaking, \( M_{\mathrm{T}}(F^{j_1 \cdots j_s} \mid \gamma)^{1/s} \) in Eq. 44 should be denoted \( M_{\mathrm{T}}(F^{j_1 \cdots j_s} \mid \gamma^{j_1 \cdots j_s})^{1/s} \), where \( \gamma^{j_1 \cdots j_s} = \{\gamma_{j_1}, \cdots, \gamma_{j_s}\} \); however, the shorter notation is used for simplicity. The modified volume-Gini index \( R_{V}(F \mid \gamma) \) is defined as \( M_{V}(\widetilde{F} \mid \gamma) \). \( R_{V}(F \mid \gamma) \), like the original volume-Gini index \( R_{KM}(F) \), equals zero if and only if F is egalitarian. In addition, if γ = 1, \( R_{V}(F \mid \gamma) \) is of homothetic degree one with respect to the enlargement \( T_{\lambda,\mu} \), and the relative contribution of any sub-index is invariant to \( T_{\lambda,\mu} \). Furthermore, \( M_{V}(F \mid \gamma) \) and \( R_{V}(F \mid \gamma) \) are decomposable into subgroups, as shown below. If only the homotheticity were considered, the Oja sub-indices would not need to be replaced and the power transformations by the reciprocal dimensions would be sufficient; however, the decomposability is an open question in that case.
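As a concrete illustration of Eq. 44 in the bivariate case, the following sketch computes the modified volume-Gini index from the two univariate Gini mean differences of the mean-scaled marginals and the square root of the bivariate Torgersen term; the sample and function names are illustrative assumptions.

```python
import numpy as np
from math import factorial

def gini_md(y):
    """Univariate Gini mean difference M(F^s)."""
    y = np.asarray(y, float)
    return 0.5 * np.abs(y[:, None] - y[None, :]).mean()

def torgersen_md_2d(X, gamma):
    """Bivariate modified Torgersen mean difference M_T(F|gamma) of Eq. 28 (d = 2)."""
    Xc = np.asarray(X, float) - gamma
    dets = Xc[:, None, 0] * Xc[None, :, 1] - Xc[:, None, 1] * Xc[None, :, 0]
    return np.abs(dets).mean() / (factorial(2) * (1.0 + gamma.sum()))

def modified_volume_gini(X, gamma):
    """R_V(F|gamma) of Eq. 44 for d = 2: univariate Gini mean differences of the two
    mean-scaled marginals plus the square root of the bivariate Torgersen term,
    all weighted by 1 / (2^d - 1)."""
    Xs = np.asarray(X, float) / np.asarray(X, float).mean(axis=0)   # scale by the mean vector
    parts = [gini_md(Xs[:, 0]), gini_md(Xs[:, 1]), torgersen_md_2d(Xs, gamma) ** 0.5]
    return sum(parts) / (2 ** 2 - 1)

rng = np.random.default_rng(10)
X = rng.lognormal([0.0, 0.3], 0.5, size=(1000, 2))   # illustrative bivariate sample
print(modified_volume_gini(X, gamma=np.ones(2)))     # R_V(F|1)
```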

Assume M T(F|γ)>0. Then, according to inequalities 4 and 37, the following inequality is true:

$$ \mathrm{cv}_{\mathrm{V}}(G, F \mid \gamma) := \frac{1}{2^d - 1}\left( \sum_{s=1}^{d} \mathrm{cv}\bigl(G^{s}, F^{s}\bigr) + \sum_{s=2}^{d} \sum_{1 \leqslant j_1 < \cdots < j_s \leqslant d} \mathrm{cv}_{\mathrm{T}}\bigl(G^{j_1 \cdots j_s}, F^{j_1 \cdots j_s} \mid \gamma\bigr) \right) \geqslant 0. $$
(45)

The equality holds if and only if \( G^{s} = F^{s} \) for every coordinate axis and \( Z_{\mathrm{T}}(F \mid \gamma) = Z_{\mathrm{T}}(G \mid \gamma) \). The proof of this equality condition is given in the Appendix.

Assume that the population consists of groups 1,2,...,n. Let \( F_i(y) \), \( \mu_i \), and \( p_i \) represent the d-variate distribution function, the expectation vector and the share of group i in the overall population, respectively. Let \( \breve{F}_i(y) \) be the distribution function of Y/μ within group i. Then inequality 45 allows the following subgroup decompositions of the modified volume-Gini mean difference and the modified volume-Gini index:

$$ M_{V}(F \mid \gamma) = \sum p_i M_{V}(F_i \mid \gamma) + \sum p_i\, \mathrm{cv}_{V}(F_i, F \mid \gamma) $$
(46)

and

$$ R_{V}(F \mid \gamma) = \sum p_i\, r_{V}(F_i \mid \gamma)\, R_{V}(F_i \mid \gamma) + \sum p_i\, \mathrm{cv}_{V}\bigl(\breve{F}_i, \widetilde{F} \mid \gamma\bigr), $$
(47)

where \( r_{V}(F_i \mid \gamma) = 0 \) if \( R_{V}(F \mid \gamma) = 0 \), and \( r_{V}(F_i \mid \gamma) = M_{V}(\breve{F}_i \mid \gamma)/R_{V}(F_i \mid \gamma) \) otherwise. The second term on the right-hand side of Eq. 47, which corresponds to the between-group inequality, equals zero if and only if \( F^{s}_i = F^{s} \) for every group and every coordinate axis, and \( Z_{\mathrm{T}}(F_i \mid \gamma) = Z_{\mathrm{T}}(F \mid \gamma) \) for every group. The proof is given in the Appendix. Thus, \( M_{V}(F_i \mid \gamma) = M_{V}(F \mid \gamma) \) and \( R_{V}(F_i \mid \gamma) = R_{V}(F \mid \gamma) \) for every group if the between-group inequality is null. Unfortunately, equality of the γ-zonoids together with equality of the univariate marginal distributions is not equivalent to equality of the multivariate distributions, unlike equality of the lift zonoids [11]. A counterexample is given below.

Example

Let \( F_1 \) be a bivariate distribution evenly distributed over the six points {1,1}, {4,4}, {0,2}, {3,2}, {2,0} and {2,3}, and let \( F_2 \) be another bivariate distribution evenly distributed over the six points {0,0}, {3,3}, {1,2}, {4,2}, {2,1} and {2,4}. Let λ and 1−λ be the population shares of \( F_1 \) and \( F_2 \), respectively, i.e. \( F = \lambda F_1 + (1 - \lambda)F_2 \). Then \( F_1 \) and \( F_2 \) have marginal distributions identical to those of F. Their means equal \( \mathbf{2} = \{2,2\} \). Their γ-zonoids \( Z_{\mathrm{T}}(F_1 \mid \mathbf{2}) \) and \( Z_{\mathrm{T}}(F_2 \mid \mathbf{2}) \) are also identical to \( Z_{\mathrm{T}}(F \mid \mathbf{2}) \) (Fig. 2).

Fig. 2 γ-zonoids of distributions F 1, F 2 and F in Example 1
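The equality of the marginals and of the mean vectors in this example can be verified directly; the sketch below (illustrative only) checks the univariate marginals and means of the two six-point distributions, from which the same properties of the mixture F follow for any λ. Equality of the γ-zonoids shown in Fig. 2 additionally requires the γ-zonoid construction (Eq. 32) and is not reproduced here.

    import numpy as np

    F1 = np.array([[1, 1], [4, 4], [0, 2], [3, 2], [2, 0], [2, 3]], dtype=float)
    F2 = np.array([[0, 0], [3, 3], [1, 2], [4, 2], [2, 1], [2, 4]], dtype=float)

    # mean vectors: both equal {2, 2}
    print(F1.mean(axis=0), F2.mean(axis=0))

    # univariate marginals: the sorted values of each coordinate coincide
    for axis in (0, 1):
        print(np.array_equal(np.sort(F1[:, axis]), np.sort(F2[:, axis])))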

A pair of bivariate distributions evenly distributed within the triangles with vertices at {0,2}, {2,0}, {4,4} and at {0,0}, {4,2}, {2,4}, respectively, is another counterexample. Nevertheless, decomposition 47 can be considered to nearly satisfy the CID condition, since null between-group inequality ensures some equivalence among the within-group distributions in terms of the dilation ordering defined by the γ-zonoid (Eq. 32), in addition to the equivalence of the univariate marginal distributions. At the least, the means of all groups, and their dispersions as measured by the modified volume-Gini index, must agree if the between-group inequality vanishes.

Before closing this subsection, the representation of cvV(F i ,F|γ), which corresponds to that of cvT(F i ,F|γ) in Eq. 41, is given below:

$$ \mathrm{cv}_{V}(F_{i},F\mid\gamma) = \lim_{\varepsilon\to+0}\frac{1}{\varepsilon}\Bigl( M_{V}\bigl((1-\varepsilon)F+\varepsilon F_{i}\mid\gamma\bigr) - (1-\varepsilon)\,M_{V}(F\mid\gamma) - \varepsilon\,M_{V}(F_{i}\mid\gamma) \Bigr). $$
(48)

As with Eq. 41, representation 48 expresses the notion that cvV(F i ,F|γ) equals the surplus dispersion, relative to the merger ratio, arising from a merger of group i into the population at an infinitesimally small ratio.
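In finite samples the limit in Eq. 48 can be approximated by evaluating the mean difference of a mixture with a small weight ε. The sketch below again treats modified_volume_gini_md as a hypothetical routine, here assumed to accept observation weights so that the mixture (1−ε)F+εF i can be represented by reweighting the stacked sample.

    import numpy as np

    def cv_v_approx(group, pooled, gamma, modified_volume_gini_md, eps=1e-4):
        """Finite-difference approximation of cv_V(F_i, F | gamma) via Eq. 48."""
        X = np.vstack([pooled, group])
        w = np.concatenate([np.full(len(pooled), (1.0 - eps) / len(pooled)),
                            np.full(len(group), eps / len(group))])
        m_mix = modified_volume_gini_md(X, gamma, weights=w)        # M_V((1-eps)F + eps F_i)
        m_f = modified_volume_gini_md(pooled, gamma, weights=None)  # M_V(F)
        m_fi = modified_volume_gini_md(group, gamma, weights=None)  # M_V(F_i)
        return (m_mix - (1.0 - eps) * m_f - eps * m_fi) / eps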

3.3 Source decomposition of the multivariate Gini indices

In this subsection, source decomposition of the distance-Gini index and the modified volume-Gini index is introduced.

3.3.1 Source decomposition of the distance-Gini index

Assume that attribute i consists of contributions from m i types of sources (i = 1,...,d). Let \( x_{i}^{(k_{i})} \) and \( y_{i}^{(k_{i})} \) be the expected contributions from source k i to attribute i conditional on x={x i } and y={y i }, respectively, and let \( \mu_{i}^{(k_{i})} \) be the unconditional mean of the contribution. Considering that the distance-Gini mean difference is proportional to the average of the Gini mean differences for the univariate marginal distributions on lines in all directions (see Eq. 21), it is natural to define the contribution of each source as the average of the quasi-Gini mean differences for the univariate marginal distributions on lines in all directions, with the same multiplier as in Eq. 21, as follows:

$$ M_{D}(F) = \sum_{i=1}^{d}\sum_{k_{i}=1}^{m_{i}} \frac{C_{d}}{2d} \int\!\!\int \bigl(x_{i}^{(k_{i})}-y_{i}^{(k_{i})}\bigr) \int_{S^{d-1}} \operatorname{sgn}\bigl(a\cdot(x-y)\bigr)\,a_{i}\,\mathrm{d}\upsilon(a)\,\mathrm{d}F(x)\,\mathrm{d}F(y) = \sum_{i=1}^{d}\sum_{k_{i}=1}^{m_{i}} M_{D}^{(k_{i})}(F), $$
(49)

where υ is the uniform distribution on the unit sphere \( S^{d-1}=\{a\in R^{d}\mid \lVert a\rVert=1\} \), and sgn(x)=1 if x ≥ 0, or −1 otherwise. \( M_{D}^{(k_{i})}(F) \), the contribution of each source to M D(F), should be called the quasi-distance-Gini mean difference. The corresponding decomposition of the distance-Gini index is obtained by substituting \( \widetilde{F} \) for F. In view of the definition of the univariate quasi-Gini index, the quasi-distance-Gini index \( R_{D}^{(k_{i})}(F) \) should be defined as the contribution of each source to R D(F) relative to its amount share, through \( M_{D}^{(k_{i})}(\widetilde{F})=\frac{\mu_{i}^{(k_{i})}}{\mu_{i}}R_{D}^{(k_{i})}(F) \). Since the following equality holds (the proof is given in the Appendix),

$$ C_{d}\int_{S^{d-1}} \operatorname{sgn}(a\cdot x)\,a_{i}\,\mathrm{d}\upsilon(a) = \frac{x_{i}}{\lVert x\rVert}, \quad\text{where }\ x\neq 0, $$
(50)

the quasi-distance-Gini mean difference can be expressed as follows:

$$ M_{D}^{(k_{i})}(F) = \frac{1}{2d}\int\!\!\int \frac{\bigl(x_{i}^{(k_{i})}-y_{i}^{(k_{i})}\bigr)(x_{i}-y_{i})}{\lVert x-y\rVert}\,\mathrm{d}F(x)\,\mathrm{d}F(y). $$
(51)

Note that the integrand in Eq. 51 is assumed to be zero if x = y.
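For a finite sample, the double integral in Eq. 51 becomes a double sum over all pairs of observations. The following sketch (with hypothetical data and variable names) estimates the quasi-distance-Gini mean differences of all sources and checks that they add up to \( M_{D}(F)=\frac{1}{2d}\int\!\!\int\lVert x-y\rVert\,\mathrm{d}F(x)\,\mathrm{d}F(y) \), as follows from summing Eq. 51 over all sources when the source contributions of each attribute sum to the attribute itself.

    import numpy as np

    def quasi_distance_gini_md(X, sources):
        """Pairwise-sum estimates of the quasi-distance-Gini mean differences of Eq. 51.

        X       : (n, d) array of attribute totals, one row per population unit.
        sources : dict mapping attribute index i to an (n, m_i) array whose columns
                  are the source contributions to attribute i (columns sum to X[:, i]).
        Returns a dict {(i, k): estimate of M_D^(k_i)(F)}.
        """
        n, d = X.shape
        diff = X[:, None, :] - X[None, :, :]            # pairwise differences x - y
        dist = np.linalg.norm(diff, axis=2)             # ||x - y||
        np.fill_diagonal(dist, np.inf)                  # integrand set to zero when x = y
        out = {}
        for i, S in sources.items():
            for k in range(S.shape[1]):
                src_diff = S[:, k][:, None] - S[:, k][None, :]      # x_i^(k_i) - y_i^(k_i)
                out[(i, k)] = (src_diff * diff[:, :, i] / dist).sum() / (2 * d * n * n)
        return out

    # toy check: two attributes, each split into two hypothetical sources
    rng = np.random.default_rng(0)
    n = 300
    sources = {i: rng.gamma(2.0, 1.0, size=(n, 2)) for i in range(2)}
    X = np.column_stack([sources[i].sum(axis=1) for i in range(2)])

    contrib = quasi_distance_gini_md(X, sources)
    d = X.shape[1]
    M_D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2).sum() / (2 * d * n * n)
    print(sum(contrib.values()), M_D)                   # the two values coincide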

The quasi-distance-Gini mean difference of each source can also be derived in the following manner. Assume that the contributions from source k i increase uniformly by an infinitesimally small rate ɛ – i.e. \( x_{i}^{(k_{i})} \) increases to \( (1+\varepsilon)x_{i}^{(k_{i})} \) for every population unit; then the increase in the distance-Gini mean difference relative to the rate ɛ equals \( M_{D}^{(k_{i})}(F) \). The same derivation applies to the univariate Gini source decomposition of Rao [16].
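This derivative characterization is easy to check numerically: perturbing one source uniformly by a small rate ε and differencing the distance-Gini mean difference reproduces the pairwise-sum estimate of Eq. 51. A self-contained sketch with hypothetical data is given below.

    import numpy as np

    def distance_gini_md(X):
        """Finite-sample M_D(F): (1/2d) times the mean pairwise Euclidean distance."""
        n, d = X.shape
        return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2).sum() / (2 * d * n * n)

    rng = np.random.default_rng(1)
    n, d = 400, 2
    S = rng.gamma(2.0, 1.0, size=(n, 2))       # two sources of attribute 0 (hypothetical)
    other = rng.gamma(3.0, 1.0, size=n)        # attribute 1, treated as a single source
    X = np.column_stack([S.sum(axis=1), other])

    # Eq. 51 estimate for source 0 of attribute 0
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    diff0 = X[:, 0][:, None] - X[:, 0][None, :]
    src0 = S[:, 0][:, None] - S[:, 0][None, :]
    m_eq51 = (src0 * diff0 / dist).sum() / (2 * d * n * n)

    # derivative check: raise the same source uniformly by the rate eps
    eps = 1e-6
    X_eps = X.copy()
    X_eps[:, 0] = X[:, 0] + eps * S[:, 0]
    m_deriv = (distance_gini_md(X_eps) - distance_gini_md(X)) / eps

    print(m_eq51, m_deriv)                     # agreement up to O(eps)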

3.3.2 Source decomposition of the modified volume-Gini index

The modified volume-Gini mean difference and the modified volume-Gini index do not seem to admit as intuitive a derivation of source decomposition as the distance-Gini mean difference and the distance-Gini index. However, if one adheres to the reasoning of section 3.3.1, whereby the contribution of each source to the mean difference is obtained by differentiation with respect to the rate of a uniform increase in the amount of that source, then a type of source decomposition is obtained for the modified volume-Gini mean difference and the modified volume-Gini index, provided that \( M_{T}(F^{i_{1}\cdots i_{s}}\mid\gamma)>0 \) for the marginal distribution in the space of any subset of coordinate axes. Due to space limitations, only the result of the derivation for the modified volume-Gini index is presented here.

To introduce the source decomposition, the following s × s matrices are first defined:

$$ M(y_{1},\cdots,y_{s}\mid i_{1}\cdots i_{s}) = \bigl[\,y_{1}-\gamma,\cdots,y_{s}-\gamma\,\bigr], \quad\text{and}\quad N^{(k_{i_{1}}\cdots k_{i_{s}})}(y_{1},\cdots,y_{s}\mid i_{1}\cdots i_{s}) = \Bigl[\,\frac{y_{1}^{(k_{i_{1}}\cdots k_{i_{s}})}}{\mu^{(k_{i_{1}}\cdots k_{i_{s}})}}-\gamma,\cdots,\frac{y_{s}^{(k_{i_{1}}\cdots k_{i_{s}})}}{\mu^{(k_{i_{1}}\cdots k_{i_{s}})}}-\gamma\,\Bigr], $$
(52)

where \( y_{\bullet}^{(k_{i_{1}}\cdots k_{i_{s}})}=\bigl\{y_{\bullet i_{l}}^{(k_{i_{l}})}\bigr\}_{l=1,\cdots,s} \) and \( \mu^{(k_{i_{1}}\cdots k_{i_{s}})}=\bigl\{\mu_{i_{l}}^{(k_{i_{l}})}\bigr\}_{l=1,\cdots,s} \). Abbreviated notations are used in Eq. 52 for simplicity; for instance, \( [\,y_{1}-\gamma,\cdots,y_{s}-\gamma\,] \) abbreviates \( [\,y_{1}-\gamma^{i_{1}\cdots i_{s}},\cdots,y_{s}-\gamma^{i_{1}\cdots i_{s}}\,] \). In the t-th row of \( N^{(k_{i_{1}}\cdots k_{i_{s}})}(y_{1},\cdots,y_{s}\mid i_{1}\cdots i_{s}) \), each element equals the conditional expected contribution of source \( k_{i_{t}} \) to attribute \( i_{t} \) relative to the unconditional expected contribution \( \mu_{i_{t}}^{(k_{i_{t}})} \), minus \( \gamma_{i_{t}} \).

After some manipulation following the above derivation, the source decomposition of the modified volume-Gini index can be expressed as follows:

$$ R_{V}(F\mid\gamma) = \frac{1}{2^{d}-1}\left( \sum_{i=1}^{d}\sum_{k_{i}=1}^{m_{i}} R^{(k_{i})}(F^{i}) + \sum_{s=2}^{d}\ \sum_{1\leqslant i_{1}<\cdots<i_{s}\leqslant d}\ \sum_{k_{i_{1}}=1}^{m_{i_{1}}}\cdots\sum_{k_{i_{s}}=1}^{m_{i_{s}}} \prod_{l=1}^{s}\frac{\mu_{i_{l}}^{(k_{i_{l}})}}{\mu_{i_{l}}}\, \frac{R_{T}^{(k_{i_{1}}\cdots k_{i_{s}})}(F^{i_{1}\cdots i_{s}}\mid\gamma)}{R_{T}(F^{i_{1}\cdots i_{s}}\mid\gamma)^{(s-1)/s}} \right), $$
(53)

where \( R^{(k_{i})}(F^{i}) = \int\!\!\int \operatorname{sgn}(x_{i}-y_{i})\Bigl(\frac{x_{i}^{(k_{i})}}{\mu_{i}^{(k_{i})}}-\frac{y_{i}^{(k_{i})}}{\mu_{i}^{(k_{i})}}\Bigr)\mathrm{d}F^{i}(x_{i})\,\mathrm{d}F^{i}(y_{i}) \) and \( R_{T}^{(k_{i_{1}}\cdots k_{i_{s}})}(F^{i_{1}\cdots i_{s}}\mid\gamma) = \frac{1}{s!\bigl(1+\sum_{l=1}^{s}\gamma_{i_{l}}\bigr)}\int\cdots\int \operatorname{sgn}\bigl(\det M(y_{1},\cdots,y_{s}\mid i_{1}\cdots i_{s})\bigr)\,\det N^{(k_{i_{1}}\cdots k_{i_{s}})}(y_{1},\cdots,y_{s}\mid i_{1}\cdots i_{s})\,\mathrm{d}F(y_{1})\cdots\mathrm{d}F(y_{s}) \). \( R^{(k_{i})}(F^{i}) \) in the first term is the quasi-Gini index of the univariate marginal distribution in the subspace of attribute i. The second term can be regarded as comprising interaction terms among sources of different attributes, a notable difference from the source decomposition of the distance-Gini index.
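For concreteness, specializing Eq. 53 to the bivariate case d = 2 (so that s = 2 is the only interaction level and the exponent (s−1)/s equals 1/2) gives

$$ R_{V}(F\mid\gamma) = \frac{1}{3}\left( \sum_{k_{1}=1}^{m_{1}} R^{(k_{1})}(F^{1}) + \sum_{k_{2}=1}^{m_{2}} R^{(k_{2})}(F^{2}) + \sum_{k_{1}=1}^{m_{1}}\sum_{k_{2}=1}^{m_{2}} \frac{\mu_{1}^{(k_{1})}\mu_{2}^{(k_{2})}}{\mu_{1}\mu_{2}}\, \frac{R_{T}^{(k_{1}k_{2})}(F^{12}\mid\gamma)}{R_{T}(F^{12}\mid\gamma)^{1/2}} \right), $$

so that every pair of sources drawn from the two different attributes contributes its own interaction term.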

3.4 Application to Japanese family budget data

Several types of multivariate Gini indices are estimated for the annual income and consumption of Japanese households with two or more members, using tabulated data from the National Survey of Family Income and Expenditures. Three-way tables of consumption class by income class by age class of household head are available for the household distribution. However, because a three-way table of average income and consumption is not available, the average income in two-way tables of income class by age class of household head is used for the estimation instead, irrespective of consumption class, and the midpoint between the lower and upper limits of each consumption class is used as the average consumption, irrespective of income class and age class of the household head. The estimates after adjustment to exclude the age effects are presented in Table 5. These estimates should be treated carefully because of the above-mentioned approximation; nevertheless, it is notable that the 2004 values of the multivariate Gini indices relative to their 1989 values are higher than the corresponding relative values of the univariate Gini indices for income and consumption. For example, the distance-Gini index and the modified volume-Gini index for 2004 relative to 1989 are 98.4 and 98.6, respectively, while the corresponding values for the Gini indices of annual income and of consumption are 98.3 and 98.1, respectively. This indicates that consumption tends to vary more widely than before within the same income class, even though the consumption distribution as a whole is no more dispersed than before.

Table 5 Multivariate Gini indices for distribution of income and consumption*

It is also notable that the modified volume-Gini index is relatively close to the distance-Gini index, compared with the Oja index, the modified Torgersen index or their 1/2-power transformations.

The contributions of age groups for household heads to changes in these indices are also estimated using the subgroup decomposition technique (Table 6). The multivariate Gini indices show similar tendencies to the univariate Gini indices, although the magnitudes of the contributions vary somewhat.

Table 6 Contributions of age groups for household heads to changes in multivariate Gini indices for distribution of income and consumption*

4 Concluding remarks

In this paper, a new type of subgroup decomposition for the Gini index is proposed. The new decomposition is consistent with multilevel sub-groupings, and is characterized by the CID condition – i.e. the between-group inequality vanishes if and only if the distributions within the groups are all identical.

The new decomposition is then generalized to the two types of multivariate Gini indices introduced by Koshevoy and Mosler [11]. In the case of the distance-Gini index, the decomposition satisfies the CID condition strictly, while in the case of the volume-Gini index it satisfies the condition not strictly but nearly, after the index definition is modified to be of homothetic degree one with respect to enlargement centered at the mean.

Source decompositions of the two types of multivariate Gini indices are also introduced as a generalization of the Gini decomposition of Rao [16].

I hope this new decomposition will advance studies of economic inequality. The following remarks concerning the definition or concept of multivariate Gini indices may be helpful for further research.

Anderson [1] used the following multivariate Gini index, which is similar to the distance-Gini index:

$$ \mathrm{GINIMCW} = \frac{1}{2\sqrt{d}}\int\!\!\int \left\lVert \frac{x}{\mu}-\frac{y}{\mu}\right\rVert_{W}\mathrm{d}F(x)\,\mathrm{d}F(y), $$
(54)

where \( \left\lVert \frac{x}{\mu}-\frac{y}{\mu}\right\rVert_{W} = \sqrt{\sum_{i=1}^{d} w_{i}\Bigl(\frac{x_{i}}{\mu_{i}}-\frac{y_{i}}{\mu_{i}}\Bigr)^{2}} \), with \( w_{i}>0 \) and \( \sum_{i=1}^{d} w_{i}=d \).

The differences between Eq. 54 and the distance-Gini index highlight two issues. First, the Anderson index (Eq. 54) almost equals unity if the amounts of all attributes are monopolized by a single population unit and all other units have no contributions. The index exceeds unity if all the amounts of each attribute belong to only one population unit but the monopolist differs across attributes; in that case it almost equals \( \frac{1}{\sqrt{d}}\sum_{i=1}^{d}\sqrt{w_{i}}\;(>1) \). The index can be kept below unity by substituting d for \( \sqrt{d} \) in Eq. 54 and \( w_{i}^{2} \) for w i in the definition of the between-unit distance; the weight constraint \( \sum_{i=1}^{d}w_{i}=d \) need not be changed. Such an index can be regarded as a weighted distance-Gini index, and the subgroup and source decompositions for the distance-Gini index presented in this paper can easily be extended to it. However, the following naïve question remains open (the monopoly magnitudes noted above are checked numerically in the sketch following the question):

Which situation should be judged higher in inequality: that in which a single population unit makes all contributions to all attributes (the absolute monopolistic situation), or that in which a different monopolist exists for each attribute?
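As a check of the magnitudes cited above, Eq. 54 can be evaluated directly for a finite population. The sketch below is illustrative only, assuming equal weights w i = 1 and a hypothetical population of n units.

    import numpy as np

    def anderson_gini(X, w):
        """GINIMCW of Eq. 54 for a finite sample X (n, d) with weights w (sum w_i = d)."""
        n, d = X.shape
        Z = X / X.mean(axis=0)                            # x / mu, column by column
        diff = Z[:, None, :] - Z[None, :, :]
        dist_w = np.sqrt((w * diff ** 2).sum(axis=2))     # weighted between-unit distance
        return dist_w.sum() / (2 * np.sqrt(d) * n * n)

    n, d = 1000, 2
    w = np.ones(d)                                        # equal weights, sum = d

    # absolute monopolistic situation: one unit holds everything for both attributes
    X_abs = np.zeros((n, d)); X_abs[0, :] = 1.0
    # different monopolists: attribute i is held entirely by unit i
    X_diff = np.zeros((n, d)); X_diff[0, 0] = 1.0; X_diff[1, 1] = 1.0

    print(anderson_gini(X_abs, w))    # close to 1
    print(anderson_gini(X_diff, w))   # close to sum(sqrt(w_i)) / sqrt(d) = sqrt(2) > 1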

Since the Lorenz zonoid in the latter situation is wider than in the former, the Oja index, the (modified) Torgersen index and the (modified) volume-Gini index are all higher in the latter case, as is the distance-Gini index because of its consistency with the dilation ordering of the Lorenz zonoid [11]. This characteristic is not necessarily a disadvantage, at least with respect to the weighting problem described in the next paragraph. Nevertheless, several researchers have pursued multivariate inequality measures that are higher in the absolute monopolistic situation. Tsui [22] studied multidimensional generalized entropy measures satisfying a condition of consistency with correlation increasing majorization (CIM), along with some other conditions. If a multidimensional inequality measure satisfies the CIM condition, the absolute monopolistic situation is judged higher in inequality. However, imposing this condition seems too restrictive (see also [7]); for example, Tsui's multidimensional extension of the Theil measure falls outside it. The above question may rarely arise in practice if wealthy population units contributing to one attribute tend to contribute to other attributes as well. One approach to determining the appropriateness of the CIM condition is therefore to verify whether the set of attributes under study can be accounted for by one or by several underlying factors after excluding measurement errors. If multiple factors are identified, methods of extracting each factor, together with their mutual relations, should be explored for the measurement of inequality. In most cases, attributes appear to be determined by one major common factor plus some additional factors peculiar to individual attributes. In this context, it is notable that Easterlin [5, 6] pointed out that the correlation between income and self-reported well-being is weak, at least over the life cycle, implying that (subjective) well-being is affected by multiple factors.

Another issue raised by the Anderson index is the assignment of weights to the attributes concerned. If economic inequality is measured in terms of income, consumption and educational attainment, it seems reasonable to assign smaller weights to income and consumption than to educational attainment because of their similarity. However, since the multivariate Gini indices automatically take lower values when attributes are correlated with each other – in other words, the indices run counter to the CIM condition – the weighting problem is less serious for them than for multivariate inequality measures satisfying the CIM condition. Even so, determining appropriate weights is not a trivial matter.

Thus, further research is still needed for applications of the multivariate Gini indices or other multivariate inequality measures.