Abstract
Minimum distance and maximum likelihood methods for estimating parametric income distributions from grouped data are summarized. Formulas for computing inequality and poverty measures from the parameters of the income distributions are presented. The paper is a convenient source for applied researchers wishing to estimate inequality and poverty from grouped data.
Introduction
It is generally recognized that poverty and excessive inequality are socially undesirable. Reducing global poverty so that fewer individuals are deprived of basic needs is a major objective of international agencies. While what constitutes too much inequality is debatable, there is concern about the negative effects of rising inequality on health, crime and other aspects of society. Also, in extreme cases, inequality has led to the overthrow of governments and changes in the international order. It is important, therefore, to be able to monitor changes in inequality and poverty using suitable measurement techniques. For this purpose, modelling and estimation of income distributions and Lorenz curves play an important role. The data available for modelling and estimation come in many forms. They may come from taxation data or from a variety of surveys. We focus on modelling and estimation when the data are limited in the sense that they come in grouped form, typically as the proportion of total income allocated to each of a number of groups, ordered according to increasing income, and with a specified proportion of the population within each group. These so-called income and population shares form the basis for estimating inequality through the Lorenz curve.Footnote 1 When share data are combined with data on mean incomes, income distributions can also be estimated, and their relationship with Lorenz curves can be exploited.
Data in grouped form are often utilized for large scale projects where inequality and poverty on a regional or global scale are being measured, and where compilation and dissemination of data in a more disaggregated form would be overly resource intensive. An example of such a study is Chotikapanich et al. (2012). Examples of locations where grouped share data are available for researchers are the World Bank’s PovcalNet websiteFootnote 2 and that of the World Institute for Development Economics Research.Footnote 3
Our objective is to summarize methods for estimating parametric income distributions using grouped data, to specify the functions needed for estimation for a number of popular parametric forms, and to provide formulae that can be used to compute inequality and poverty measures from the parameters of each of the distributions. In section Concepts, we introduce notation and concepts to be utilized later in the paper. The density, distribution and moment distribution functions that play an important role are introduced, along with poverty and inequality measures whose values can be calculated from estimates of the parameters of income distributions. We also describe the nature of the data that we assume are available. Section Estimation is devoted to estimation. Choice of estimation technique is influenced by whether or not group bounds are provided in the available data and on how the data are grouped: fixed group bounds and random population proportions or fixed population proportions and random group bounds. Both minimum distance (MD) and maximum likelihood (ML) estimators are considered, and results are provided for variants of the MD estimators which depend on which “distance” is being minimized. In section Specification of Distributions, Inequality and Poverty Measures, we tabulate the common parametric distributions that have been used to model income distributions; their density, distribution and moment distribution functions, and moments, are provided. Expressions that can be used to calculate inequality measures from the parameters of the different distributions are also tabulated. Expressions for some poverty measures are given in section Concepts; those for the Watts poverty index are tabulated in section Specification of Distributions, Inequality and Poverty Measures. In large projects, involving many countries and many years, MD and ML estimation can be daunting tasks. 
In section Simple Recipes for Two Distributions, we describe two relatively simple estimators for two specific distributions: the lognormal and the Pareto-lognormal. Some concluding remarks follow in section Concluding Remarks.
Concepts
We assume a population of incomes \(y\), with \(y > 0\), can be represented by a probability density function (pdf) \(f(y;\theta )\) where \(\theta\) is a vector of unknown parameters. Our objective is to review several alternative functional forms that have been suggested for \(f(y;\theta )\), to describe methods for estimating \(\theta\) from grouped data, and to provide expressions that can be used to compute estimates of inequality and poverty measures from estimates for \(\theta\).
We further assume \(y\) has a finite mean \(\mu = \int\limits_{0}^{\infty } {y\,f(y;\theta )} \,dy.\) Its cumulative distribution function (cdf) will be denoted by

$$F(y;\theta ) = \int\limits_{0}^{y} {f(t;\theta )\,dt} ,$$(5.1)

and its first moment distribution function (fmdf) by

$$F^{(1)} (y;\theta ) = \frac{1}{\mu }\int\limits_{0}^{y} {t\,f(t;\theta )\,dt} .$$(5.2)

We will also utilize the second moment distribution function (smdf)

$$F^{(2)} (y;\theta ) = \frac{1}{{\mu^{(2)} }}\int\limits_{0}^{y} {t^{2} \,f(t;\theta )\,dt} ,$$(5.3)

where \(\mu^{(2)}\) is the second moment \(\mu ^{{(2)}} \, = \,\int\limits_{0}^{\infty } {y^{2} \,f\,(y;\theta )\,dy.}\) The Lorenz curve, relating the cumulative proportion of income to the cumulative proportion of population, is given byFootnote 4

$$\eta = L(\lambda ;\theta ) = F^{(1)} \left( {F^{ - 1} (\lambda ;\theta );\theta } \right),\quad 0 \le \lambda \le 1.$$(5.4)

When modelling begins with the specification of a Lorenz curve, the quantile function \(y\, = \,F^{{ - 1}} \,(\lambda ;\theta )\) can be found from it via differentiation,

$$y = \mu \,\frac{{dL(\lambda ;\theta )}}{{d\lambda }}.$$(5.5)
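To make the differentiation step concrete, note that \(dL(\lambda ;\theta )/d\lambda = y/\mu\) evaluated at \(y = F^{ - 1} (\lambda ;\theta ).\) A worked example for the classical Pareto distribution (used here purely as an illustration):

```latex
% Worked example (classical Pareto, scale x_0, shape alpha > 1):
L(\lambda) = 1 - (1 - \lambda)^{(\alpha - 1)/\alpha},
\qquad
\mu = \frac{\alpha\, x_0}{\alpha - 1},
\qquad
y = \mu\,\frac{dL}{d\lambda}
  = \frac{\alpha\, x_0}{\alpha - 1}\cdot\frac{\alpha - 1}{\alpha}\,(1 - \lambda)^{-1/\alpha}
  = x_0\,(1 - \lambda)^{-1/\alpha},
```

which is exactly the Pareto quantile function obtained by inverting \(F(y) = 1 - (x_{0} /y)^{\alpha } .\)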
Inequality Measures
The most commonly cited inequality measure is the Gini coefficient \(g\) which is given by twice the area between the Lorenz curve and the line of equality where \(\eta = \lambda .\) That is,

$$g = 2\int\limits_{0}^{1} {\left[ {\lambda - L(\lambda ;\theta )} \right]d\lambda } = 1 - 2\int\limits_{0}^{1} {L(\lambda ;\theta )\,d\lambda } .$$
Two further inequality measures that we consider are the Theil indices which are special cases of a generalized entropy class of measures. Unlike the Gini coefficient, members of this class have the advantage of being additively decomposable into population subgroups. The general class is given by

$$E(v) = \frac{1}{{v(v - 1)}}\left[ {\int\limits_{0}^{\infty } {\left( {\frac{y}{\mu }} \right)^{v} f(y;\theta )\,dy} - 1} \right].$$

The parameter \(v\) controls the sensitivity of the index to income differences in different parts of the income distribution; larger positive values imply greater sensitivity to income differences in the upper part of the distribution and more negative values imply greater sensitivity to differences in the lower part of the distribution. The Theil special cases are those for \(v \to 0\) and \(v \to 1.\) They are given by

$$T_{0} = \int\limits_{0}^{\infty } {\ln \left( {\frac{\mu }{y}} \right)f(y;\theta )\,dy} \qquad {\text{and}}\qquad T_{1} = \int\limits_{0}^{\infty } {\frac{y}{\mu }\,\ln \left( {\frac{y}{\mu }} \right)f(y;\theta )\,dy} .$$
The last inequality measure that we consider is the Pietra index which is equal to the maximum distance between the Lorenz curve and the equality line \(\eta = \lambda .\) It can be written as the difference between the cdf and the fmdf, evaluated at \(\mu :\)

$$P = F(\mu ;\theta ) - F^{(1)} (\mu ;\theta ).$$
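For a concrete case, the measures above have well-known closed forms under the lognormal distribution: \(g = 2\Phi (\sigma /\sqrt 2 ) - 1,\) \(T_{0} = T_{1} = \sigma^{2} /2\) and \(P = 2\Phi (\sigma /2) - 1.\) A minimal Python sketch (the value of \(\sigma\) is arbitrary, chosen only for illustration):

```python
from statistics import NormalDist

def lognormal_inequality(sigma):
    """Closed-form inequality measures for a lognormal distribution with
    log-scale parameter sigma; the location parameter drops out because
    all four measures are scale-invariant."""
    Phi = NormalDist().cdf
    gini = 2.0 * Phi(sigma / 2.0 ** 0.5) - 1.0   # g = 2*Phi(sigma/sqrt(2)) - 1
    theil0 = sigma ** 2 / 2.0                    # mean log deviation (v -> 0)
    theil1 = sigma ** 2 / 2.0                    # Theil index (v -> 1)
    pietra = 2.0 * Phi(sigma / 2.0) - 1.0        # P = F(mu) - F1(mu)
    return gini, theil0, theil1, pietra

g, t0, t1, p = lognormal_inequality(0.6)
```

As always, the Pietra index cannot exceed the Gini coefficient, which the computed values respect.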
Poverty Measures
Modelling and estimating income distributions are also useful for evaluating poverty. We consider four poverty measures, the headcount ratio \(HC,\) the poverty gap \(PG,\) the \(FGT\) index with the inequality aversion parameter set at 2 and the Watts index, \(WI.\) For convenience, we express \(HC,\) \(PG\) and \(FGT\) in terms of distribution and moment distribution functions, and moments, which are tabulated for specific distributions in section Specification of Distributions, Inequality and Poverty Measures. The Watts index requires more work, however; we defer specific parametric expressions for it until section Specification of Distributions, Inequality and Poverty Measures. Given a specific poverty line \(z,\) we have

$$HC = F(z;\theta ),\qquad PG = F(z;\theta ) - \frac{\mu }{z}F^{(1)} (z;\theta ),$$

$$FGT = F(z;\theta ) - \frac{{2\mu }}{z}F^{(1)} (z;\theta ) + \frac{{\mu^{(2)} }}{{z^{2} }}F^{(2)} (z;\theta ),\qquad WI = \int\limits_{0}^{z} {\left[ {\ln (z) - \ln (y)} \right]f(y;\theta )\,dy} .$$
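These expressions reduce poverty calculation to evaluations of \(F,\) \(F^{(1)}\) and \(F^{(2)} .\) A minimal Python sketch for the lognormal case, whose cdf and moment distribution functions have standard closed forms; the poverty line and parameter values below are arbitrary:

```python
from math import exp, log
from statistics import NormalDist

def lognormal_poverty(z, beta, sigma):
    """HC, PG and FGT(2) for lognormal incomes, using
    HC = F(z), PG = F(z) - (mu/z) F1(z) and
    FGT = F(z) - (2 mu/z) F1(z) + (mu2/z^2) F2(z)."""
    Phi = NormalDist().cdf
    mu = exp(beta + sigma ** 2 / 2)                     # first moment
    mu2 = exp(2 * beta + 2 * sigma ** 2)                # second moment
    F = Phi((log(z) - beta) / sigma)                    # cdf
    F1 = Phi((log(z) - beta - sigma ** 2) / sigma)      # fmdf
    F2 = Phi((log(z) - beta - 2 * sigma ** 2) / sigma)  # smdf
    hc = F
    pg = F - (mu / z) * F1
    fgt = F - (2 * mu / z) * F1 + (mu2 / z ** 2) * F2
    return hc, pg, fgt

hc, pg, fgt = lognormal_poverty(z=2.0, beta=1.0, sigma=0.5)
```

The three measures necessarily satisfy \(FGT \le PG \le HC,\) which provides a quick sanity check on any implementation.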
Data Setup
For estimating the various inequality and poverty measures, we assume we have a sample \(\user2{y^{\prime}} = (y_{1} ,y_{2} ,....,y_{T} )\) randomly drawn from \(f(y;\theta )\), and grouped into \(N\) income classes \((x_{0} ,x_{1} ),\) \((x_{1} ,x_{2} ), \ldots ,(x_{N - 1} ,x_{N} )\) with \(x_{0} = 0\) and \(x_{N} = \infty .\) We denote the proportion of observations in the i-th group as \(c_{i} ,\) mean income in the i-th group as \(\overline{y}_{i} ,\) and mean income for the whole sample as \(\overline{y}.\) The income share for the i-th group is \(s_{i} = c_{i} {{\overline{y}_{i} } \mathord{\left/ {\vphantom {{\overline{y}_{i} } {\overline{y}.}}} \right. \kern-\nulldelimiterspace} {\overline{y}.}}\) Sometimes observations \(\user2{c^{\prime}} = \left( {c_{1} ,c_{2} , \ldots ,c_{N} } \right)\) and \(\user2{s^{\prime}} = (s_{1} ,\,s_{2} , \ldots s_{N} )\) are available from one source and \(\overline{y}\) is available from another source, in which case group mean incomes can be found from \(\overline{y}_{i} = s_{i} {{\overline{y}} \mathord{\left/ {\vphantom {{\overline{y}} {c_{i} .}}} \right. \kern-\nulldelimiterspace} {c_{i} .}}\) In the next section, we describe various methods for estimating \(\theta ,\) given the observations \((c_{i} ,s_{i} ,\overline{y}).\)
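The bookkeeping in this setup is simple but worth making explicit. A short Python sketch with hypothetical quintile data (all numbers invented for illustration):

```python
# Hypothetical quintile data: population shares c_i and income shares s_i
# from one source, overall mean income ybar from another (numbers invented).
c = [0.2, 0.2, 0.2, 0.2, 0.2]
s = [0.05, 0.10, 0.15, 0.25, 0.45]
ybar = 10_000.0

# Group mean incomes recovered from ybar_i = s_i * ybar / c_i
ybar_i = [si * ybar / ci for si, ci in zip(s, c)]

# Consistency check: the group means aggregate back to the overall mean
assert abs(sum(ci * yi for ci, yi in zip(c, ybar_i)) - ybar) < 1e-6
```

With these numbers the poorest quintile has mean income 2,500 and the richest 22,500, consistent with \(s_{i} = c_{i} \overline{y}_{i} /\overline{y}.\)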
Estimation
The estimation methods that we review can be categorized according to the way in which the data are generated, and whether the group bounds \(\user2{x^{\prime}} = (x_{0} ,x_{1} , \ldots ,x_{N} )\) are known in addition to the observations on \((c_{i} ,s_{i} ,\overline{y}).\) There are two ways in which the data can be generated. The group bounds x can be specified a priori, making the proportions of observations which fall into each group \(c_{i}\), and the group means \(\overline{y}_{i}\), the random variables. Alternatively, the \(c_{i}\) can be specified a priori, in which case the group bounds x are random variables, along with the group means \(\overline{y}_{i}\). We consider estimation techniques for each of these cases in turn, noting the implications of known and unknown values for the group boundaries.
Estimation with Fixed x, Random c, Random \(\overline{\user2{y}}_{{\varvec{i}}}\)
One approach for estimating \(\theta\) when the group bounds x are known and the \(c_{i}\) are random is to maximize the likelihood function for the multinomial distribution. This approach uses information on \({\varvec{x}}\) and \({\varvec{c}}\), but does not utilize the information contained in \({\varvec{s}}\) and \(\overline{y}.\) The log of the likelihood function is given by

$$\ln L(\theta ) = K + T\sum\limits_{i = 1}^{N} {c_{i} \ln \left[ {F(x_{i} ;\theta ) - F(x_{i - 1} ;\theta )} \right]} $$(5.15)
where \(K\) is a constant.
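A minimal Python sketch of this multinomial ML approach, using a one-parameter exponential distribution and a crude grid search in place of a proper numerical optimizer (both simplifications are ours, for illustration only):

```python
from math import exp, log

def exp_cdf(y, theta):
    """Exponential cdf F(y; theta) = 1 - exp(-y/theta); theta is the mean.
    Handles y = inf because exp(-inf) = 0."""
    return 1.0 - exp(-y / theta)

def multinomial_loglik(theta, x, c, T):
    """Grouped-data multinomial log-likelihood, constant K dropped:
    T * sum_i c_i * ln[F(x_i; theta) - F(x_{i-1}; theta)]."""
    return T * sum(
        c[i - 1] * log(exp_cdf(x[i], theta) - exp_cdf(x[i - 1], theta))
        for i in range(1, len(x))
    )

# Fixed bounds; group proportions generated exactly at theta = 2, so the
# grid-search ML estimate should sit on the grid point at the truth.
x = [0.0, 1.0, 2.0, 4.0, float("inf")]
c = [exp_cdf(x[i], 2.0) - exp_cdf(x[i - 1], 2.0) for i in range(1, len(x))]
grid = [0.50 + 0.01 * k for k in range(400)]
theta_hat = max(grid, key=lambda t: multinomial_loglik(t, x, c, T=1000))
```

Because the proportions were generated without sampling noise, the likelihood is maximized exactly at the true parameter; with observed data the estimate would vary around it.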
In a series of papers (Griffiths & Hajargasht, 2015; Hajargasht & Griffiths, 2020; Hajargasht et al., 2012), three minimum distance (MD) estimators suitable for random \(c_{i}\) and \(\overline{y}_{i}\) were introduced.Footnote 5 These estimators utilize information on \({\varvec{c}},{\varvec{s}}\) and \(\overline{y},\) and can be applied with or without knowledge of \({\varvec{x}}.\) When \({\varvec{x}}\) is unknown it can be treated as a set of unknown parameters and estimated along with \(\theta .\) The three estimators all have the same limiting distribution, but do not yield identical estimates. They are more efficient than the ML estimator from the multinomial likelihood function where only information from \(c_{i}\) is utilized. To introduce the three estimators, we begin by noting the following:

$$\mathop {{\text{plim}}}\limits_{T \to \infty } \,c_{i} = \lambda_{i} (\phi ) - \lambda_{i - 1} (\phi ),$$(5.16)

$$\mathop {{\text{plim}}}\limits_{T \to \infty } \,\overline{y}_{i} = \frac{{\mu \left[ {\eta_{i} (\phi ) - \eta_{i - 1} (\phi )} \right]}}{{\lambda_{i} (\phi ) - \lambda_{i - 1} (\phi )}},$$(5.17)

with \(\lambda_{i} (\phi ) = F(x_{i} ;\theta )\) and \(\eta_{i} (\phi ) = F^{(1)} (x_{i} ;\theta ),\)
where we write \(\phi = (x,\theta )\) to accommodate the case where \({\varvec{x}}\) is unobserved, making the unknown parameter vector equal to \(\phi .\) If \({\varvec{x}}\) is observed, we can proceed in the same way, utilizing the known \({\varvec{x}}\) and treating \(\theta\) as the unknown parameter vector.
MD Estimator 1
For the first MD estimator, we define

$$\tilde{y}_{i} = c_{i} \overline{y}_{i} .$$(5.18)
Since \(\sum\nolimits_{i = 1}^{N} {\tilde{y}_{i} } = \sum\nolimits_{i = 1}^{N} {c_{i} \overline{y}_{i} } = \overline{y},\) we interpret \(\tilde{y}_{i}\) as that part of mean income \(\overline{y}\) that comes from the i-th group. Then, from (5.17) and (5.18),

$$\mathop {{\text{plim}}}\limits_{T \to \infty } \,\tilde{y}_{i} = \mu \left[ {\eta_{i} (\phi ) - \eta_{i - 1} (\phi )} \right].$$(5.19)
From (5.16) and (5.19), we can set up the MD estimator

$$\hat{\phi }_{1} = \mathop {\arg \min }\limits_{\phi } \,H_{1} (\phi )^{\prime} \,W\,H_{1} (\phi )$$(5.20)
where

$$H_{1} (\phi ) = \left[ {\begin{array}{*{20}c} {c_{1} - \left[ {\lambda_{1} (\phi ) - \lambda_{0} (\phi )} \right]} \\ \vdots \\ {c_{N - 1} - \left[ {\lambda_{N - 1} (\phi ) - \lambda_{N - 2} (\phi )} \right]} \\ {\tilde{y}_{1} - \mu \left[ {\eta_{1} (\phi ) - \eta_{0} (\phi )} \right]} \\ \vdots \\ {\tilde{y}_{N} - \mu \left[ {\eta_{N} (\phi ) - \eta_{N - 1} (\phi )} \right]} \\ \end{array} } \right]$$(5.21)
and \(W\) is a weight matrix. Note that \(\mu\) will also depend on \(\phi ,\) the exact function depending on the parametric pdf chosen for the income distribution. Also, \(c_{N} - \left[ {\lambda_{N} (\phi ) - \lambda_{N - 1} (\phi )} \right]\) has been omitted since having \(\sum\nolimits_{i = 1}^{N} {c_{i} = 1}\) makes one of the \(c_{i}\) entries redundant.
A possible weight matrix, one suggested by Chotikapanich et al. (2007), is to set the diagonal elements of \(W\) as \(w_{i} = {1 \mathord{\left/ {\vphantom {1 {c_{i}^{2} }}} \right. \kern-\nulldelimiterspace} {c_{i}^{2} }}\) for \(i = 1,2, \ldots ,N - 1\) and \(w_{N - 1 + i} = {1 \mathord{\left/ {\vphantom {1 {\tilde{y}_{i}^{2} }}} \right. \kern-\nulldelimiterspace} {\tilde{y}_{i}^{2} }}\) for \(i = 1,2, \ldots ,N,\) and the off-diagonal elements to zero. With this setting \(\hat{\phi }_{1}\) minimizes the sum of squares of percentage errors. This weight matrix, call it \(W_{CGR} ,\) is a simple one, and it works well in practice, but it is not optimal; it does not lead to the most efficient estimator for \(\phi .\) Hajargasht et al. (2012) show that the inverse of the optimal weight matrix is given by
where \(0_{N - 1}\) is an \((N - 1)\)-dimensional vector of zeros, and \(D_{1} ,D_{2}\) and \(D_{3}\) are diagonal matrices. Their elements, and those of \(A_{1} ,A_{2}\) and \(A_{3} ,\) are as follows.
All these quantities depend on the unknown parameter vector \({\mathbf{\phi }}.\) To ease the notation, we have not made this dependence explicit. Note also that, through \(D_{3} ,\) \(W\) will depend on the second moment \(\mu^{(2)}\) and the second moment distribution function \(\psi_{i} = F^{(2)} (x_{i} ;\theta ).\)
After inverting \(W_{1}^{ - 1}\) to find \(W_{1} ,\) and simplifying, the objective function in (5.20) can be shown to be equal to
where
and
There are three possible ways to approach the problem of finding an estimate \(\hat{\phi }\) that minimizes \(H_{1} (\phi )^{\prime}W_{1} (\phi )H_{1} (\phi ):\)
1. A two-step estimator where first an estimate \(\hat{\phi }_{CGR}\) is obtained using the weight matrix \(W_{CGR} ,\) and then a second estimate \(\hat{\phi }_{2 - STEP}\) is obtained by minimizing \(H_{1} (\phi )^{\prime}W_{1} (\hat{\phi }_{CGR} )H_{1} (\phi ).\)
2. An iterative estimator obtained by iterating the two-step estimator until convergence is achieved.
3. A “continuous updating estimator” where the whole function in (5.23) is minimized with respect to \(\phi .\)
These three estimators all have the same limiting distribution but can produce different estimates. Their asymptotic covariance matrix is
where \(H_{1}^{*}\) is a \((2N \times 1)\) vector obtained from \(H_{1}\) by including \(c_{N} - \left( {\lambda_{N} - \lambda_{N - 1} } \right)\) in the N-th position, and \(W_{1}^{*}\) is a \((2N \times 2N)\) matrix comprising four \((N \times N)\) blocks \(D_{11} ,\;D_{12} ,\;D_{21} = D_{12}\) and \(D_{22} ,\) each of which is a diagonal matrix. The i-th diagonal elements of these matrices are \(w_{1i}\) for \(D_{11} ,\) \(w_{2i}\) for \(D_{22}\) and \(- w_{3i}\) for \(D_{12} .\) See Eqs. (5.24) to (5.26).
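The two-step and iterative strategies can be sketched in a few lines of Python. The toy problem below uses only the population-share conditions of a one-parameter exponential model, chi-square-type weights \(1/p_{i}\) and a grid minimizer; it is a schematic illustration of the updating logic only, not the full \(H_{1} ,\,W_{1}\) machinery of the papers cited above:

```python
from math import exp

def exp_cdf(y, theta):
    """Exponential cdf; theta is the mean. exp(-inf) = 0 handles y = inf."""
    return 1.0 - exp(-y / theta)

def group_probs(theta, x):
    return [exp_cdf(x[i], theta) - exp_cdf(x[i - 1], theta)
            for i in range(1, len(x))]

def objective(theta, x, c, weights):
    """Quadratic form H(theta)' W H(theta) with fixed diagonal weights W."""
    h = [ci - pi for ci, pi in zip(c, group_probs(theta, x))]
    return sum(w * hi * hi for w, hi in zip(weights, h))

def argmin_grid(f):
    grid = [0.5 + 0.01 * k for k in range(400)]
    return min(grid, key=f)

x = [0.0, 1.0, 2.0, 4.0, float("inf")]
c = [0.40, 0.24, 0.23, 0.13]   # invented, roughly exponential with mean 2

# First round: simple weights 1/c_i^2 (the W_CGR idea)
theta_hat = argmin_grid(lambda t: objective(t, x, c, [1 / ci**2 for ci in c]))

# Two-step / iterative: re-minimize with weights evaluated at the previous
# estimate; one pass gives the two-step estimator, repeating until the
# estimate stops changing gives the iterative estimator.
for _ in range(20):
    w = [1 / pi for pi in group_probs(theta_hat, x)]
    theta_new = argmin_grid(lambda t: objective(t, x, c, w))
    if theta_new == theta_hat:
        break
    theta_hat = theta_new
```

The continuous updating variant would instead pass weights recomputed at the candidate `t` inside the objective itself.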
MD Estimator 2
The second MD estimator is that considered by Griffiths and Hajargasht (2015). It follows the same principles as the previous one, but it replaces \(\tilde{y}_{i}\) by \(\overline{y}_{i} .\) To accommodate this replacement, we note that, from (5.16)–(5.18),
In this case, the MD estimator can be written as
where
and \(W_{2}\) is a specified weight matrix. The weight matrix that is analogous to \(W_{CGR} ,\) suggested for the previous estimator as a simple choice, or as a starting point for estimators that use an optimal weight matrix, is a diagonal matrix with elements \(w_{i} = {1 \mathord{\left/ {\vphantom {1 {c_{i}^{2} }}} \right. \kern-\nulldelimiterspace} {c_{i}^{2} }}\) for \(i = 1,2, \ldots ,N - 1,\) and \(w_{i + N - 1} = {1 \mathord{\left/ {\vphantom {1 {\overline{y}_{i}^{2} }}} \right. \kern-\nulldelimiterspace} {\overline{y}_{i}^{2} }}\) for \(i = 1,2, \ldots ,N.\) Griffiths and Hajargasht (2015) show that the optimal weight matrix, for use with a 2-step, iterative or continuous updating estimator, is given by
where
and \(\delta_{ij} = 1\) when \(i = j\) and \(\delta_{ij} = 0\) when \(i \ne j.\) Using these results, the objective function can be simplified to
As before, \(H_{2} (\phi )^{\prime}W_{2} (\phi )H_{2} (\phi )\) can be minimized using a 2-step estimator, an iterative estimator or a continuous updating estimator. The weights are \({1 \mathord{\left/ {\vphantom {1 {(\lambda_{i} - \lambda_{i - 1} )}}} \right. \kern-\nulldelimiterspace} {(\lambda_{i} - \lambda_{i - 1} )}}\) for the first terms in (5.33) and \(\left[ {E_{2} } \right]_{ii}\) for the second. In contrast to the earlier formulation in (5.23), there are no cross product terms, making the minimization problem simpler and convergence easier to obtain. The large sample covariance matrix of an estimator \(\hat{\phi }_{2}\) using an optimal weight matrix is
where \(H_{2}^{*}\) is a \((2N \times 1)\) vector obtained from \(H_{2}\) by including \(c_{N} - (\lambda_{N} - \lambda_{N - 1} )\) in the N-th position, and \(W_{2}^{*}\) is a \((2N \times 2N)\) block-diagonal matrix with elements \({1 \mathord{\left/ {\vphantom {1 {(\lambda_{i} - \lambda_{i - 1} )}}} \right. \kern-\nulldelimiterspace} {(\lambda_{i} - \lambda_{i - 1} )}}\) in the first diagonal block and elements \(\left[ {E_{2} } \right]_{ii}\) in the second diagonal block.
MD Estimator 3
The third MD estimator that we describe is that considered by Hajargasht and Griffiths (2020). Its essential difference is that it considers cumulative population and income shares. To develop it, we begin by defining the cumulative shares

$$\hat{\lambda }_{i} = \sum\limits_{j = 1}^{i} {c_{j} } \qquad {\text{and}}\qquad \hat{\eta }_{i} = \sum\limits_{j = 1}^{i} {s_{j} } ,$$(5.36)

and recognizing that

$$\mathop {{\text{plim}}}\limits_{T \to \infty } \,\hat{\lambda }_{i} = \lambda_{i} (\phi )\qquad {\text{and}}\qquad \mathop {{\text{plim}}}\limits_{T \to \infty } \,\overline{y}\,\hat{\eta }_{i} = \mu \,\eta_{i} (\phi ).$$(5.37)
Using (5.36) and (5.37), we can construct the MD estimator as

$$\hat{\phi }_{3} = \mathop {\arg \min }\limits_{\phi } \,H_{3} (\phi )^{\prime} \,W_{3} \,H_{3} (\phi )$$(5.38)
where

$$H_{3} (\phi ) = \left[ {\begin{array}{*{20}c} {\hat{\lambda }_{1} - \lambda_{1} (\phi )} \\ \vdots \\ {\hat{\lambda }_{N - 1} - \lambda_{N - 1} (\phi )} \\ {\overline{y}\,\hat{\eta }_{1} - \mu \,\eta_{1} (\phi )} \\ \vdots \\ {\overline{y}\,\hat{\eta }_{N} - \mu \,\eta_{N} (\phi )} \\ \end{array} } \right]$$(5.39)
and \(W_{3}\) is a pre-specified weight matrix. A simple weight matrix that can be used to simplify calculations or as a starting point for estimators that use an optimal weight matrix is a diagonal matrix with elements \(w_{i} = {1 \mathord{\left/ {\vphantom {1 {\hat{\lambda }_{i}^{2} }}} \right. \kern-\nulldelimiterspace} {\hat{\lambda }_{i}^{2} }}\) for \(i = 1,2, \ldots ,N - 1\) and \(w_{N - 1 + i} = {1 \mathord{\left/ {\vphantom {1 {\left( {\overline{y}\,\hat{\eta }_{i} } \right)^{2} }}} \right. \kern-\nulldelimiterspace} {\left( {\overline{y}\,\hat{\eta }_{i} } \right)^{2} }}\) for \(i = 1,2, \ldots ,N.\) Hajargasht and Griffiths (2020) show that the optimal weight matrix is given by
where
1. \(L_{11}\) is a \(\left[ {(N - 1) \times (N - 1)} \right]\) tri-diagonal matrix with the following nonzero elements:

$$\begin{gathered} \left[ {L_{{11}} } \right]_{{ii}} = \frac{{\mu ^{{(2)}} \left( {\psi _{{i + 1}} - \psi _{i} } \right)}}{{v_{{i + 1}} }} + \frac{{\mu ^{{(2)}} \left( {\psi _{i} - \psi _{{i - 1}} } \right)}}{{v_{i} }}\,\,\,\,i = 1,2, \ldots ,N - 1 \hfill \\ \left[ {L_{{11}} } \right]_{{ij}} = \left\{ \begin{gathered} - \frac{{\mu ^{{(2)}} \left( {\psi _{i} - \psi _{{i - 1}} } \right)}}{{v_{i} }}\,\,\,\,\,\,\,\,i = 2,\,3,\, \ldots ,\,N - 1;\,j\, = \,i - 1 \hfill \\ - \frac{{\mu ^{{(2)}} \left( {\psi _{j} - \psi _{{j - 1}} } \right)}}{{v_{j} }}\,\,\,\,\,\,\,\,{\text{ }}j\, = \,2,\,3,\, \ldots ,\,N - 1;\,i\, = \,j - 1 \hfill \\ \end{gathered} \right. \hfill \\ \end{gathered}$$(5.41)

2. \(L_{12}\) is a \(\left[ {(N - 1) \times N} \right]\) matrix with the following nonzero elements:

$$\begin{gathered} \left[ {L_{{12}} } \right]_{{ii}} \, = \, - \frac{{\mu \left( {\eta _{{i + 1}} - \eta _{i} } \right)}}{{v_{{i + 1}} }} - \frac{{\mu \left( {\eta _{i} - \eta _{{i - 1}} } \right)}}{{v_{i} }}\,\,\,\,i = 1,\,2,\, \ldots ,\,N - 1 \hfill \\ \left[ {L_{{12}} } \right]_{{ij}} = \left\{ \begin{gathered} \frac{{\mu \left( {\eta _{i} - \eta _{{i - 1}} } \right)}}{{v_{i} }}\,\,\,\,\,\,\,\,i\, = \,2,\,3,\, \ldots ,\,N - 1;\,j\, = \,i - 1 \hfill \\ \frac{{\mu \left( {\eta _{j} - \eta _{{j - 1}} } \right)}}{{v_{j} }}\,\,\,\,\,\,\,\,{\text{ }}j\, = \,2,\,3,\, \ldots ,\,N;\,i = j - 1 \hfill \\ \end{gathered} \right. \hfill \\ \end{gathered}$$(5.42)

3. \(L_{22}\) is an \(\left[ {N \times N} \right]\) tri-diagonal matrix with the following nonzero elements:

$$\begin{gathered} \left[ {L_{{22}} } \right]_{{ii}} \, = \,\frac{{\left( {\lambda _{{i + 1}} - \lambda _{i} } \right)}}{{v_{{i + 1}} }} + \frac{{\left( {\lambda _{i} - \lambda _{{i - 1}} } \right)}}{{v_{i} }}\,\,\,\,i\, = \,1,\,2,\, \ldots ,\,N - 1 \hfill \\ \left[ {L_{{22}} } \right]_{{ij}} \, = \,\left\{ \begin{gathered} \frac{{ - \left( {\lambda _{i} - \lambda _{{i - 1}} } \right)}}{{v_{i} }}\,\,\,\,\,\,\,\,i\, = \,2,\,3,\, \ldots ,\,N;\,\,j\, = \,i - 1 \hfill \\ \frac{{ - \left( {\lambda _{j} - \lambda _{{j - 1}} } \right)}}{{v_{j} }}\,\,\,\,\,\,\,\,\,{\text{ }}j\, = \,2,\,3,\, \ldots ,\,N;\,i\, = \,j - 1 \hfill \\ \end{gathered} \right. \hfill \\ \left[ {L_{{22}} } \right]_{{NN}} \, = \,\frac{{ - \left( {\lambda _{N} - \lambda _{{N - 1}} } \right)}}{{v_{N} }} \hfill \\ \end{gathered}$$(5.43)
As in the previous two cases, the objective function can be minimized using a two-step estimator, an iterative estimator or a continuous updating estimator. The asymptotic covariance matrix for \(\hat{\phi }_{3} ,\) when using an optimal weight matrix, is
A Quasi ML Estimator
Building on the work of Hilomi et al. (2008), Eckernkemper and Gribisch (2021) propose a quasi ML estimator. They combine the multinomial likelihood in Eq. (5.15) with a Gaussian approximation for the group means \(\overline{y}_{i} .\) Including the extra information means that estimation can proceed with or without knowledge of the group bounds, with these bounds being treated as parameters to be estimated when they are unknown. Let \(T_{i} = c_{i} T\) be the number of observations in group \(i.\) Each \(\overline{y}_{i}\) is assumed to be \(N\left( {\tilde{\mu }_{i} ,{{\tilde{\sigma }_{i}^{2} } \mathord{\left/ {\vphantom {{\tilde{\sigma }_{i}^{2} } {T_{i} }}} \right. \kern-\nulldelimiterspace} {T_{i} }}} \right)\) where the \(\tilde{\mu }_{i}\) and the \(\tilde{\sigma }_{i}^{2}\) are the means and variances of \(y\) from truncations \(\left( {x_{i - 1} < y_{i} < x_{i} } \right)\) of the originally specified distribution. That is,
and
Using these results, the log of the likelihood function can be written as
Eckernkemper and Gribisch (2021) show that the estimator for \(\phi\) that maximizes \(L(\phi )\) is consistent and that the covariance matrix of its limiting distribution is the same as that for MD estimators 1 and 2.
Estimation with Fixed \({\varvec{c}},\) Random \({\varvec{x}},\) Random \(\overline{\user2{y}}_{{\varvec{i}}}\)
In this case, the observations are grouped such that the proportion of observations in each group is pre-specified. Examples are 10 groups with 10% of the observations in each group or 20 groups with 5% of the observations in each group. This setup implies the proportions \(c_{i}\) are fixed (non-random), while the sample group boundaries \({\varvec{x}}\) and the group mean incomes \(\overline{y}_{i}\) are random variables. Let \(y_{[1]} ,y_{[2]} , \ldots ,y_{[T]}\) be the order statistics obtained by arranging the original observations \({\varvec{y}}\) in ascending order. An estimate for a group bound \(x_{i}\) is the largest order statistic in the i-th group, \(\hat{x}_{i} = y_{{[\hat{\lambda }_{i} T]}} .\) If the \(\hat{x}_{i}\) are observed, estimation can use both the \(\hat{x}_{i}\) and the \(\overline{y}_{i} ;\) if the \(\hat{x}_{i}\) are unobserved, then only the information in \(\overline{y}_{i}\) can be utilized. We consider MD and ML estimation for both these cases. MD estimation with unobserved \(\hat{x}_{i}\) corresponds to Lorenz curve estimation, which has attracted a great deal of attention in the literature. See, for example, Chotikapanich (2008). A Lorenz curve implied by a specific income distribution is defined by Eq. (5.4). An alternative is to start with a specific parametric Lorenz curve, in which case the corresponding income distribution is defined via the quantile function in (5.5). A problem with the latter approach is that the income distributions corresponding to some Lorenz curves are not defined for all values of \(y.\)
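Reading the bound estimates \(\hat{x}_{i} = y_{{[\hat{\lambda }_{i} T]}}\) off the sorted sample is straightforward; a small Python sketch with invented data:

```python
def group_bound_estimates(y, cum_props):
    """Group-bound estimates as order statistics: x_hat_i = y_[lambda_hat_i * T],
    the largest observation in the i-th group (cum_props excludes the final 1)."""
    y_sorted = sorted(y)
    T = len(y_sorted)
    # the k-th order statistic y_[k] sits at index k - 1 after sorting
    return [y_sorted[round(lam * T) - 1] for lam in cum_props]

# Illustrative sample: incomes 1,...,100 split into quintiles
sample = list(range(1, 101))
bounds = group_bound_estimates(sample, [0.2, 0.4, 0.6, 0.8])
# bounds -> [20, 40, 60, 80]
```

In practice the sample itself is usually unavailable; the sketch only makes precise which order statistic each reported bound corresponds to.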
MD Estimation
The MD estimators that we consider are those proposed by Hajargasht and Griffiths (2020). Suppose, in the first instance, that the \(\hat{x}_{i}\) are observed. To use this information in an MD estimator, we recognize thatFootnote 6

$$\mathop {{\text{plim}}}\limits_{T \to \infty } \,\hat{x}_{i} = F^{ - 1} (\lambda_{i} ;\theta ).$$(5.48)
To use information on the income shares, we use the cumulative shares \(\hat{\eta }_{i}\) multiplied by mean income \(\overline{y},\) in line with MD estimator 3 for the random \({\varvec{c}}\) case. One difference, however, is that we express its probability limit in terms of the non-random \({\varvec{c}}\), instead of \({\varvec{x}}\), which is now a random variable. That is,

$$\mathop {{\text{plim}}}\limits_{T \to \infty } \,\overline{y}\,\hat{\eta }_{i} = \mu \,F^{(1)} \left( {F^{ - 1} (\lambda_{i} ;\theta );\theta } \right).$$(5.49)
To set up the MD estimator, it is convenient to define notation for a generalized Lorenz curve which can be written as

$$G(\lambda ;\theta ) = \mu \,L(\lambda ;\theta ) = \mu \,F^{(1)} \left( {F^{ - 1} (\lambda ;\theta );\theta } \right).$$(5.50)
Then, from (5.48)–(5.50), we can set up the following MD estimator,

$$\hat{\theta }_{4} = \mathop {\arg \min }\limits_{\theta } \,H_{4} (\theta )^{\prime} \,W_{4} \,H_{4} (\theta )$$(5.51)
where

$$H_{4} (\theta ) = \left[ {\begin{array}{*{20}c} {\hat{x}_{1} - F^{ - 1} (\hat{\lambda }_{1} ;\theta )} \\ \vdots \\ {\hat{x}_{N - 1} - F^{ - 1} (\hat{\lambda }_{N - 1} ;\theta )} \\ {\overline{y}\,\hat{\eta }_{1} - G(\hat{\lambda }_{1} ;\theta )} \\ \vdots \\ {\overline{y}\,\hat{\eta }_{N} - G(\hat{\lambda }_{N} ;\theta )} \\ \end{array} } \right]$$(5.52)
and \(W_{4}\) is a suitably chosen weight matrix. It can be shown that the optimal weight matrix is given by
where
The covariance matrix for the limiting distribution of \(\hat{\theta }_{4}\) is
When there are a large number of groups, the matrix inversion in (5.53) can be computationally demanding. Hajargasht and Griffiths (2020) show how \(W_{4}^{ - 1}\) can be derived from \(W_{3}^{ - 1}\) which has computationally convenient tri-diagonal blocks. They also demonstrate that, if the groupings for this set up are equivalent to those for the MD3 setup in the sense that, a priori, \(x_{i} = F^{ - 1} (\lambda_{i} ;\theta ),\) then the asymptotic covariance matrices for \(\hat{\theta }_{3}\) and \(\hat{\theta }_{4}\) are identical.
Minimizing (5.51) to find an estimate \(\hat{\theta }_{4}\) can proceed using one of the three algorithms described in section Estimation with Fixed x, Random c, Random \(\overline{\user2{y}}_{{\varvec{i}}}\). However, there are two requirements which will not always be met: estimates of the bounds \(\hat{x}_{i} = y_{{[\hat{\lambda }_{i} T]}}\) must be observed and the cdf must be invertible, either algebraically or computationally, so that quantiles \(F^{ - 1} (\hat{\lambda }_{i} ;\theta )\) can be found. Note that \(F^{ - 1} (\hat{\lambda }_{i} ;\theta )\) appears not only in the first \((N - 1)\) elements of \(H_{4}\) but also in the next \((N - 1)\) elements that involve the generalized Lorenz curve \(G(\lambda ;\theta ) = \mu \,F^{(1)} \left( {F^{ - 1} (\lambda ;\theta );\theta } \right).\) One way to overcome non-invertibility of the cdf is to replace the assumption of a parametric income distribution with an assumption of a parametric Lorenz curve. Doing so overcomes the problem for the second set of elements in \(H_{4} ,\) and relationships between the generalized Lorenz curve and the quantile function—see Hajargasht and Griffiths (2020)—can be exploited to obtain the first set of elements in \(H_{4} .\)
When the \(\hat{x}_{i}\) are unobserved, estimation can proceed using the last \(N\) elements in \(H_{4} ,\) with calculations made from an assumed income distribution if the cdf is invertible, or from an assumed Lorenz curve if the cdf is not invertible. This last approach is that most closely aligned with suggestions for Lorenz curve estimation which have appeared in the literature.Footnote 7 Earlier suggestions are sub-optimal in the sense that they do not use the best weighting matrix. Details can be found in Hajargasht and Griffiths (2020).
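A minimal Python sketch of Lorenz-curve fitting in this spirit, using the classical Pareto Lorenz curve \(L(\lambda ;\alpha ) = 1 - (1 - \lambda )^{1 - 1/\alpha }\) and simple squared-percentage-error weights rather than the optimal weight matrix (both choices are ours, for illustration):

```python
def pareto_lorenz(lam, alpha):
    """Classical Pareto Lorenz curve L(lambda; alpha) = 1 - (1-lambda)^(1 - 1/alpha)."""
    return 1.0 - (1.0 - lam) ** (1.0 - 1.0 / alpha)

def fit_lorenz(lam_hat, eta_hat):
    """Fit alpha by minimum distance on cumulative income shares, with
    simple 1/eta_hat_i^2 weights (squared percentage errors)."""
    def ssq(alpha):
        return sum(((e - pareto_lorenz(l, alpha)) / e) ** 2
                   for l, e in zip(lam_hat, eta_hat))
    grid = [1.10 + 0.001 * k for k in range(4000)]   # alpha in (1.1, 5.1)
    return min(grid, key=ssq)

# Cumulative income shares generated exactly from alpha = 2 at the deciles
lam = [i / 10 for i in range(1, 10)]
eta = [pareto_lorenz(l, 2.0) for l in lam]
alpha_hat = fit_lorenz(lam, eta)
```

With noise-free shares the fit recovers the generating parameter; the references above show how to replace the simple weights with the efficient ones.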
ML Estimation
ML estimation of \(\theta\) for fixed \(c_{i} ,\) and random \(x_{i}\) and \(\overline{y}_{i}\) was considered by Eckernkemper and Gribisch (2021). Recognizing that the joint density for the group bounds and group means can be written as

$$f\left( {\hat{\user2{x}},\user2{\overline{y}}} \right) = f\left( {\hat{\user2{x}}} \right)\prod\limits_{i = 1}^{N} {f\left( {\overline{y}_{i} |\hat{x}_{i} ,\hat{x}_{i - 1} } \right)} ,$$
they set up a likelihood function that uses distribution theory for order statistics for \(f(\hat{\user2{x}})\) and a Gaussian approximation for \(f\left( {\overline{y}_{i} |\hat{x}_{i} ,\hat{x}_{i - 1} } \right).\) Using results in David and Nagaraja (2003), the conditional means and variances for the \(\overline{y}_{i}\) can be written as
and
The log-likelihood is
where \(K_{2}\) is a constant. The estimator \(\hat{\theta }_{5}\) that maximizes (5.61) can be interpreted as a quasi ML estimator. Eckernkemper and Gribisch (2021) establish its asymptotic covariance matrix as
One difference between the estimator \(\hat{\theta }_{5}\) and the estimators considered in the earlier sections is that it requires knowledge of the sample size \(T,\) from which the number of observations in each group can be computed as \(T_{i} = c_{i} T.\) All estimators require knowledge of \(T\) to compute standard errors, but knowledge of the proportions \(c_{i} ,\) without knowledge of \(T,\) is sufficient for the earlier estimators for \(\theta\) and \(\phi\) to be employed.
For ML estimation of \(\theta\) when the \(\hat{x}_{i}\) are not observed, Eckernkemper and Gribisch (2021) integrate out the \(\hat{x}_{i}\) from the likelihood in (5.61) to obtain the following log-likelihood
where \(K_{3}\) is a constant, \(\user2{\overline{y}^{\prime}} = (\overline{y}_{1} ,\overline{y}_{2} , \ldots ,\overline{y}_{N} ),\) \({{\varvec{\upmu}}}^{*}\) is an \((N \times 1)\) vector with i-th element equal to
and \(\Xi = DB\,\Omega_{22}^{*} B^{\prime}D\) where \(D = {\text{diag}} \left( {c_{1}^{ - 1} ,c_{2}^{ - 1} , \ldots ,c_{N}^{ - 1} } \right),\) \(\left[ B \right]_{ii} = 1,\,\,\left[ B \right]_{ij} = - 1\) for \(i = j + 1,j = 1,2, \ldots ,N - 1,\) and zero elsewhere, and \(\Omega_{22}^{*}\) is equal to \(\Omega_{22}\) defined in (5.55), but with \(\hat{x}_{i}\) and \(\hat{x}_{j}\) replaced by \(F^{ - 1} \left( {\hat{\lambda }_{i} ;\theta } \right)\) and \(F^{ - 1} \left( {\hat{\lambda }_{j} ;\theta } \right),\) respectively. The asymptotic covariance matrix for the estimator \(\hat{\theta }_{6}\) obtained by maximizing this log-likelihood is
Specification of Distributions, Inequality and Poverty Measures
To implement the estimation methods described in section Estimation, a parametric distribution must be chosen, and we need its moments and its pdf, cdf, fmdf and smdf. This information is provided in Table 5.1 for several popular income distributions. Once the parameters of a chosen distribution have been estimated, estimates of inequality and poverty incidence are frequently of interest. In Table 5.2, we provide expressions that can be used to compute inequality estimates from the estimates of the parameters. Expressions for the poverty estimates were given in section Poverty Measures, with the exception of the Watts index, whose expressions are tabulated in Table 5.3.
Simple Recipes for Two Distributions
In some instances, where large scale projects involving many countries and many time periods are being undertaken, it may be prudent to use estimation techniques which are relatively simple. In this section, we consider two estimation techniques that fall into this category—one for the lognormal distribution and one for the Pareto-lognormal distribution.
Lognormal Distribution
In the previous section, we indicated that the Gini coefficient for the lognormal distribution is \(g = 2\Phi \left( {{\sigma \mathord{\left/ {\vphantom {\sigma {\sqrt 2 }}} \right. \kern-\nulldelimiterspace} {\sqrt 2 }}} \right) - 1\) and its mean is \(\mu = \exp \left\{ {\beta + {{\sigma^{2} } \mathord{\left/ {\vphantom {{\sigma^{2} } 2}} \right. \kern-\nulldelimiterspace} 2}} \right\}.\) Using grouped data the Gini coefficient can be estimated from

$$\hat{g} = 1 - \sum\limits_{i = 1}^{N} {\left( {\hat{\lambda }_{i} - \hat{\lambda }_{i - 1} } \right)\left( {\hat{\eta }_{i} + \hat{\eta }_{i - 1} } \right)} ,$$

and the mean can be estimated using \(\overline{y}.\) Utilizing these two equations and the expression for the Gini coefficient yields the parameter estimates

$$\hat{\sigma } = \sqrt 2 \,\Phi^{ - 1} \left( {\frac{{\hat{g} + 1}}{2}} \right)\qquad {\text{and}}\qquad \hat{\beta } = \ln \left( {\overline{y}} \right) - \frac{{\hat{\sigma }^{2} }}{2}.$$
This approach was adopted by Chotikapanich et al. (1997).
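A Python sketch of this recipe, using the standard trapezoidal (lower-bound) Gini estimate \(\hat{g} = 1 - \sum\nolimits_{i} {(\hat{\lambda }_{i} - \hat{\lambda }_{i - 1} )(\hat{\eta }_{i} + \hat{\eta }_{i - 1} )}\) from cumulative shares and the standard-library normal distribution:

```python
from math import log
from statistics import NormalDist

def lognormal_from_grouped(c, s, ybar):
    """Lognormal (beta, sigma) from grouped data: trapezoidal Gini estimate
    inverted through g = 2*Phi(sigma/sqrt(2)) - 1, then beta from
    ybar = exp(beta + sigma^2 / 2)."""
    N = len(c)
    lam = [sum(c[:i + 1]) for i in range(N)]   # cumulative population shares
    eta = [sum(s[:i + 1]) for i in range(N)]   # cumulative income shares
    g_hat = 1.0 - sum(
        (lam[i] - (lam[i - 1] if i else 0.0)) *
        (eta[i] + (eta[i - 1] if i else 0.0))
        for i in range(N)
    )
    sigma_hat = 2.0 ** 0.5 * NormalDist().inv_cdf((g_hat + 1.0) / 2.0)
    beta_hat = log(ybar) - sigma_hat ** 2 / 2.0
    return beta_hat, sigma_hat, g_hat
```

Because the trapezoidal rule bounds the Lorenz curve from below, \(\hat{g}\) (and hence \(\hat{\sigma }\)) is biased slightly downward when the number of groups is small.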
Pareto-Lognormal Distribution
For the Pareto-lognormal distribution, we can estimate the Theil inequality measures from the grouped data, and then use these estimates, along with sample mean income to estimate the parameters. Working in this direction, the grouped-data sample estimates are
The corresponding quantities in terms of the parameters of the Pareto-lognormal distribution are
Assuming the mean exists \((\alpha > 1),\) from (5.70)–(5.75) we can retrieve parameter estimates using the following three steps:
1. Find \(\hat{\alpha }\) as the solution to the equation
$$\frac{{2\hat{\alpha } - 1}}{{\hat{\alpha }(\hat{\alpha } - 1)}} + 2\ln \left( {\frac{{\hat{\alpha } - 1}}{{\hat{\alpha }}}} \right) = \hat{T}_{1} - \hat{T}_{0}$$(5.76)
2. Find \(\hat{\sigma }^{2}\) from
$$\hat{\sigma }^{2} = \hat{T}_{1} + \hat{T}_{0} - \frac{1}{{\hat{\alpha }(\hat{\alpha } - 1)}}$$(5.77)
3. Find \(\hat{\beta }\) from
$$\hat{\beta } = \ln (\overline{y}) + \ln \left( {\frac{{\hat{\alpha } - 1}}{{\hat{\alpha }}}} \right) - \frac{{\hat{\sigma }^{2} }}{2}$$(5.78)
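The three steps above can be sketched in a few lines of standard-library Python. The function name and the bisection bounds are our own illustrative choices; note that step 1 has a solution only when \(\hat{T}_{1} > \hat{T}_{0}\).

```python
from math import exp, log

def fit_pln_grouped(t0_hat, t1_hat, ybar):
    """Recover Pareto-lognormal parameters (alpha, beta, sigma^2)
    from grouped-data Theil estimates and the sample mean income."""
    # Step 1: solve (2a-1)/(a(a-1)) + 2*ln((a-1)/a) = T1 - T0 for a.
    # The left-hand side decreases monotonically from +infinity (a -> 1)
    # to 0 (a -> infinity), so bisection works whenever T1 - T0 > 0.
    def lhs(a):
        return (2 * a - 1) / (a * (a - 1)) + 2 * log((a - 1) / a)
    lo, hi = 1.0 + 1e-9, 1e6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lhs(mid) > t1_hat - t0_hat:
            lo = mid        # root lies above mid (lhs is decreasing)
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    # Step 2: sigma^2 from (5.77)
    sigma2 = t1_hat + t0_hat - 1.0 / (alpha * (alpha - 1))
    # Step 3: beta from (5.78)
    beta = log(ybar) + log((alpha - 1) / alpha) - sigma2 / 2.0
    return alpha, beta, sigma2
```

Feeding the function the parametric values of \(T_{0}\), \(T_{1}\) and \(\mu\) implied by known parameter values should return those parameters, which is a useful sanity check before applying it to real grouped data.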
Concluding Remarks
Inequality and poverty, both nationally and globally, continue to be two of the most pressing issues facing society. Accurate measurement of inequality and poverty involves a multitude of non-trivial considerations, including reliable data collection, specification of purchasing power parities and definition of a suitable poverty line. We have focused on a further consideration: how to model and estimate income distributions, and how to estimate inequality and poverty from the parameters of those distributions, when using grouped data. Individual (unit-record) observations are becoming increasingly available, and their use is preferred to that of grouped data if resources are adequate for doing so. However, countries and time periods for which only grouped data are available are still prevalent, and it can be advantageous to use grouped data for large scale regional and global projects. Our objective has been to summarize available techniques in a convenient form for researchers working along these lines.
Notes
- 1.
We will continue to refer to income distributions and income shares, but recognize that data are often for expenditure that can be treated in the same way.
- 2.
- 3.
- 4.
See Gastwirth (1971).
- 5.
The estimator in the first of these papers was described as a generalized method of moments estimator. Here, we use the term minimum distance estimator because it includes not only estimators that minimize the squared distance between sample and population moments, but also those that minimize the squared distance between sample quantities and their probability limits.
- 6.
To avoid introducing more notation to what is already a very substantial amount, we will continue to use \(\hat{\lambda }_{i}\) to denote the observed cumulative proportion of population, despite the fact that, in the current context, it is a non-random fixed quantity.
- 7.
See Chotikapanich (2008) for access to this literature.
References
Chotikapanich, D., Valenzuela, M. R., & Rao, D. S. P. (1997). Global and regional inequality in the distribution of income: Estimation with limited/incomplete data. Empirical Economics, 20, 533–546.
Chotikapanich, D., Griffiths, W. E., & Rao, D. S. P. (2007). Estimating and combining national income distributions using limited data. Journal of Business and Economic Statistics, 25, 97–109.
Chotikapanich, D. (Ed.). (2008). Modeling income distributions and Lorenz curves. Springer.
Chotikapanich, D., Griffiths, W. E., Rao, D. S. P., & Valencia, V. (2012). Global income distributions and inequality, 1993 and 2000: Incorporating country-level inequality modelled with beta distributions. The Review of Economics and Statistics, 94, 52–73.
Chotikapanich, D., Griffiths, W. E., Karunarathne, W., & Rao, D. S. P. (2013). Calculating poverty measures from the generalized beta income distribution. Economic Record, 89(S1), 48–66.
David, H. A., & Nagaraja, H. N. (2003). Order statistics. Wiley.
Eckernkemper, T., & Gribisch, B. (2021). Classical and Bayesian inference for income distributions using grouped data. Oxford Bulletin of Economics and Statistics, 83, 32–65.
Gastwirth, J. L. (1971). A general definition of the Lorenz curve. Econometrica, 39, 1037–1039.
Griffiths, W. E., & Hajargasht, G. (2015). On GMM estimation of distributions from grouped data. Economics Letters, 126, 122–126.
Hajargasht, G., Griffiths, W. E., Brice, J., Rao, D. S. P., & Chotikapanich, D. (2012). Inference for income distributions using grouped data. Journal of Business and Economic Statistics, 30(4), 563–576.
Hajargasht, G., & Griffiths, W. E. (2020). Minimum distance estimation of parametric Lorenz curves based on grouped data. Econometric Reviews, 39(4), 344–361.
Hitomi, K., Liu, Q.-F., Nishiyama, Y., & Sueishi, N. (2008). Efficient estimation and model selection for grouped data with local moments. Journal of the Japan Statistical Society, 38, 131–143.
Kleiber, C., & Kotz, S. (2003). Statistical size distributions in economics and actuarial sciences. Wiley.
Sarabia, J. M., & Jordá, V. (2014). Explicit expressions of the Pietra Index for the generalized function for the size distribution of income. Physica a: Statistical Mechanics and Its Applications, 416, 582–595.
Sarabia, J. M., Jordá, V., & Remuzgo, L. (2017). The Theil indices in parametric families of income distributions—A short review. The Review of Income and Wealth, 63(4), 867–880.
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Chotikapanich, D., Griffiths, W., Hajargasht, G. (2022). Modelling Income Distributions with Limited Data. In: Chotikapanich, D., Rambaldi, A.N., Rohde, N. (eds) Advances in Economic Measurement. Palgrave Macmillan, Singapore. https://doi.org/10.1007/978-981-19-2023-3_5
Print ISBN: 978-981-19-2022-6
Online ISBN: 978-981-19-2023-3