Introduction

It is generally recognized that poverty and excessive inequality are socially undesirable. Reducing global poverty so that fewer individuals are deprived of basic needs is a major objective of international agencies. While what constitutes too much inequality is debatable, there is concern about the negative effects of rising inequality on health, crime and other aspects of society. In extreme cases, inequality has led to the overthrow of governments and changes in the international order. It is important, therefore, to be able to monitor changes in inequality and poverty using suitable measurement techniques. For this purpose, modelling and estimation of income distributions and Lorenz curves play an important role. Data for modelling and estimation can come in many forms; they may come from taxation records or from a variety of surveys. We focus on modelling and estimation when the data are limited in the sense that they come in grouped form, typically as the proportion of total income allocated to each of a number of groups, ordered according to increasing income, and with a specified proportion of the population within each group. These so-called income and population shares form the basis for estimating inequality through the Lorenz curve. When share data are combined with data on mean incomes, income distributions can also be estimated, and their relationship with Lorenz curves can be exploited.

Data in grouped form are often utilized for large-scale projects where inequality and poverty on a regional or global scale are being measured, and where compilation and dissemination of data in a more disaggregated form would be overly resource intensive. An example of such a study is Chotikapanich et al. (2012). Grouped share data are available to researchers from the World Bank’s PovcalNet website and from the World Institute for Development Economics Research.

Our objective is to summarize methods for estimating parametric income distributions using grouped data, to specify the functions needed for estimation for a number of popular parametric forms, and to provide formulae that can be used to compute inequality and poverty measures from the parameters of each of the distributions. In section Concepts, we introduce notation and concepts to be utilized later in the paper. The density, distribution and moment distribution functions that play an important role are introduced, along with poverty and inequality measures whose values can be calculated from estimates of the parameters of income distributions. We also describe the nature of the data that we assume are available. Section Estimation is devoted to estimation. Choice of estimation technique is influenced by whether or not group bounds are provided in the available data and by how the data are grouped: fixed group bounds and random population proportions, or fixed population proportions and random group bounds. Both minimum distance (MD) and maximum likelihood (ML) estimators are considered, and results are provided for variants of the MD estimators which depend on which “distance” is being minimized. In section Specification of Distributions, Inequality and Poverty Measures, we tabulate the common parametric distributions that have been used to model income distributions; their density, distribution and moment distribution functions, and moments, are provided. Expressions that can be used to calculate inequality measures from the parameters of the different distributions are also tabulated. Expressions for some poverty measures are given in section Concepts; those for the Watts poverty index are tabulated in section Specification of Distributions, Inequality and Poverty Measures. In large projects involving many countries and many years, MD and ML estimation can be daunting tasks. In section Simple Recipes for Two Distributions, we describe two relatively simple estimators for two specific distributions: the lognormal and the Pareto-lognormal. Some concluding remarks follow in section Concluding Remarks.

Concepts

We assume a population of incomes \(y\), with \(y > 0\), can be represented by a probability density function (pdf) \(f(y;\theta )\) where \(\theta\) is a vector of unknown parameters. Our objective is to review several alternative functional forms that have been suggested for \(f(y;\theta )\), to describe methods for estimating \(\theta\) from grouped data, and to provide expressions that can be used to compute estimates of inequality and poverty measures from estimates for \(\theta\).

We further assume \(y\) has a finite mean \(\mu = \int\limits_{0}^{\infty } {y\,f(y;\theta )} \,dy.\) Its cumulative distribution function (cdf) will be denoted by

$$\lambda = F(y;\theta ) = \int\limits_{0}^{y} {f(t;\theta )dt}$$
(5.1)

and its first moment distribution function (fmdf) by

$$\eta = F^{{(1)}} (y;\theta ) = \frac{1}{\mu }\int\limits_{0}^{y} {t\,f(t;\theta )} \,dt$$
(5.2)

We will also utilize the second moment distribution function (smdf)

$$\psi = F^{(2)} (y;\theta ) = \frac{1}{{\mu^{(2)} }}\int\limits_{0}^{y} {t^{2} \,f(t;\theta )\,dt}$$
(5.3)

where \(\mu^{(2)}\) is the second moment \(\mu ^{{(2)}} \, = \,\int\limits_{0}^{\infty } {y^{2} \,f\,(y;\theta )\,dy.}\) The Lorenz curve, relating the cumulative proportion of income to the cumulative proportion of population, is given by

$$\eta \, = \,L(\lambda ;\theta )\, = \,F^{{(1)}} \,\left( {F^{{ - 1}} \,(\lambda ;\theta );\,\theta } \right)$$
(5.4)

When modelling begins with the specification of a Lorenz curve, the quantile function \(y\, = \,F^{{ - 1}} \,(\lambda ;\theta )\) can be found from it via differentiation,

$$y\, = \,F^{{ - 1}} \,(\lambda ;\theta )\, = \,\mu \frac{{dL\,(\lambda ;\theta )}}{{d\lambda }}$$
(5.5)
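
To make these quantities concrete, the following sketch evaluates the cdf, fmdf and Lorenz curve for the lognormal distribution of Table 5.1, for which all three have closed forms; the function names and parameterization are ours, and other distributions would substitute their own formulae.

```python
# A minimal sketch for the lognormal case, where ln(y) ~ N(beta, sigma^2);
# the shift-by-sigma form of the fmdf is a standard lognormal result.
import numpy as np
from scipy.stats import norm

def cdf(y, beta, sigma):
    """F(y; theta) in (5.1)."""
    return norm.cdf((np.log(y) - beta) / sigma)

def fmdf(y, beta, sigma):
    """F^(1)(y; theta) in (5.2): the cdf with its argument shifted by sigma."""
    return norm.cdf((np.log(y) - beta) / sigma - sigma)

def lorenz(lam, beta, sigma):
    """L(lambda; theta) in (5.4): eta = F^(1)(F^{-1}(lambda; theta); theta)."""
    return norm.cdf(norm.ppf(lam) - sigma)
```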

Inequality Measures

The most commonly cited inequality measure is the Gini coefficient \(g\) which is given by twice the area between the Lorenz curve and the line of equality where \(\eta = \lambda .\) That is,

$$\begin{aligned} g & \, = \,1 - 2\int\limits_{0}^{1} {L(\lambda ;\theta ){\kern 1pt} d\lambda } \\ & \, = \, - 1\, + \,\frac{2}{\mu }\int\limits_{0}^{\infty } {yF\left( {y;\theta } \right)f\left( {y;\theta } \right)dy} \\ \end{aligned}$$
(5.6)

Two further inequality measures that we consider are the Theil indices which are special cases of a generalized entropy class of measures. Unlike the Gini coefficient, members of this class have the advantage of being additively decomposable into population subgroups. The general class is given by

$${\text{GE}}(v)\, = \,\frac{1}{{v^{2} - v}}\,\left[ {\int\limits_{0}^{\infty } {\left( {\frac{y}{\mu }} \right)^{v} f(y;\theta ){\mkern 1mu} dy - 1} } \right]\quad \quad v \ne 0,\,1$$
(5.7)

The parameter \(v\) controls the sensitivity of the index to income differences in different parts of the income distribution; larger positive values imply greater sensitivity to income differences in the upper part of the distribution and more negative values imply greater sensitivity to differences in the lower part of the distribution. The Theil special cases are those for \(v \to 0\) and \(v \to 1.\) They are given by

$$T_{0} \, = \,{\text{GE}}(0)\, = \,\int\limits_{0}^{\infty } {\ln \,\left( {\frac{\mu }{y}} \right)f(y;\theta )\,dy}$$
(5.8)
$$T_{1} = {\text{GE}}(1) = \int\limits_{0}^{\infty } {\left( {\frac{y}{\mu }} \right)\ln \left( {\frac{y}{\mu }} \right)f(y;\theta )\,dy}$$
(5.9)

The last inequality measure that we consider is the Pietra index which is equal to the maximum distance between the Lorenz curve and the equality line \(\eta = \lambda .\) It can be written as the difference between the cdf and the fmdf, evaluated at \(\mu .\)

$$P = F(\mu ;\theta ) - F^{(1)} (\mu ;\theta )$$
(5.10)
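
For the distributions considered later, these measures have closed forms, but the defining integrals can also be evaluated numerically, which provides a useful check on tabulated expressions. A sketch for the lognormal distribution, with illustrative parameter values:

```python
# Numerical evaluation of (5.6), (5.9) and (5.10) for a lognormal
# distribution, checked against the closed-form lognormal Gini
# g = 2*Phi(sigma/sqrt(2)) - 1; the parameter values are illustrative.
import numpy as np
from scipy.integrate import quad
from scipy.stats import lognorm, norm

beta, sigma = 0.0, 0.8
dist = lognorm(s=sigma, scale=np.exp(beta))   # scipy's lognormal
mu = dist.mean()

# Gini from the second line of (5.6)
integral, _ = quad(lambda y: y * dist.cdf(y) * dist.pdf(y), 0, np.inf)
g = -1 + 2 * integral / mu
print(g, 2 * norm.cdf(sigma / np.sqrt(2)) - 1)   # the two should agree

# Theil T1 from (5.9)
T1, _ = quad(lambda y: (y / mu) * np.log(y / mu) * dist.pdf(y), 0, np.inf)

# Pietra index from (5.10), using the lognormal fmdf Phi((ln y - beta)/sigma - sigma)
P = dist.cdf(mu) - norm.cdf((np.log(mu) - beta) / sigma - sigma)
```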

Poverty Measures

Modelling and estimating income distributions are also useful for evaluating poverty. We consider four poverty measures: the headcount ratio \({\text{HC}},\) the poverty gap \({\text{PG}},\) the \({\text{FGT}}\) index with the poverty aversion parameter set at 2, and the Watts index \({\text{WI}}.\) For convenience, we express \({\text{HC}},\) \({\text{PG}}\) and \({\text{FGT}}\) in terms of distribution and moment distribution functions, and moments, which are tabulated for specific distributions in section Specification of Distributions, Inequality and Poverty Measures. The Watts index requires more work, however; we defer specific parametric expressions for it until the same section. Given a specific poverty line \(z,\) we have

$${\text{HC}} = F(z;\theta )$$
(5.11)
$${\text{PG}} = \int\limits_{0}^{z} {\left( {\frac{{z - y}}{z}} \right)\,f(y;\theta )\,dy} \, = \,F(z;\theta )\, - \,\frac{\mu }{z}F^{{(1)}} (z;\theta )$$
(5.12)
$$\begin{aligned} {\text{FGT}}\,(2) & \, = \,\int\limits_{0}^{z} {\left( {\frac{{z - y}}{z}} \right)^{2} f(y;\theta )\,dy} \\ & \, = \,F(z;\theta )\, - \,2\frac{\mu }{z}F^{{(1)}} (z;\theta ) + \frac{{\mu ^{{(2)}} }}{{z^{2} }}F^{{(2)}} (z;\theta ) \\ \end{aligned}$$
(5.13)
$${\text{WI}} = \int\limits_{0}^{z} {\left[ {\ln (z)\, - \,\ln (y)} \right]\,f(y;\theta )\,dy}$$
(5.14)
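
Given parameter estimates and a poverty line, these measures are direct to compute. A sketch for the lognormal distribution, using the cdf, fmdf and smdf forms of Table 5.1; the poverty line and parameter values are illustrative, and the Watts index is evaluated numerically rather than from the closed form of Table 5.3:

```python
# The poverty measures (5.11)-(5.14) for the lognormal; z and the
# parameter values are illustrative.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

beta, sigma, z = 0.0, 0.8, 0.7
mu  = np.exp(beta + sigma**2 / 2)          # first moment
mu2 = np.exp(2 * beta + 2 * sigma**2)      # second moment

F  = lambda y: norm.cdf((np.log(y) - beta) / sigma)              # cdf
F1 = lambda y: norm.cdf((np.log(y) - beta) / sigma - sigma)      # fmdf
F2 = lambda y: norm.cdf((np.log(y) - beta) / sigma - 2 * sigma)  # smdf

HC   = F(z)                                                 # (5.11)
PG   = F(z) - (mu / z) * F1(z)                              # (5.12)
FGT2 = F(z) - 2 * (mu / z) * F1(z) + (mu2 / z**2) * F2(z)   # (5.13)

# Watts index by numerical integration of (5.14)
pdf = lambda y: norm.pdf((np.log(y) - beta) / sigma) / (y * sigma)
WI, _ = quad(lambda y: (np.log(z) - np.log(y)) * pdf(y), 0, z)
```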

Data Setup

For estimating the various inequality and poverty measures, we assume we have a sample \({\varvec{y}}^{\prime} = (y_{1} ,y_{2} , \ldots ,y_{T} )\) randomly drawn from \(f(y;\theta )\), and grouped into \(N\) income classes \((x_{0} ,x_{1} ),\) \((x_{1} ,x_{2} ), \ldots ,(x_{N - 1} ,x_{N} )\) with \(x_{0} = 0\) and \(x_{N} = \infty .\) We denote the proportion of observations in the i-th group as \(c_{i} ,\) mean income in the i-th group as \(\overline{y}_{i} ,\) and mean income for the whole sample as \(\overline{y}.\) The income share for the i-th group is \(s_{i} = c_{i} \overline{y}_{i} /\overline{y}.\) Sometimes observations \({\varvec{c}}^{\prime} = \left( {c_{1} ,c_{2} , \ldots ,c_{N} } \right)\) and \({\varvec{s}}^{\prime} = (s_{1} ,s_{2} , \ldots ,s_{N} )\) are available from one source and \(\overline{y}\) is available from another source, in which case group mean incomes can be found from \(\overline{y}_{i} = s_{i} \overline{y}/c_{i} .\) In the next section, we describe various methods for estimating \(\theta ,\) given the observations \((c_{i} ,s_{i} ,\overline{y}).\)
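
For concreteness, the following sketch groups a simulated sample into income classes with fixed bounds and forms the observations \((c_{i} ,s_{i} ,\overline{y});\) the sample and bounds are illustrative.

```python
# Grouping a raw sample into N = len(x) + 1 classes with fixed interior
# bounds x, and forming proportions c_i, shares s_i and the mean ybar.
import numpy as np

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.8, size=10_000)   # pseudo sample
x = np.array([0.5, 1.0, 2.0, 4.0])                    # interior bounds

groups = np.digitize(y, x)                 # group index 0, ..., N-1
T = y.size
c = np.bincount(groups, minlength=x.size + 1) / T     # proportions c_i
ybar = y.mean()                                       # overall mean
ybar_i = np.array([y[groups == i].mean() for i in range(x.size + 1)])
s = c * ybar_i / ybar                                 # income shares s_i
assert np.isclose(s.sum(), 1.0)
```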

Estimation

The estimation methods that we review can be categorized according to the way in which the data are generated, and whether the group bounds \({\varvec{x}}^{\prime} = (x_{0} ,x_{1} , \ldots ,x_{N} )\) are known in addition to the observations on \((c_{i} ,s_{i} ,\overline{y}).\) There are two ways in which the data can be generated. The group bounds \({\varvec{x}}\) can be specified a priori, making the proportions of observations which fall into each group \(c_{i}\), and the group means \(\overline{y}_{i}\), the random variables. Alternatively, the \(c_{i}\) can be specified a priori, in which case the group bounds \({\varvec{x}}\) are random variables, along with the group means \(\overline{y}_{i}\). We consider estimation techniques for each of these cases in turn, noting the implications of known and unknown values for the group boundaries.

Estimation with Fixed x, Random c, Random \(\overline{y}_{i}\)

One approach for estimating \(\theta\) when the group bounds x are known and the \(c_{i}\) are random is to maximize the likelihood function for the multinomial distribution. This approach uses information on \({\varvec{x}}\) and \({\varvec{c}}\), but does not utilize the information contained in \({\varvec{s}}\) and \(\overline{y}.\) The log of the likelihood function is given by

$$L(\theta )\; \propto \;K + \sum\limits_{i = 1}^{N} {c_{i} \ln \left[ {F(x_{i} ;\theta ) - F(x_{i - 1} ;\theta )} \right]}$$
(5.15)

where \(K\) is a constant.
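
A sketch of maximizing (5.15) for the lognormal distribution, with known bounds and illustrative proportions; \(\sigma\) is kept positive by optimizing its logarithm.

```python
# Multinomial ML for lognormal theta = (beta, sigma), as in (5.15);
# the bounds and proportions below are illustrative.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

c = np.array([0.20, 0.25, 0.30, 0.15, 0.10])   # observed proportions
x = np.array([0.5, 1.0, 2.0, 4.0])             # known interior bounds

def cell_probs(beta, sigma):
    """F(x_i; theta) - F(x_{i-1}; theta) with x_0 = 0 and x_N = infinity."""
    cdf_vals = np.concatenate(([0.0],
                               norm.cdf((np.log(x) - beta) / sigma),
                               [1.0]))
    return np.diff(cdf_vals)

def neg_loglik(theta):
    beta, sigma = theta[0], np.exp(theta[1])   # sigma > 0 by construction
    p = cell_probs(beta, sigma)
    return -np.sum(c * np.log(np.clip(p, 1e-300, None)))

res = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
beta_hat, sigma_hat = res.x[0], np.exp(res.x[1])
```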

In a series of papers (Griffiths & Hajargasht, 2015; Hajargasht & Griffiths, 2020; Hajargasht et al., 2012), three minimum distance (MD) estimators suitable for random \(c_{i}\) and \(\overline{y}_{i}\) were introduced. These estimators utilize information on \({\varvec{c}},{\varvec{s}}\) and \(\overline{y},\) and can be applied with or without knowledge of \({\varvec{x}}.\) When \({\varvec{x}}\) is unknown it can be treated as a set of unknown parameters and estimated along with \(\theta .\) The three estimators all have the same limiting distribution, but do not yield identical estimates. They are more efficient than the ML estimator from the multinomial likelihood function where only information from \(c_{i}\) is utilized. To introduce the three estimators, we begin by noting the following:

$${\text{plim}}\,c_{i} \, = \,F(x_{i} ;\theta ) - F(x_{{i - 1}} ;\theta )\, = \,\lambda _{i} (\phi ) - \lambda _{{i - 1}} (\phi )$$
(5.16)
$${\text{plim}}\,s_{i} \, = \,F^{{(1)}} (x_{i} ;\theta ) - F^{{(1)}} (x_{{i - 1}} ;\theta )\, = \,\eta _{i} (\phi ) - \eta _{{i - 1}} (\phi )$$
(5.17)

where we write \(\phi = (x,\theta )\) to accommodate the case where \({\varvec{x}}\) is unobserved, making the unknown parameter vector equal to \(\phi .\) If \({\varvec{x}}\) is observed, we can proceed in the same way, utilizing the known \({\varvec{x}}\) and treating \(\theta\) as the unknown parameter vector.

MD Estimator 1

For the first MD estimator, we define

$$\tilde{y}_{i} = s_{i} \overline{y} = c_{i} \overline{y}_{i}$$
(5.18)

Since \(\sum\nolimits_{i = 1}^{N} {\tilde{y}_{i} } = \sum\nolimits_{i = 1}^{N} {c_{i} \overline{y}_{i} } = \overline{y},\) we interpret \(\tilde{y}_{i}\) as that part of mean income \(\overline{y}\) that comes from the i-th group. Then, from (5.17) and (5.18),

$$\begin{aligned} \text{plim} \,\tilde{y}_{i} & \, = \,\text{plim} \,\bar{y}\,\text{plim} \,s_{i} \\ & \, = \,\mu \left[ {F^{{(1)}} (x_{i} ;\theta ) - F^{{(1)}} (x_{{i - 1}} ;\theta )} \right] \\ & \, = \,\mu \left[ {\eta _{i} (\phi ) - \eta _{{i - 1}} (\phi )} \right] \\ \end{aligned}$$
(5.19)

From (5.16) and (5.19), we can set up the MD estimator

$$\hat{\phi }_{1} = \arg \text{min}_{\phi} H_{1} (\phi )^{\prime}WH_{1} (\phi )$$
(5.20)

where

$$H_{1} (\phi ) = \left[ \begin{gathered} c_{1} - \left[ {\lambda_{1} (\phi ) - \lambda_{0} (\phi )} \right] \\ \vdots \\ c_{N - 1} - \left[ {\lambda_{N - 1} (\phi ) - \lambda_{N - 2} (\phi )} \right] \\ \tilde{y}_{1} - \mu \left[ {\eta_{1} (\phi ) - \eta_{0} (\phi )} \right] \\ \vdots \\ \tilde{y}_{N} - \mu \left[ {\eta_{N} (\phi ) - \eta_{N - 1} (\phi )} \right] \\ \end{gathered} \right]$$
(5.21)

and \(W\) is a weight matrix. Note that \(\mu\) will also depend on \(\phi ,\) the exact function depending on the parametric pdf chosen for the income distribution. Also, \(c_{N} - \left[ {\lambda_{N} (\phi ) - \lambda_{N - 1} (\phi )} \right]\) has been omitted since having \(\sum\nolimits_{i = 1}^{N} {c_{i} = 1}\) makes one of the \(c_{i}\) entries redundant.

A possible weight matrix, one suggested by Chotikapanich et al. (2007), is to set the diagonal elements of \(W\) as \(w_{i} = 1/c_{i}^{2}\) for \(i = 1,2, \ldots ,N - 1\) and \(w_{N - 1 + i} = 1/\tilde{y}_{i}^{2}\) for \(i = 1,2, \ldots ,N,\) and the off-diagonal elements to zero. With this setting, \(\hat{\phi }_{1}\) minimizes the sum of squared percentage errors. This weight matrix, call it \(W_{CGR} ,\) is simple and works well in practice, but it is not optimal; it does not lead to the most efficient estimator for \(\phi .\) Hajargasht et al. (2012) show that the inverse of the optimal weight matrix is given by

$$W_{1}^{ - 1} (\phi ) = \left[ {\begin{array}{*{20}c} {D_{1} } & {\left[ {\begin{array}{*{20}c} {D_{2} } & {0_{N - 1} } \\ \end{array} } \right]} \\ {\left[ {\begin{array}{*{20}c} {D_{2} } \\ {0^{\prime}_{N - 1} } \\ \end{array} } \right]} & {D_{3} } \\ \end{array} } \right] - \left[ {\begin{array}{*{20}c} {A_{1} } & {A_{2} } \\ {A^{\prime}_{2} } & {A_{3} } \\ \end{array} } \right]$$
(5.22)

where \(0_{N - 1}\) is an \((N - 1)\)-dimensional vector of zeros, and \(D_{1} ,D_{2}\) and \(D_{3}\) are diagonal matrices. Their elements, and those of \(A_{1} ,A_{2}\) and \(A_{3} ,\) are as follows.

$$[D_{1} ]_{ii} = \lambda_{i} - \lambda_{i - 1} ,\quad \quad i = 1,2, \ldots ,N - 1$$
$$[D_{2} ]_{ii} = \mu \left( {\eta_{i} - \eta_{i - 1} } \right),\quad \quad i = 1,2, \ldots ,N - 1$$
$$[D_{3} ]_{ii} = \mu^{(2)} \left( {\psi_{i} - \psi_{i - 1} } \right),\quad \quad i = 1,2, \ldots ,N$$
$$\left[ {A_{1} } \right]_{ij} = \left( {\lambda_{i} - \lambda_{i - 1} } \right)\left( {\lambda_{j} - \lambda_{j - 1} } \right),\quad \quad i,j = 1,2, \ldots ,N - 1$$
$$\left[ {A_{2} } \right]_{ij} = \left( {\lambda_{i} - \lambda_{i - 1} } \right)\left( {\eta_{j} - \eta_{j - 1} } \right)\quad \quad i = 1,2, \ldots ,N - 1;\;j = 1,2, \ldots ,N$$
$$\left[ {A_{3} } \right]_{ij} = \left( {\eta_{i} - \eta_{i - 1} } \right)\left( {\eta_{j} - \eta_{j - 1} } \right),\quad \quad i,j = 1,2, \ldots ,N$$

All these quantities depend on the unknown parameter vector \({\mathbf{\phi }}.\) To ease the notation, we have not made this dependence explicit. Note also that, through \(D_{3} ,\) \(W\) will depend on the second moment \(\mu^{(2)}\) and the second moment distribution function \(\psi_{i} = F^{(2)} (x_{i} ;\theta ).\)

After inverting \(W_{1}^{ - 1}\) to find \(W_{1} ,\) and simplifying, the objective function in (5.20) can be shown to be equal to

$$\begin{aligned} H_{1} (\phi )^{'} W_{1} (\phi )H_{1} (\phi ) & \, = \,\sum\limits_{{i = 1}}^{N} {w_{{1i}} \left[ {c_{i} - (\lambda _{i} - \lambda _{{i - 1}} )} \right]^{2} } \\ & \,\,\,\,\, + \,\sum\limits_{{i = 1}}^{N} {w_{{2i}} \left[ {\tilde{y}_{i} - \mu (\eta _{i} - \eta _{{i - 1}} )} \right]^{2} } \\ & \,\,\,\,\, - \,2\sum\limits_{{i = 1}}^{N} {w_{{3i}} \left[ {c_{i} - (\lambda _{i} - \lambda _{{i - 1}} )} \right]} \left[ {\tilde{y}_{i} - \mu (\eta _{i} - \eta _{{i - 1}} )} \right] \\ \end{aligned}$$
(5.23)

where

$$w_{1i} = \frac{{\mu^{(2)} \left( {\psi_{i} - \psi_{i - 1} } \right)}}{{v_{i} }}$$
(5.24)
$$w_{2i} = \frac{{\left( {\lambda_{i} - \lambda_{i - 1} } \right)}}{{v_{i} }}$$
(5.25)
$$w_{3i} = \frac{{\mu \left( {\eta_{i} - \eta_{i - 1} } \right)}}{{v_{i} }}$$
(5.26)

and

$$v_{i} = \mu^{(2)} (\lambda_{i} - \lambda_{i - 1} )(\psi_{i} - \psi_{i - 1} ) - \mu^{2} (\eta_{i} - \eta_{i - 1} )^{2}$$

There are three possible ways to approach the problem of finding an estimate \(\hat{\phi }\) that minimizes \(H_{1} (\phi )^{\prime}W_{1} (\phi )H_{1} (\phi ):\)

  1. A two-step estimator, where first an estimate \(\hat{\phi }_{CGR}\) is obtained using the weight matrix \(W_{CGR} ,\) and then a second estimate \(\hat{\phi }_{2 - STEP}\) is obtained by minimizing \(H_{1} (\phi )^{\prime}W_{1} (\hat{\phi }_{CGR} )H_{1} (\phi ).\)

  2. An iterative estimator, obtained by iterating the two-step estimator until convergence is achieved.

  3. A “continuous updating estimator”, where the whole function in (5.23) is minimized with respect to \(\phi .\)

These three estimators all have the same limiting distribution but can produce different estimates. Their asymptotic covariance matrix is

$${\text{var}}\,\left( {\hat{\phi }_{1} } \right) = \frac{1}{T}\,\left[ {\left( {\frac{{\partial H_{1}^{{*'}} }}{{\partial \phi }}} \right)W_{1}^{*} \left( {\frac{{\partial H_{1}^{*} }}{{\partial \phi ^{'} }}} \right)} \right]^{{ - 1}}$$
(5.27)

where \(H_{1}^{*}\) is a \((2N \times 1)\) vector obtained from \(H_{1}\) by including \(c_{N} - \left( {\lambda_{N} - \lambda_{N - 1} } \right)\) in the N-th position, and \(W_{1}^{*}\) is a \((2N \times 2N)\) matrix with four \((N \times N)\) blocks \(D_{11} ,\;D_{12} ,\;D_{21} = D_{12}\) and \(D_{22} ,\) each of which is diagonal. The i-th diagonal elements of these matrices are \(w_{1i}\) for \(D_{11} ,\) \(w_{2i}\) for \(D_{22}\) and \(- w_{3i}\) for \(D_{12} .\) See Eqs. (5.24) to (5.26).
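
To illustrate the first of these approaches, the following sketch obtains a first-stage estimate with the simple weights \(W_{CGR}\) and then takes a second step with the optimal weights (5.24)–(5.26); lognormal functional forms and illustrative data are assumed throughout.

```python
# MD estimator 1 for the lognormal: a W_CGR first stage followed by one
# step with the optimal weights (5.24)-(5.26); data are illustrative.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

c = np.array([0.20, 0.25, 0.30, 0.15, 0.10])
s = np.array([0.05, 0.12, 0.28, 0.25, 0.30])
ybar = 1.5
x = np.array([0.5, 1.0, 2.0, 4.0])
ytilde = s * ybar                                  # from (5.18)

def pieces(theta):
    """Group differences of lambda, eta, psi, plus mu and mu^(2)."""
    beta, sigma = theta[0], np.exp(theta[1])
    u = (np.log(x) - beta) / sigma
    dlam = np.diff(np.concatenate(([0.0], norm.cdf(u), [1.0])))
    deta = np.diff(np.concatenate(([0.0], norm.cdf(u - sigma), [1.0])))
    dpsi = np.diff(np.concatenate(([0.0], norm.cdf(u - 2 * sigma), [1.0])))
    mu = np.exp(beta + sigma ** 2 / 2)
    mu2 = np.exp(2 * beta + 2 * sigma ** 2)
    return dlam, deta, dpsi, mu, mu2

def q_cgr(theta):
    """Stage 1: sum of squared percentage errors (W_CGR)."""
    dlam, deta, _, mu, _ = pieces(theta)
    return (np.sum(((c[:-1] - dlam[:-1]) / c[:-1]) ** 2)
            + np.sum(((ytilde - mu * deta) / ytilde) ** 2))

def q_opt(theta, theta0):
    """Stage 2: the objective (5.23) with weights held fixed at theta0."""
    dl0, de0, dp0, mu0, mu20 = pieces(theta0)
    v = mu20 * dl0 * dp0 - mu0 ** 2 * de0 ** 2     # v_i below (5.26)
    w1, w2, w3 = mu20 * dp0 / v, dl0 / v, mu0 * de0 / v
    dlam, deta, _, mu, _ = pieces(theta)
    ec, ey = c - dlam, ytilde - mu * deta
    return np.sum(w1 * ec ** 2) + np.sum(w2 * ey ** 2) - 2 * np.sum(w3 * ec * ey)

step1 = minimize(q_cgr, x0=[0.0, 0.0], method="Nelder-Mead")
step2 = minimize(q_opt, x0=step1.x, args=(step1.x,), method="Nelder-Mead")
```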

MD Estimator 2

The second MD estimator is that considered by Griffiths and Hajargasht (2015). It follows the same principles as the previous one, but it replaces \(\tilde{y}_{i}\) by \(\overline{y}_{i} .\) To accommodate this replacement, we note that, from (5.16)–(5.18),

$$\begin{aligned} \text{plim} \,\bar{y}_{i} & \, = \,\frac{{\text{plim} \,\bar{y}\,\text{plim} \,s_{i} }}{{\text{plim} \,c_{i} }} \\ & \, = \,\frac{{\mu \left( {\eta _{i} - \eta _{{i - 1}} } \right)}}{{\lambda _{i} - \lambda _{{i - 1}} }} \\ \end{aligned}$$

In this case, the MD estimator can be written as

$$\hat{\phi }_{2} = \arg \text{min}_{\phi} H_{2} (\phi )^{\prime}W_{2} H_{2} (\phi )$$
(5.28)

where

$$H_{2} (\phi ) = \left[ \begin{gathered} c_{1} - \left[ {\lambda_{1} - \lambda_{0} } \right] \\ \vdots \\ c_{N - 1} - \left[ {\lambda_{N - 1} - \lambda_{N - 2} } \right] \\ \overline{y}_{1} - \frac{{\mu \left[ {\eta_{1} - \eta_{0} } \right]}}{{\lambda_{1} - \lambda_{0} }} \\ \vdots \\ \overline{y}_{N} - \frac{{\mu \left[ {\eta_{N} - \eta_{N - 1} } \right]}}{{\lambda_{N} - \lambda_{N - 1} }} \\ \end{gathered} \right]$$
(5.29)

and \(W_{2}\) is a specified weight matrix. The weight matrix that is analogous to \(W_{CGR} ,\) suggested for the previous estimator as a simple choice, or as a starting point for estimators that use an optimal weight matrix, is a diagonal matrix with elements \(w_{i} = 1/c_{i}^{2}\) for \(i = 1,2, \ldots ,N - 1,\) and \(w_{i + N - 1} = 1/\overline{y}_{i}^{2}\) for \(i = 1,2, \ldots ,N.\) Griffiths and Hajargasht (2015) show that the optimal weight matrix, for use with a 2-step, iterative or continuous updating estimator, is given by

$$W_{2} (\phi ) = \left[ {\begin{array}{*{20}c} {E_{1} } & 0 \\ 0 & {E_{2} } \\ \end{array} } \right]$$
(5.30)

where

$$\left[ {E_{1} } \right]_{{ij}} \, = \,\frac{{\delta _{{ij}} }}{{\lambda _{i} - \lambda _{{i - 1}} }}\, + \,\frac{1}{{\lambda _{N} - \lambda _{{N - 1}} }}\,\,\,\,\,\,\,\,i,j = 1,\,2,\, \ldots ,\,N - 1$$
(5.31)
$$\left[ {E_{2} } \right]_{{ij}} \, = \,\frac{{\delta _{{ij}} \left( {\lambda _{i} - \lambda _{{i - 1}} } \right)^{3} }}{{\mu ^{{(2)}} \left( {\lambda _{i} - \lambda _{{i - 1}} } \right)\left( {\psi _{i} - \psi _{{i - 1}} } \right) - \mu ^{2} \left( {\eta _{i} - \eta _{{i - 1}} } \right)^{2} }}\,\,\,\,i,j = 1,\,2,\, \ldots ,\,N$$
(5.32)

and \(\delta_{ij} = 1\) when \(i = j\) and \(\delta_{ij} = 0\) when \(i \ne j.\) Using these results, the objective function can be simplified to

$$\begin{aligned} H_{2} (\phi )^{'} W_{2} (\phi )H_{2} (\phi )\, & = \,\sum\limits_{{i = 1}}^{N} {\frac{{\left[ {c_{i} - (\lambda _{i} - \lambda _{{i - 1}} )} \right]^{2} }}{{\lambda _{i} - \lambda _{{i - 1}} }}} \\ \, & \,\,\,\,\,\, + \,\sum\limits_{{i = 1}}^{N} {\left[ {E_{2} } \right]_{{ii}} \left( {\bar{y}_{i} - \frac{{\mu (\eta _{i} - \eta _{{i - 1}} )}}{{\lambda _{i} - \lambda _{{i - 1}} }}} \right)} ^{2} \\ \end{aligned}$$
(5.33)

As before, \(H_{2} (\phi )^{\prime}W_{2} (\phi )H_{2} (\phi )\) can be minimized using a 2-step estimator, an iterative estimator or a continuous updating estimator. The weights are \(1/(\lambda_{i} - \lambda_{i - 1} )\) for the first terms in (5.33) and \(\left[ {E_{2} } \right]_{ii}\) for the second. In contrast to the earlier formulation in (5.23), there are no cross-product terms, making the minimization problem simpler and convergence easier to obtain. The large-sample covariance matrix of an estimator \(\hat{\phi }_{2}\) using an optimal weight matrix is

$${\text{var}}\,\left( {\hat{\phi }_{2} } \right) = \frac{1}{T}\,\left[ {\left( {\frac{{\partial H_{2}^{*\prime } }}{{\partial \phi }}} \right)W_{2}^{*} \left( {\frac{{\partial H_{2}^{*} }}{{\partial \phi^{\prime} }}} \right)} \right]^{ - 1}$$
(5.34)

where \(H_{2}^{*}\) is a \((2N \times 1)\) vector obtained from \(H_{2}\) by including \(c_{N} - (\lambda_{N} - \lambda_{N - 1} )\) in the N-th position, and \(W_{2}^{*}\) is a \((2N \times 2N)\) block-diagonal matrix with elements \(1/(\lambda_{i} - \lambda_{i - 1} )\) in the first diagonal block and elements \(\left[ {E_{2} } \right]_{ii}\) in the second diagonal block.
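
Because (5.33) contains no cross-product terms, a continuous updating version of this estimator, in which the weights and the residuals are evaluated at the same \(\phi ,\) is especially simple to code. A sketch under the same lognormal assumptions and illustrative data as before:

```python
# Continuous updating MD estimator 2: the objective (5.33), with both
# residuals and weights evaluated at the same theta; lognormal forms.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

c = np.array([0.20, 0.25, 0.30, 0.15, 0.10])
s = np.array([0.05, 0.12, 0.28, 0.25, 0.30])
ybar = 1.5
x = np.array([0.5, 1.0, 2.0, 4.0])
ybar_i = s * ybar / c                              # group means

def q2(theta):
    beta, sigma = theta[0], np.exp(theta[1])
    u = (np.log(x) - beta) / sigma
    dlam = np.diff(np.concatenate(([0.0], norm.cdf(u), [1.0])))
    deta = np.diff(np.concatenate(([0.0], norm.cdf(u - sigma), [1.0])))
    dpsi = np.diff(np.concatenate(([0.0], norm.cdf(u - 2 * sigma), [1.0])))
    mu = np.exp(beta + sigma ** 2 / 2)
    mu2 = np.exp(2 * beta + 2 * sigma ** 2)
    e2 = dlam ** 3 / (mu2 * dlam * dpsi - mu ** 2 * deta ** 2)  # [E2]_ii in (5.32)
    return (np.sum((c - dlam) ** 2 / dlam)
            + np.sum(e2 * (ybar_i - mu * deta / dlam) ** 2))

res = minimize(q2, x0=[0.0, 0.0], method="Nelder-Mead")
```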

MD Estimator 3

The third MD estimator that we describe is that considered by Hajargasht and Griffiths (2020). Its essential difference is that it uses cumulative population and income shares. To develop it, we begin by defining

$$\hat{\lambda }_{i} = \sum\limits_{{j = 1}}^{i} {c_{j} } \,{\text{and}}\,\hat{\eta }_{i} = \sum\limits_{{j = 1}}^{i} {s_{j} }$$
(5.35)


and recognizing that

$$\text{plim} \,\hat{\lambda }_{i} \, = \,F\left( {x_{i} ;\theta } \right)\, = \,\lambda _{i} (\phi )$$
(5.36)
$$\text{plim} \,\bar{y}\,\hat{\eta }_{i} \, = \,\mu \,F^{{(1)}} \left( {x_{i} ;\theta } \right)\, = \,\mu {\kern 1pt} \eta _{i} (\phi )$$
(5.37)

Using (5.36) and (5.37), we can construct the MD estimator as

$$\hat{\phi }_{3} = \arg \text{min}_{\phi } H_{3} (\phi )^{\prime}W_{3} H_{3} (\phi )$$
(5.38)

where

$$H_{3} (\phi ) = \left[ \begin{gathered} \hat{\lambda }_{1} - \lambda_{1} \\ \vdots \\ \hat{\lambda }_{N - 1} - \lambda_{N - 1} \\ \overline{y}\,\hat{\eta }_{1} - \mu \,\eta_{1} \\ \vdots \\ \overline{y}\,\hat{\eta }_{N - 1} - \mu \,\eta_{N - 1} \\ \overline{y} - \mu \\ \end{gathered} \right]$$
(5.39)

and \(W_{3}\) is a pre-specified weight matrix. A simple weight matrix that can be used to simplify calculations, or as a starting point for estimators that use an optimal weight matrix, is a diagonal matrix with elements \(w_{i} = 1/\hat{\lambda }_{i}^{2}\) for \(i = 1,2, \ldots ,N - 1\) and \(w_{N - 1 + i} = 1/\left( {\overline{y}\,\hat{\eta }_{i} } \right)^{2}\) for \(i = 1,2, \ldots ,N.\) Hajargasht and Griffiths (2020) show that the optimal weight matrix is given by

$$W_{3} (\phi ) = \left[ {\begin{array}{*{20}c} {L_{11} } & {L_{12} } \\ {L^{\prime}_{12} } & {L_{22} } \\ \end{array} } \right]$$
(5.40)

where

  1. \(L_{11}\) is a \(\left[ {(N - 1) \times (N - 1)} \right]\) tri-diagonal matrix with the following nonzero elements:

    $$\begin{gathered} \left[ {L_{{11}} } \right]_{{ii}} = \frac{{\mu ^{{(2)}} \left( {\psi _{{i + 1}} - \psi _{i} } \right)}}{{v_{{i + 1}} }} + \frac{{\mu ^{{(2)}} \left( {\psi _{i} - \psi _{{i - 1}} } \right)}}{{v_{i} }}\,\,\,\,i = 1,2, \ldots ,N - 1 \hfill \\ \left[ {L_{{11}} } \right]_{{ij}} = \left\{ \begin{gathered} - \frac{{\mu ^{{(2)}} \left( {\psi _{i} - \psi _{{i - 1}} } \right)}}{{v_{i} }}\,\,\,\,\,\,\,\,i = 2,\,3,\, \ldots ,\,N - 1;\,j\, = \,i - 1 \hfill \\ - \frac{{\mu ^{{(2)}} \left( {\psi _{j} - \psi _{{j - 1}} } \right)}}{{v_{j} }}\,\,\,\,\,\,\,\,{\text{ }}j\, = \,2,\,3,\, \ldots ,\,N - 1;\,i\, = \,j - 1 \hfill \\ \end{gathered} \right. \hfill \\ \end{gathered}$$
    (5.41)
  2. \(L_{12}\) is a \(\left[ {(N - 1) \times N} \right]\) matrix with the following nonzero elements:

    $$\begin{gathered} \left[ {L_{{12}} } \right]_{{ii}} \, = \, - \frac{{\mu \left( {\eta _{{i + 1}} - \eta _{i} } \right)}}{{v_{{i + 1}} }} - \frac{{\mu \left( {\eta _{i} - \eta _{{i - 1}} } \right)}}{{v_{i} }}\,\,\,\,i = 1,\,2,\, \ldots ,\,N - 1 \hfill \\ \left[ {L_{{12}} } \right]_{{ij}} = \left\{ \begin{gathered} \frac{{\mu \left( {\eta _{i} - \eta _{{i - 1}} } \right)}}{{v_{i} }}\,\,\,\,\,\,\,\,i\, = \,2,\,3,\, \ldots ,\,N - 1;\,j\, = \,i - 1 \hfill \\ \frac{{\mu \left( {\eta _{j} - \eta _{{j - 1}} } \right)}}{{v_{j} }}\,\,\,\,\,\,\,\,{\text{ }}j\, = \,2,\,3,\, \ldots ,\,N;\,i = j - 1 \hfill \\ \end{gathered} \right. \hfill \\ \end{gathered}$$
    (5.42)
  3. \(L_{22}\) is a \(\left[ {N \times N} \right]\) tri-diagonal matrix with the following nonzero elements:

    $$\begin{gathered} \left[ {L_{{22}} } \right]_{{ii}} \, = \,\frac{{\left( {\lambda _{{i + 1}} - \lambda _{i} } \right)}}{{v_{{i + 1}} }} + \frac{{\left( {\lambda _{i} - \lambda _{{i - 1}} } \right)}}{{v_{i} }}\,\,\,\,i\, = \,1,\,2,\, \ldots ,\,N - 1 \hfill \\ \left[ {L_{{22}} } \right]_{{ij}} \, = \,\left\{ \begin{gathered} \frac{{ - \left( {\lambda _{i} - \lambda _{{i - 1}} } \right)}}{{v_{i} }}\,\,\,\,\,\,\,\,i\, = \,2,\,3,\, \ldots ,\,N;\,\,j\, = \,i - 1 \hfill \\ \frac{{ - \left( {\lambda _{j} - \lambda _{{j - 1}} } \right)}}{{v_{j} }}\,\,\,\,\,\,\,\,\,{\text{ }}j\, = \,2,\,3,\, \ldots ,\,N;\,i\, = \,j - 1 \hfill \\ \end{gathered} \right. \hfill \\ \left[ {L_{{22}} } \right]_{{NN}} \, = \,\frac{{ - \left( {\lambda _{N} - \lambda _{{N - 1}} } \right)}}{{v_{N} }} \hfill \\ \end{gathered}$$
    (5.43)

As in the previous two cases, the objective function can be minimized using a two-step estimator, an iterative estimator or a continuous updating estimator. The asymptotic covariance matrix for \(\hat{\phi }_{3} ,\) when using an optimal weight matrix, is

$${\text{var}} (\hat{\phi }_{3} ) = \frac{1}{T}\left[ {\left( {\frac{{\partial H_{3}^{\prime } }}{\partial \phi }} \right)W_{3} \left( {\frac{{\partial H_{3} }}{{\partial \phi^{\prime}}}} \right)} \right]^{ - 1}$$
(5.44)
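
A sketch of MD estimator 3 using the simple diagonal weight matrix described above, rather than the optimal \(W_{3} ,\) again assuming lognormal functional forms and illustrative data:

```python
# MD estimator 3 with the simple diagonal weights 1/lambda_hat_i^2 and
# 1/(ybar*eta_hat_i)^2, matching the moment vector (5.39); lognormal case.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

c = np.array([0.20, 0.25, 0.30, 0.15, 0.10])
s = np.array([0.05, 0.12, 0.28, 0.25, 0.30])
ybar = 1.5
x = np.array([0.5, 1.0, 2.0, 4.0])
lam_hat = np.cumsum(c)[:-1]        # cumulative population shares (5.35)
eta_hat = np.cumsum(s)             # cumulative income shares, eta_hat_N = 1

def q3(theta):
    beta, sigma = theta[0], np.exp(theta[1])
    u = (np.log(x) - beta) / sigma
    lam = norm.cdf(u)                     # lambda_1, ..., lambda_{N-1}
    eta = norm.cdf(u - sigma)             # eta_1, ..., eta_{N-1}
    mu = np.exp(beta + sigma ** 2 / 2)
    h = np.concatenate((lam_hat - lam,
                        ybar * eta_hat[:-1] - mu * eta,
                        [ybar - mu]))
    w = np.concatenate((1 / lam_hat ** 2, 1 / (ybar * eta_hat) ** 2))
    return np.sum(w * h ** 2)

res = minimize(q3, x0=[0.0, 0.0], method="Nelder-Mead")
```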

A Quasi ML Estimator

Building on the work of Hilomi et al. (2008), Eckernkemper and Gribisch (2021) propose a quasi ML estimator. They combine the multinomial likelihood in Eq. (5.15) with a Gaussian approximation for the group means \(\overline{y}_{i} .\) Including the extra information means that estimation can proceed with or without knowledge of the group bounds, with these bounds being treated as parameters to be estimated when they are unknown. Let \(T_{i} = c_{i} T\) be the number of observations in group \(i.\) Each \(\overline{y}_{i}\) is assumed to be \(N\left( {\tilde{\mu }_{i} ,\tilde{\sigma }_{i}^{2} /T_{i} } \right)\) where \(\tilde{\mu }_{i}\) and \(\tilde{\sigma }_{i}^{2}\) are the mean and variance of \(y\) for the truncation \(\left( {x_{i - 1} < y < x_{i} } \right)\) of the originally specified distribution. That is,

$$\tilde{\mu }_{i} = E\left( {y|x_{i - 1} < y < x_{i} } \right) = \frac{{\mu \left[ {\eta_{i} (\phi ) - \eta_{i - 1} (\phi )} \right]}}{{\lambda_{i} (\phi ) - \lambda_{i - 1} (\phi )}}$$
(5.45)

and

$$\tilde{\sigma }_{i}^{2} = {\text{var}} \left( {y|x_{i - 1} < y < x_{i} } \right) = \frac{{\mu^{(2)} \left[ {\psi_{i} (\phi ) - \psi_{i - 1} (\phi )} \right]}}{{\lambda_{i} (\phi ) - \lambda_{i - 1} (\phi )}}\; - \;\tilde{\mu }_{i}^{2}$$
(5.46)

Using these results, the log of the likelihood function can be written as

$$L(\phi )\; \propto \;K_{1} + \sum\limits_{i = 1}^{N} {\left\{ {c_{i} \ln \left[ {\lambda_{i} (\phi ) - \lambda_{i - 1} (\phi )} \right] - \ln \tilde{\sigma }_{i} - \frac{{c_{i} }}{{2\tilde{\sigma }_{i}^{2} }}\left( {\overline{y}_{i} - \tilde{\mu }_{i} } \right)^{2} } \right\}}$$
(5.47)

Eckernkemper and Gribisch (2021) show that the estimator for \(\phi\) that maximizes \(L(\phi )\) is consistent and that the covariance matrix of its limiting distribution is the same as that for MD estimators 1 and 2.
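
A sketch of maximizing (5.47) for the lognormal, with the truncated means and variances built from (5.45) and (5.46); the grouped data are illustrative.

```python
# Quasi ML: the log-likelihood (5.47) with tilde_mu_i and tilde_sigma_i^2
# from (5.45)-(5.46), for the lognormal; data are illustrative.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

c = np.array([0.20, 0.25, 0.30, 0.15, 0.10])
s = np.array([0.05, 0.12, 0.28, 0.25, 0.30])
ybar = 1.5
x = np.array([0.5, 1.0, 2.0, 4.0])
ybar_i = s * ybar / c

def neg_quasi_loglik(theta):
    beta, sigma = theta[0], np.exp(theta[1])
    u = (np.log(x) - beta) / sigma
    dlam = np.diff(np.concatenate(([0.0], norm.cdf(u), [1.0])))
    deta = np.diff(np.concatenate(([0.0], norm.cdf(u - sigma), [1.0])))
    dpsi = np.diff(np.concatenate(([0.0], norm.cdf(u - 2 * sigma), [1.0])))
    mu = np.exp(beta + sigma ** 2 / 2)
    mu2 = np.exp(2 * beta + 2 * sigma ** 2)
    mu_t = mu * deta / dlam                        # (5.45)
    var_t = mu2 * dpsi / dlam - mu_t ** 2          # (5.46)
    ll = np.sum(c * np.log(dlam)
                - 0.5 * np.log(var_t)
                - c * (ybar_i - mu_t) ** 2 / (2 * var_t))
    return -ll

res = minimize(neg_quasi_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
```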

Estimation with Fixed \({\varvec{c}},\) Random \({\varvec{x}},\) Random \(\overline{y}_{i}\)

In this case, the observations are grouped such that the proportion of observations in each group is pre-specified. Examples are 10 groups with 10% of the observations in each group, or 20 groups with 5% of the observations in each group. This setup implies the proportions \(c_{i}\) are fixed (non-random), while the sample group boundaries \({\varvec{x}}\) and the group mean incomes \(\overline{y}_{i}\) are random variables. Let \(y_{[1]} ,y_{[2]} , \ldots ,y_{[T]}\) be the order statistics obtained by arranging the original observations \({\varvec{y}}\) in ascending order. An estimate for a group bound \(x_{i}\) is the largest order statistic in the i-th group, \(\hat{x}_{i} = y_{{[\hat{\lambda }_{i} T]}} .\) If the \(\hat{x}_{i}\) are observed, estimation can use both the \(\hat{x}_{i}\) and the \(\overline{y}_{i} ;\) if the \(\hat{x}_{i}\) are unobserved, then only the information in \(\overline{y}_{i}\) can be utilized. We consider MD and ML estimation for both these cases. MD estimation with unobserved \(\hat{x}_{i}\) corresponds to Lorenz curve estimation, which has attracted a great deal of attention in the literature. See, for example, Chotikapanich (2008). A Lorenz curve implied by a specific income distribution is defined by Eq. (5.4). An alternative is to start with a specific parametric Lorenz curve, in which case the corresponding income distribution is defined via the quantile function in (5.5). A problem with the latter approach is that the income distributions corresponding to some Lorenz curves are not defined for all values of \(y.\)

MD Estimation

The MD estimators that we consider are those proposed by Hajargasht and Griffiths (2020). Suppose, in the first instance, that the \(\hat{x}_{i}\) are observed. To use this information in an MD estimator, we recognize that

$$\text{plim} \,\hat{x}_{i} \, = \,F^{{ - 1}} \left( {\hat{\lambda }_{i} ;\theta } \right)$$
(5.48)

To use information on the income shares, we use the cumulative shares \(\hat{\eta }_{i}\) multiplied by mean income \(\overline{y},\) in line with MD estimator 3 for the random \({\varvec{c}}\) case. One difference, however, is that we express its probability limit in terms of the non-random \({\varvec{c}}\), instead of \({\varvec{x}}\), which is now a random variable. That is,

$$\text{plim} \,\bar{y}\,\hat{\eta }_{i} \, = \,\mu \,F^{{(1)}} \left( {F^{{ - 1}} (\hat{\lambda }_{i} ;\theta );\theta } \right)$$
(5.49)

To set up the MD estimator, it is convenient to define notation for a generalized Lorenz curve which can be written as

$$\mu \,\eta = G(\lambda ;\theta ) = \mu \,L(\lambda ;\theta ) = \mu \,F^{(1)} \left( {F^{ - 1} (\lambda ;\theta );\theta } \right)$$
(5.50)

Then, from (5.48)–(5.50), we can set up the following MD estimator,

$$\hat{\theta }_{4} = \arg \text{min}_{\theta } H^{\prime}_{4} (\theta )W_{4} H_{4} (\theta )$$
(5.51)

where

$$H_{4} (\theta ) = \left[ \begin{gathered} \hat{x}_{1} - F^{ - 1} (\hat{\lambda }_{1} ;\theta ) \\ \hat{x}_{2} - F^{ - 1} (\hat{\lambda }_{2} ;\theta ) \\ \vdots \\ \hat{x}_{N - 1} - F^{ - 1} (\hat{\lambda }_{N - 1} ;\theta ) \\ \overline{y}\,\hat{\eta }_{1} - G(\hat{\lambda }_{1} ;\theta ) \\ \vdots \\ \overline{y}\,\hat{\eta }_{N - 1} - G(\hat{\lambda }_{N - 1} ;\theta ) \\ \overline{y} - \mu \\ \end{gathered} \right]$$
(5.52)

and \(W_{4}\) is a suitably chosen weight matrix. It can be shown that the optimal weight matrix is given by

$$W_{4} (\theta ) = \left[ {\begin{array}{*{20}c} {\Omega_{11} } & {\Omega_{12} } \\ {\Omega^{\prime}_{12} } & {\Omega_{22} } \\ \end{array} } \right]^{ - 1}$$
(5.53)

where

$$[\Omega_{11} ]_{ij} = \left\{ \begin{gathered} \frac{{\hat{\lambda }_{i} (1 - \hat{\lambda }_{j} )}}{{f(\hat{x}_{i} )f(\hat{x}_{j} )}}\quad i \le j \hfill \\ \frac{{\hat{\lambda }_{j} (1 - \hat{\lambda }_{i} )}}{{f(\hat{x}_{i} )f(\hat{x}_{j} )}}\quad j \le i \hfill \\ \end{gathered} \right.\quad \quad i,j = 1,2, \ldots ,N - 1$$
(5.54)
$$[\Omega_{22} ]_{ij} = \left\{ \begin{gathered} \mu^{(2)} \psi_{i} + \left[ {\hat{\lambda }_{i} \hat{x}_{i} - G(\hat{\lambda }_{i} )} \right]\left[ {\hat{x}_{j} - \hat{\lambda }_{j} \hat{x}_{j} + G(\hat{\lambda }_{j} )} \right] - \hat{x}_{i} G(\hat{\lambda }_{i} )\quad i \le j \hfill \\ \mu^{(2)} \psi_{j} + \left[ {\hat{\lambda }_{j} \hat{x}_{j} - G(\hat{\lambda }_{j} )} \right]\left[ {\hat{x}_{i} - \hat{\lambda }_{i} \hat{x}_{i} + G(\hat{\lambda }_{i} )} \right] - \hat{x}_{j} G(\hat{\lambda }_{j} )\quad j \le i \hfill \\ \end{gathered} \right.\quad \quad i,j = 1,2, \ldots ,N$$
(5.55)
$$[\Omega_{12} ]_{ij} = \left\{ \begin{gathered} \frac{{\hat{\lambda }_{i} [G(\hat{\lambda }_{j} ) - \hat{x}_{j} \hat{\lambda }_{j} + \hat{x}_{j} ] - G(\hat{\lambda }_{i} )}}{{f(\hat{x}_{i} )}}\quad i \le j \hfill \\ \frac{{[\hat{\lambda }_{i} - 1][G(\hat{\lambda }_{j} ) - \hat{x}_{j} \hat{\lambda }_{j} ]}}{{f(\hat{x}_{i} )}}\quad j \le i \hfill \\ \end{gathered} \right.\quad \quad i = 1,2, \ldots ,N - 1;\;j = 1,2, \ldots ,N$$
(5.56)

The covariance matrix for the limiting distribution of \(\hat{\theta }_{4}\) is

$${\text{var}} (\hat{\theta }_{4} ) = \frac{1}{T}\left( {\frac{{\partial H^{\prime}_{4} }}{\partial \,\theta }W_{4} \frac{{\partial H_{4} }}{{\partial \,\theta^{\prime}}}} \right)^{ - 1}$$
(5.57)

When the number of groups is large, the matrix inversion in (5.53) can be computationally demanding. Hajargasht and Griffiths (2020) show how \(W_{4}^{ - 1}\) can be derived from \(W_{3}^{ - 1} ,\) which has computationally convenient tri-diagonal blocks. They also demonstrate that, if the groupings for this setup are equivalent to those for the MD3 setup in the sense that, a priori, \(x_{i} = F^{ - 1} (\lambda_{i} ;\theta ),\) then the asymptotic covariance matrices for \(\hat{\theta }_{3}\) and \(\hat{\theta }_{4}\) are identical.

Minimizing (5.51) to find an estimate \(\hat{\theta }_{4}\) can proceed using one of the three algorithms described in section Estimation with Fixed x, Random c, Random \(\overline{y}_{i}\). However, there are two requirements that will not always be met: estimates of the bounds \(\hat{x}_{i} = y_{{[\hat{\lambda }_{i} T]}}\) must be observed, and the cdf must be invertible, either algebraically or computationally, so that quantiles \(F^{ - 1} (\hat{\lambda }_{i} ;\theta )\) can be found. Note that \(F^{ - 1} (\hat{\lambda }_{i} ;\theta )\) appears not only in the first \((N - 1)\) elements of \(H_{4}\) but also in the next \((N - 1)\) elements that involve the generalized Lorenz curve \(G(\lambda ;\theta ) = \mu \,F^{(1)} \left( {F^{ - 1} (\lambda ;\theta );\theta } \right).\) One way to overcome non-invertibility of the cdf is to replace the assumption of a parametric income distribution with an assumption of a parametric Lorenz curve. Doing so overcomes the problem for the second set of elements in \(H_{4} ,\) and relationships between the generalized Lorenz curve and the quantile function (see Hajargasht and Griffiths, 2020) can be exploited to obtain the first set of elements in \(H_{4} .\)

When the \(\hat{x}_{i}\) are unobserved, estimation can proceed using the last \(N\) elements in \(H_{4} ,\) with calculations made from an assumed income distribution if the cdf is invertible, or from an assumed Lorenz curve if it is not. This last approach is the one most closely aligned with suggestions for Lorenz curve estimation that have appeared in the literature. Earlier suggestions are sub-optimal in the sense that they do not use the best weighting matrix. Details can be found in Hajargasht and Griffiths (2020).
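
A sketch of this last approach for the lognormal, whose Lorenz curve has the closed form \(L(\lambda ;\theta ) = \Phi \left( {\Phi^{ - 1} (\lambda ) - \sigma } \right):\) the last \(N\) elements of \(H_{4}\) are fitted with simple diagonal weights rather than the optimal matrix, and the data are illustrative.

```python
# Generalized-Lorenz-curve fitting when the bounds x_hat are unobserved:
# the last N elements of H_4 with simple diagonal weights; lognormal case.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

c = np.full(10, 0.10)                       # fixed decile proportions
s = np.array([0.02, 0.03, 0.04, 0.06, 0.07, 0.09, 0.11, 0.13, 0.18, 0.27])
ybar = 1.5
lam_hat = np.cumsum(c)                      # fixed cumulative proportions
eta_hat = np.cumsum(s)                      # cumulative income shares

def q4(theta):
    beta, sigma = theta[0], np.exp(theta[1])
    mu = np.exp(beta + sigma ** 2 / 2)
    # generalized Lorenz curve (5.50) at lambda_1, ..., lambda_{N-1}
    G = mu * norm.cdf(norm.ppf(lam_hat[:-1]) - sigma)
    h = np.concatenate((ybar * eta_hat[:-1] - G, [ybar - mu]))
    w = 1 / (ybar * eta_hat) ** 2           # simple weights, as for MD3
    return np.sum(w * h ** 2)

res = minimize(q4, x0=[0.0, 0.0], method="Nelder-Mead")
```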

ML Estimation

ML estimation of \(\theta\) for fixed \(c_{i}\) and random \(x_{i}\) and \(\overline{y}_{i}\) was considered by Eckernkemper and Gribisch (2021). Recognizing that the joint density for the group bounds and group means can be written as

$$f\left( {\bar{y}_{1} ,\bar{y}_{2} , \ldots ,\bar{y}_{N} ,{\hat{\varvec{x}}}} \right) = f\left( {\bar{y}_{1} ,\bar{y}_{2} , \ldots ,\bar{y}_{N} \,|\,{\hat{\varvec{x}}}} \right)\,f(\hat{x}_{1} )\,f\left( {\hat{x}_{2} |\hat{x}_{1} } \right) \cdots f\left( {\hat{x}_{N - 1} |\hat{x}_{N - 2} } \right)$$
(5.58)

they set up a likelihood function that uses distribution theory for order statistics for \(f({\hat{\varvec{x}}})\) and a Gaussian approximation for \(f\left( {\overline{y}_{i} |\hat{x}_{i} ,\hat{x}_{i - 1} } \right).\) Using results in David and Nagaraja (2003), the conditional means and variances for the \(\overline{y}_{i}\) can be written as

$$\mu_{i} = E\left( {\overline{y}_{i} |\hat{x}_{i} ,\hat{x}_{i - 1} } \right) = \frac{{T_{i} - 1}}{{T_{i} }}\tilde{\mu }_{i} + \frac{{\hat{x}_{i} }}{{T_{i} }}$$
(5.59)

and

$$\sigma_{i}^{2} = {\text{var}} \left( {\overline{y}_{i} |\hat{x}_{i} ,\hat{x}_{i - 1} } \right) = \frac{{T_{i} - 1}}{{T_{i}^{2} }}\tilde{\sigma }_{i}^{2}$$
(5.60)

The log-likelihood is

$$\begin{aligned} L(\theta ) & \, = \,K_{2} - \frac{1}{2}\left[ {\ln \tilde{\sigma }_{N}^{2} + \frac{{T_{N} }}{{\tilde{\sigma }_{N}^{2} }}\left( {\bar{y}_{N} - \tilde{\mu }_{N} } \right)^{2} } \right] + T_{N} \ln \left[ {1 - F\left( {\hat{x}_{{N - 1}} ;\theta } \right)} \right] \\ {\text{ }} & \,\,\,\,\,\,\,\,\, + \,\sum\limits_{{i = 1}}^{{N - 1}} {\left\{ \begin{gathered} - \frac{1}{2}\left[ {\ln \sigma _{i}^{2} + \left( {\frac{{\bar{y}_{i} - \mu _{i} }}{{\sigma _{i} }}} \right)^{2} } \right]\, \hfill \\ + \,(T_{i} - 1)\ln \left[ {F\left( {\hat{x}_{i} ;\theta } \right) - F\left( {\hat{x}_{{i - 1}} ;\theta } \right)} \right] + \ln f\left( {\hat{x}_{i} ;\theta } \right) \hfill \\ \end{gathered} \right\}} \\ \end{aligned}$$
(5.61)

where \(K_{2}\) is a constant. The estimator \(\hat{\theta }_{5}\) that maximizes (5.61) can be interpreted as a quasi ML estimator. Eckernkemper and Gribisch (2021) establish its asymptotic covariance matrix as

$${\text{var}} (\hat{\theta }_{5} ) = \frac{1}{T}\left[ {\sum\limits_{i = 1}^{N} {\left( {\frac{{\partial \mu_{i} }}{\partial \theta }\frac{{\partial \mu_{i} }}{{\partial \theta^{\prime}}}} \right)\left( {\frac{{c_{i} }}{{\tilde{\sigma }_{i}^{2} }}} \right)} } \right]^{ - 1}$$
(5.62)

One difference between the estimator \(\hat{\theta }_{5}\) and those estimators considered in the earlier sections is that it requires knowledge of the sample size \(T,\) from which the number of observations in each group can be found from \(T_{i} = c_{i} T.\) All estimators require knowledge of \(T\) to compute standard errors, but knowledge of the proportions \(c_{i} ,\) without knowledge of \(T,\) is sufficient for the earlier estimators for \(\theta\) and \(\phi\) to be employed.

For ML estimation of \(\theta\) when the \(\hat{x}_{i}\) are not observed, Eckernkemper and Gribisch (2021) integrate out the \(\hat{x}_{i}\) from the likelihood in (5.61) to obtain the following log-likelihood

$$L(\theta ) = K_{3} - \frac{1}{2}\left[ {\ln |\Xi | + T\left( {\overline{\varvec{y}} - {{\varvec{\upmu}}}^{*} } \right)^{\prime } \Xi^{ - 1} \left( {\overline{\varvec{y}} - {{\varvec{\upmu}}}^{*} } \right)} \right]$$
(5.63)

where \(K_{3}\) is a constant, \(\overline{\varvec{y}}^{\prime} = (\overline{y}_{1} ,\overline{y}_{2} , \ldots ,\overline{y}_{N} ),\) and \({{\varvec{\upmu}}}^{*}\) is an \((N \times 1)\) vector with i-th element equal to

$$\mu_{i}^{*} = \frac{1}{{c_{i} }}\left[ {G\left( {\hat{\lambda }_{i} ;\theta } \right) - G\left( {\hat{\lambda }_{i - 1} ;\theta } \right)} \right]$$
(5.64)

and \(\Xi = DB\,\Omega_{22}^{*} B^{\prime}D\) where \(D = {\text{diag}} \left( {c_{1}^{ - 1} ,c_{2}^{ - 1} , \ldots ,c_{N}^{ - 1} } \right),\) \(\left[ B \right]_{ii} = 1,\) \(\left[ B \right]_{ij} = - 1\) for \(i = j + 1,\;j = 1,2, \ldots ,N - 1,\) and zero elsewhere, and \(\Omega_{22}^{*}\) is equal to \(\Omega_{22}\) defined in (5.55), but with \(\hat{x}_{i}\) and \(\hat{x}_{j}\) replaced by \(F^{ - 1} \left( {\hat{\lambda }_{i} ;\theta } \right)\) and \(F^{ - 1} \left( {\hat{\lambda }_{j} ;\theta } \right),\) respectively. The asymptotic covariance matrix for the estimator \(\hat{\theta }_{6}\) obtained by maximizing (5.63) is

$$\text{var} \,(\hat{\theta }_{6} ) = \frac{1}{T}\left[ {\frac{{\partial {{\varvec{\upmu}}}^{*\prime } }}{{\partial \theta }}\;\Xi^{ - 1} \;\frac{{\partial {{\varvec{\upmu}}}^{*} }}{{\partial \theta^{\prime} }}} \right]^{ - 1}$$
(5.65)

Specification of Distributions, Inequality and Poverty Measures

To implement the estimation methods described in section Estimation, a parametric distribution must be chosen, and its moments, pdf, cdf, fmdf and smdf are needed. This information is provided in Table 5.1 for several popular income distributions. Once the parameters of a chosen distribution have been estimated, estimates of inequality and poverty incidence are frequently of interest. In Table 5.2, we provide expressions that can be used to compute inequality estimates from the parameter estimates. Expressions for the poverty estimates were given in section Poverty Measures, with the exception of the Watts index, whose expressions are tabulated in Table 5.3.

Table 5.1 Probability distributions, distribution functions and moments
Table 5.2 Inequality measures
Table 5.3 Watts poverty indices for selected distributions

Simple Recipes for Two Distributions

In some instances, where large-scale projects involving many countries and many time periods are being undertaken, it may be prudent to use estimation techniques that are relatively simple. In this section, we consider two estimation techniques that fall into this category: one for the lognormal distribution and one for the Pareto-lognormal distribution.

Lognormal Distribution

In the previous section, we indicated that the Gini coefficient for the lognormal distribution is \(g = 2\Phi \left( {\sigma /\sqrt 2 } \right) - 1\) and its mean is \(\mu = \exp \left\{ {\beta + \sigma^{2} /2} \right\}.\) Using grouped data, the Gini coefficient can be estimated from

$$\hat{g} = \sum\limits_{i = 1}^{N - 1} {\hat{\eta }_{i + 1} \hat{\lambda }_{i} } - \sum\limits_{i = 1}^{N - 1} {\hat{\eta }_{i} \hat{\lambda }_{i + 1} }$$
(5.66)

and the mean can be estimated using \(\overline{y},\)

$$\hat{\mu } = \overline{y} = \exp \left\{ {\hat{\beta } + \frac{{\hat{\sigma }^{2} }}{2}} \right\}$$
(5.67)

Utilizing these two equations and the expression for the Gini coefficient yields the parameter estimates:

$$\hat{\sigma } = \sqrt 2 \,\Phi^{ - 1} \left( {\frac{{\hat{g} + 1}}{2}} \right)$$
(5.68)
$$\hat{\beta } = \ln (\overline{y}) - \frac{{\hat{\sigma }^{2} }}{2}$$
(5.69)

This approach was adopted by Chotikapanich et al. (1997).
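
In code, the recipe (5.66)–(5.69) amounts to a few lines; the grouped data below are illustrative.

```python
# The lognormal recipe (5.66)-(5.69): a Gini estimate from cumulative
# shares, then sigma_hat and beta_hat in closed form; illustrative data.
import numpy as np
from scipy.stats import norm

c = np.array([0.20, 0.25, 0.30, 0.15, 0.10])
s = np.array([0.05, 0.12, 0.28, 0.25, 0.30])
ybar = 1.5
lam = np.cumsum(c)                 # lambda_hat_i, with lam[-1] = 1
eta = np.cumsum(s)                 # eta_hat_i, with eta[-1] = 1

# (5.66): g_hat = sum eta_{i+1} lam_i - sum eta_i lam_{i+1}
g_hat = np.sum(eta[1:] * lam[:-1]) - np.sum(eta[:-1] * lam[1:])

sigma_hat = np.sqrt(2) * norm.ppf((g_hat + 1) / 2)   # (5.68)
beta_hat = np.log(ybar) - sigma_hat ** 2 / 2         # (5.69)
```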

Pareto-Lognormal Distribution

For the Pareto-lognormal distribution, we can estimate the Theil inequality measures from the grouped data and then use these estimates, along with sample mean income, to estimate the parameters. The grouped-data sample estimates are

$$\hat{T}_{1} = \sum\limits_{i = 1}^{N} {c_{i} \left( {\frac{{\overline{y}_{i} }}{{\overline{y}}}} \right)\ln \left( {\frac{{\overline{y}_{i} }}{{\overline{y}}}} \right)}$$
(5.70)
$$\hat{T}_{0} = \sum\limits_{i = 1}^{N} {c_{i} \ln \left( {\frac{{\overline{y}}}{{\overline{y}_{i} }}} \right)}$$
(5.71)
$$\hat{\mu } = \overline{y}$$
(5.72)

The corresponding quantities in terms of the parameters of the Pareto-lognormal distribution are

$$T_{1} = \frac{1}{\alpha - 1} + \frac{{\sigma^{2} }}{2} + \ln \left( {\frac{\alpha - 1}{\alpha }} \right)$$
(5.73)
$$T_{0} = - \frac{1}{\alpha } + \frac{{\sigma^{2} }}{2} - \ln \left( {\frac{\alpha - 1}{\alpha }} \right)$$
(5.74)
$$\mu = \frac{\alpha }{\alpha - 1}\exp \left\{ {\beta + \frac{{\sigma^{2} }}{2}} \right\}$$
(5.75)

Assuming the mean exists \((\alpha > 1),\) from (5.70)–(5.75) we can retrieve parameter estimates using the following three steps (a code sketch implementing them appears after the list):

  1. Find \(\hat{\alpha }\) as the solution to the equation

    $$\frac{{2\hat{\alpha } - 1}}{{\hat{\alpha }(\hat{\alpha } - 1)}} + 2\ln \left( {\frac{{\hat{\alpha } - 1}}{{\hat{\alpha }}}} \right) = \hat{T}_{1} - \hat{T}_{0}$$
    (5.76)
  2. Find \(\hat{\sigma }^{2}\) from

    $$\hat{\sigma }^{2} = \hat{T}_{1} + \hat{T}_{0} - \frac{1}{{\hat{\alpha }(\hat{\alpha } - 1)}}$$
    (5.77)
  3. Find \(\hat{\beta }\) from

    $$\hat{\beta } = \ln (\overline{y}) + \ln \left( {\frac{{\hat{\alpha } - 1}}{{\hat{\alpha }}}} \right) - \frac{{\hat{\sigma }^{2} }}{2}$$
    (5.78)
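
A sketch of the three steps, using simple bracketed root finding for (5.76); the grouped data are illustrative and were chosen so that \(\hat{T}_{1} > \hat{T}_{0} ,\) as the Pareto-lognormal expressions (5.73) and (5.74) require.

```python
# The Pareto-lognormal recipe: solve (5.76) for alpha_hat by root
# finding, then apply (5.77) and (5.78); the data are illustrative.
import numpy as np
from scipy.optimize import brentq

c = np.array([0.20, 0.25, 0.30, 0.15, 0.10])
s = np.array([0.04, 0.10, 0.22, 0.22, 0.42])
ybar = 1.5
ybar_i = s * ybar / c

T1_hat = np.sum(c * (ybar_i / ybar) * np.log(ybar_i / ybar))   # (5.70)
T0_hat = np.sum(c * np.log(ybar / ybar_i))                     # (5.71)

def eq_576(a):
    """LHS minus RHS of (5.76)."""
    return ((2 * a - 1) / (a * (a - 1)) + 2 * np.log((a - 1) / a)
            - (T1_hat - T0_hat))

alpha_hat = brentq(eq_576, 1.0001, 1e3)                            # (5.76)
sigma2_hat = T1_hat + T0_hat - 1 / (alpha_hat * (alpha_hat - 1))   # (5.77)
beta_hat = (np.log(ybar) + np.log((alpha_hat - 1) / alpha_hat)
            - sigma2_hat / 2)                                      # (5.78)
```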

Concluding Remarks

Inequality and poverty, both nationally and globally, continue to be two of the most pressing issues facing society. Accurate measurement of inequality and poverty involves a multitude of non-trivial considerations, including reliable data collection, specification of purchasing power parities, and definition of a suitable poverty line. We have focused on a further consideration: how to model and estimate income distributions, and how to estimate inequality and poverty from the parameters of those distributions, when using grouped data. Individual-level data are becoming increasingly available, and their use is preferred to grouped data when resources permit. However, countries and time periods for which only grouped data are available remain prevalent, and grouped data can be advantageous for large-scale regional and global projects. Our objective has been to summarize available techniques in a form convenient for researchers working along these lines.