Using certain data on personal income, V. Pareto (1897) plotted income on the abscissa and the number of people receiving more than that income on the ordinate of logarithmic paper, and found a roughly linear relation. This Pareto distribution, or Pareto law, may be written as

$$ x = a\,y^{-\alpha} \quad \text{or} \quad \log x = a' - \alpha \log y \tag{1} $$

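where y is income, x is the number of people with income above y, a′ = log a, and α (the negative slope of the straight line) is called the Pareto coefficient. The density of the distribution, that is, the number of people with incomes between y and y + dy, is

$$ -dx = a\alpha\, y^{-(\alpha+1)}\, dy $$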

The Pareto coefficient is occasionally used as a measure of inequality: the larger α, the less unequal the distribution. According to Champernowne (1952), α is useful as a measure of inequality for the high income range, whereas for medium and low incomes other measures are preferable.
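Pareto's graphical procedure can be imitated numerically. The following is a minimal sketch, not a recommended estimator: it regresses the log of the number of recipients at or above each income on the log of that income and takes the negative slope as α. The function names and the synthetic data (exact quantiles of a Pareto law with α = 2) are assumptions made for illustration.

```python
import math

def pareto_coefficient(incomes):
    """Negative least-squares slope of log(count at or above y) on log(y)."""
    ys = sorted(incomes, reverse=True)
    n = len(ys)
    log_y = [math.log(y) for y in ys]
    # The k-th largest income has k recipients at or above it (ties ignored).
    log_x = [math.log(k) for k in range(1, n + 1)]
    mean_y = sum(log_y) / n
    mean_x = sum(log_x) / n
    cov = sum((ly - mean_y) * (lx - mean_x) for ly, lx in zip(log_y, log_x))
    var = sum((ly - mean_y) ** 2 for ly in log_y)
    return -cov / var

def synthetic_incomes(alpha, n, y_min=1.0):
    """Deterministic quantiles of a Pareto law (illustrative data, not real incomes)."""
    return [y_min * ((i - 0.5) / n) ** (-1.0 / alpha) for i in range(1, n + 1)]

alpha_hat = pareto_coefficient(synthetic_incomes(alpha=2.0, n=5000))
print(round(alpha_hat, 2))
```

On real data the fit would be restricted to incomes above some threshold, since the law describes only the upper tail.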

α takes only positive values. If α ≤ 2 the distribution has no finite variance; if α ≤ 1 it has no finite mean either. In practice the Pareto law applies only to the tail of the empirical distributions, i.e. to incomes above a certain size. Thus the law (1) is valid asymptotically as y → ∞. The range over which the empirical distributions conform to the law differs from case to case. It seems to be larger for wealth than for income (perhaps because we have data only for large wealth) and even larger for towns. In the case of firm sizes only very large firms are covered by the law.
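The conditions on the moments follow from the density. As a sketch, assume that the law holds exactly above a minimum income y₀ > 0 (a symbol introduced here for illustration) and that x is expressed as a share of all recipients, so that a = y₀^α. The k-th moment is then

$$ E\left[y^{k}\right] = \int_{y_0}^{\infty} y^{k}\, \alpha\, y_0^{\alpha}\, y^{-(\alpha+1)}\, dy = \frac{\alpha\, y_0^{k}}{\alpha - k} \qquad (k < \alpha) $$

and the integral diverges for k ≥ α. A finite mean (k = 1) therefore requires α > 1, and a finite variance (k = 2) requires α > 2.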

For the distribution of towns by size of population the rank-size relation has been used (Zipf 1949). It is the same as the Pareto distribution except that it measures the tail by rank rather than by the number of towns above a certain size: rank one is assigned to the largest town, so the higher the rank, the smaller the town. Zipf believed (incorrectly) that the coefficient α is always about one, so that the product of rank and size is constant. But Pareto, of course, was even more ‘out’ with his belief that the Pareto coefficient for income is always about 1.5. In highly industrialized countries today it is above 2 and sometimes above 3.

The main interest of the Pareto distribution lies not in its rather limited use as a measure of inequality but in the explanations it has provoked, naturally so, since regular patterns are felt to be a challenge to the mind. There are two types of approach to the problem. That of Champernowne, Yule and Simon explains the characteristic pattern as the steady state of a stochastic process which has been evolving in time, so that the pattern reflects something which has been going on in the past. In contrast, Mandelbrot has been looking for a ‘synchronic’ explanation which does not depend on a process in time. He is mainly concerned with the reproductive quality of the Pareto distribution: if a large number of independent random variables are identically distributed according to Pareto’s law, then their sum will also be distributed according to this law. Thus it could be expected that the incomes of the various counties in England would be Pareto distributed, because each results from the addition of individual incomes which are Pareto distributed.

Champernowne’s pioneering work (1953) in essence goes back to his fellowship dissertation of 1936, published in 1973. He builds on a tradition which explains the normal distribution as the result of the addition of random unit steps (left or right) on the line over a long time (random walk; for the terms and concepts relating to random processes, see Feller, Vol. I). If the random walk takes place on the logarithmic scale, the distribution of the sum of steps will tend to log normality. This does not, however, give a stable distribution, because the dispersion will go on increasing all the time. Champernowne chooses the technique of the Markov chain: each year’s income depends only on the previous year’s income plus a random increment proportionate to last year’s income, and the probabilities of the various increments remain constant from one year to the next. This feature is called the law of proportionate effect. The required data are thus embodied in a matrix which contains the probabilities of transition from one income in one year to another income in the following year. The number of income receivers remains stable in Champernowne’s model because each exit is assumed to be automatically compensated by a new entry. To guarantee that the system reaches a steady state it is assumed that on average the change of income is downwards; this is necessary to offset the tendency towards diffusion which is characteristic of the unrestrained random walk. The assumption reflects the low income of new entrants.
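The mechanism can be illustrated with a much reduced chain. The sketch below is not Champernowne's transition matrix: it assumes income classes equally spaced on a logarithmic scale, a move of at most one class per year, and illustrative probabilities `p_up` and `p_down` with `p_down > p_up` (the downward drift). Under these assumptions the steady state is geometric in the class index, which is a Pareto law in income with α = log(p_down/p_up) / log(class_ratio).

```python
import math

def steady_state_shares(p_up, p_down, n_classes=60, n_years=5000):
    """Apply the same transition probabilities year after year.

    Class 0 is the minimum income, so nobody can move below it; the top
    class is an artificial ceiling needed only to keep the list finite.
    """
    shares = [0.0] * n_classes
    shares[0] = 1.0  # everyone starts in the lowest class (arbitrary choice)
    for _ in range(n_years):
        nxt = [0.0] * n_classes
        for j, s in enumerate(shares):
            up = p_up if j < n_classes - 1 else 0.0
            down = p_down if j > 0 else 0.0
            nxt[j] += s * (1.0 - up - down)  # income class unchanged
            if up:
                nxt[j + 1] += s * up         # one class up
            if down:
                nxt[j - 1] += s * down       # one class down
        shares = nxt
    return shares

p_up, p_down = 0.2, 0.3   # hypothetical values with downward drift
class_ratio = 1.2         # each class lies 20% above the one below (assumed)

shares = steady_state_shares(p_up, p_down)
tail_5 = sum(shares[5:])  # share of recipients in class 5 or higher
tail_6 = sum(shares[6:])  # share of recipients in class 6 or higher
alpha_chain = -math.log(tail_6 / tail_5) / math.log(class_ratio)
alpha_theory = math.log(p_down / p_up) / math.log(class_ratio)
print(alpha_chain, alpha_theory)
```

Setting `p_up` equal to or above `p_down` removes the drift, and the shares then keep spreading upwards instead of settling, which is the diffusion problem described above.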

In fact the role of new entry is crucial not only in this model but in other applications as well (size of firms, towns, wealth).

H. Simon (1955) studied the number of times a particular word (vocable) occurs in a text. The number of vocables which occur with a given frequency decreases with that frequency in a Pareto-like fashion. Simon’s treatment is based on the work of Yule (1924), who dealt with a biological problem: the frequency of genera with different numbers of species, which is distributed according to Pareto. He explained this pattern by means of a pure birth process, deriving from it the Yule distribution, whose density behaves as

$$ f(n) \sim \alpha\, \Gamma(1+\alpha)\, n^{-(1+\alpha)} \quad \text{as } n \to \infty . $$
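The approach to the Pareto form can be checked numerically. The sketch below assumes the standard exact form of the Yule distribution, f(n) = α B(n, α + 1) with B the beta function, which is not stated in the text above, and compares it with the asymptotic expression for an illustrative α = 0.8.

```python
import math

def yule_pmf(n, alpha):
    """Exact Yule probability alpha * B(n, alpha + 1), via log-gamma."""
    log_beta = math.lgamma(n) + math.lgamma(alpha + 1) - math.lgamma(n + alpha + 1)
    return alpha * math.exp(log_beta)

def pareto_tail_approx(n, alpha):
    """Asymptotic form alpha * Gamma(1 + alpha) * n^-(1 + alpha)."""
    return alpha * math.gamma(1 + alpha) * n ** (-1 - alpha)

alpha = 0.8  # illustrative value
ratios = {n: yule_pmf(n, alpha) / pareto_tail_approx(n, alpha)
          for n in (1, 10, 100, 1000)}
for n, r in ratios.items():
    print(n, round(r, 4))
```

The ratio of the exact to the asymptotic value rises towards one as n grows, so the Pareto form describes only the upper tail of the Yule distribution.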

The model of evolution assumes that mutations occur randomly with a frequency g per time unit, creating new genera, and with a frequency s per time unit, creating new species, where g < s. Since each species has the same chance of creating a new species, we have here a proportionate growth, in analogy to the law of proportionate effect. The steady state is produced by the emergence of new genera. The Pareto coefficient equals the ratio of the frequencies with which the two kinds of mutations appear, that is, g/s. Simon, whose merit it is to have drawn attention to this brilliant work, has suggested an application to incomes (not very convincingly) and has himself applied it to firm sizes (1967). A very direct application relates to the size of towns (Steindl 1965). If the number of towns grows at the rate μ and the number of inhabitants of each town grows at the rate ρ, then after a sufficiently long time there will be a steady state distribution with Pareto coefficient μ/ρ.
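Why the coefficient is a ratio of growth rates can be seen from a heuristic sketch. It assumes, unlike the stochastic model, that every town grows deterministically at the rate ρ from an initial size of one, and it introduces the town's age a as an auxiliary symbol. If the number of towns grows at the rate μ, the share of towns older than a at any date is

$$ \frac{N_0 e^{\mu (T-a)}}{N_0 e^{\mu T}} = e^{-\mu a} $$

A town of age a has size y = e^{ρa}, so a town exceeds size y when its age exceeds (log y)/ρ. The share of towns larger than y is therefore

$$ e^{-\mu (\log y)/\rho} = y^{-\mu/\rho} $$

which has the form of (1) with α = μ/ρ. The same argument, with genera in place of towns and species in place of inhabitants, gives g/s.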

Mandelbrot (1960, 1961) deals with the problem from the point of view of a mathematician and therefore on a very general level. He starts from the concept of stable laws (compare Feller, Vol. II, ch. VI). If a sum of independent identically distributed random variables is distributed in the same way as its components, except for a scale factor and possibly a location factor, then this distribution is stable. The best-known example is the normal distribution. It has been shown by P. Lévy that there is a class of distributions with infinite variance which are stable and which converge to the law of Pareto when the variable in question (say, income) tends to infinity. The Pareto law in this context is confined to the range 1 < α < 2. Mandelbrot surmises, owing to the reproductive quality, in the above sense, of the Pareto law, that its empirical importance must be very great. He also considers that this must have implications for some statistical methods which depend on the assumption of normality.
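In symbols, the definition of stability may be sketched as follows, where X₁, …, Xₙ are independent copies of X and cₙ, dₙ are the scale and location constants (notation introduced here, not taken from Mandelbrot):

$$ X_1 + X_2 + \cdots + X_n \overset{d}{=} c_n X + d_n, \qquad c_n = n^{1/\alpha} $$

For the normal distribution α = 2 and cₙ = √n. The stable laws with infinite variance have 0 < α < 2, and those with a heavy upper tail satisfy

$$ P(X > y) \sim C\, y^{-\alpha} \quad \text{as } y \to \infty $$

for some constant C > 0, which is the Pareto law (1) in the tail.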

As to income, Mandelbrot suggests that it can be regarded as composed of a number of independent elements which are identically distributed. We can easily imagine a decomposition into a few parts such as earned income, property income and transfer income. Mandelbrot requires, however, in order to assure convergence, a large number of components, and these, as he admits, have hardly any counterparts in reality (1961, p. 525). The explanation is analogous to the well-known explanation of the stature of adult men as a random variable composed of a great number of independent small random variables; this explains the normal distribution of height. The precise identity of these small random variables is, here again, not specified and rather speculative. This may perhaps explain why this ‘synchronic’ approach has not, so far, found much resonance among economists.

The interest of the alternative approach (Champernowne or Yule) of explaining the law as a steady state of a stochastic process is that it establishes a relation between the stratification found in a cross section and the past history which has produced it, and which is mapped in the cross section. This is analogous to the stratifications in geology or the rings in the trunk of a tree. Irregularities or shifts in the empirical distributions can, according to this view, be explained by major disturbances of the process at certain points in time in the past.

Concretely, the Pareto distribution has been shown, in the case of a birth and death process model, to depend on growth; in an economy which has always been stationary it would not exist (Steindl 1965). The Pareto coefficient in such models is usually a ratio of growth rates; thus in the case of firm size it is a ratio of the growth rate of the number of firms to the growth rate of the firms themselves (Steindl 1965). The importance of new entry as a factor making for less inequality has also been shown, inter alia in the case of wealth (Steindl 1972).

The stochastic models have often been criticized for their lack of economic content. Perhaps it has been overlooked that they represent only the first steps in a new and exceedingly difficult terrain. It may be thought that the work of Champernowne, Yule, Simon, and Wold and Whittle contains the seed of future studies which will reveal their full potential only when they are extended to distributions in several dimensions.
