Abstract
It is widely accepted among geographers and economists that the City Size Distribution (CSD) is well described by a power law, i.e., Zipf’s law. This view is shared so broadly within this community that it can be treated as a paradigm. In reality, however, Zipf’s law is not always observed (even as an approximation), and we prefer to adopt a classification of the CSD into three classes. In this work, we present the characteristics of these classes and give some examples of them. We use the Israeli system of cities as an interesting case study in which the same ensemble of cities passes from one class to another. We relate this change to the urbanization process that occurred in Israel from the 1960s onwards.
1 Introduction
The problem of the city size distribution (CSD) is part of a more general problem: the size distribution of entities for which a “size” can be defined. There are many examples of such entities in various scientific fields, such as cities (population), countries (area and population), incomes (amount of income), animal species (number of animals of a particular species), languages (number of people speaking a particular language), words (frequency of a given word in a particular language), mayonnaise (weight of the mayonnaise drops), terrorism (amount of damage of a particular terrorist action), etc. For a detailed bibliography we refer the reader to the Internet site www.nlsij-genetics.org/wli/zipf. There are several equivalent mathematical possibilities to describe a size distribution: the cumulative distribution function, the density or probability function, and the rank–size distribution. In the case of cities, the sizes are represented by the populations of the cities.
The first representation, which is well known from the work of Zipf, is obtained by sorting the cities in decreasing order of size. This yields the function S(R), which is commonly plotted on a double logarithmic scale, i.e., Ln S versus Ln R (the Zipf plot). For a given value of S, R can be seen as the number of cities with size larger than or equal to S. This means that the inverse function \( S^{-1} \), i.e., R(S), is the cumulative distribution function.
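The link between the rank–size function and the cumulative count can be sketched in a few lines. This is only an illustration: the function names and the population figures are hypothetical and are not taken from the data used in this work.

```python
def rank_size(populations):
    """Return (rank, size) pairs with sizes sorted in decreasing order."""
    sizes = sorted(populations, reverse=True)
    return list(enumerate(sizes, start=1))


def count_at_least(populations, s):
    """R(S): number of cities whose size is larger than or equal to s."""
    return sum(1 for p in populations if p >= s)


# Hypothetical populations, for illustration only.
cities = [1_200_000, 450_000, 300_000, 120_000, 80_000, 15_000]
pairs = rank_size(cities)

for rank, size in pairs:
    # With distinct sizes, the rank of a city equals the number of
    # cities at least as large as it, i.e., R(S).
    print(rank, size, count_at_least(cities, size))
```

When several cities share exactly the same size, the rank and the cumulative count differ for the tied cities, so the identification holds only up to ties.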
In order to find how many cities there are with sizes between a given S and S + dS (dS ≪ S), one can use (−dR/dS) dS. The function D(S) = −dR/dS is the density distribution function. Sometimes the curve R(S) is discontinuous, such that D(S) cannot always be defined, and it is better to plot the histogram of Ln S. If the function S(R) is a power law, \( S = S_{0}/R^{m} \), the cumulative function and the density function are also power laws: \( R \propto 1/S^{1/m} \) and \( D \propto 1/S^{1 + 1/m} \). These last two functions diverge when S → 0. However, this is only a pseudo divergence since S does not go to zero but to a minimum size \( S_{\min} \). This last case is often called a Pareto distribution, but the terms “Zipf’s law” and “power law” are completely equivalent.
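For completeness, the two proportionalities follow directly from inverting the power law and differentiating; the prefactors are written out here only to make the step explicit:

\[
S = \frac{S_{0}}{R^{m}}
\;\Longrightarrow\;
R(S) = \left( \frac{S_{0}}{S} \right)^{1/m},
\qquad
D(S) = -\frac{dR}{dS} = \frac{S_{0}^{1/m}}{m}\, S^{-(1 + 1/m)} .
\]

For m = 1 this gives \( R \propto 1/S \) and \( D \propto 1/S^{2} \).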
For the particular case of systems of cities, it was proposed that the distribution is a power law (Zipf 1941, 1949) and consequently the log–log plots of the three characteristic functions are straight lines: this is Zipf’s law (see Note 1). Even if the law is not always observed, it was admitted that it is a good approximation (Rosen and Resnick 1980).
The hidden reason for accepting this law is probably the concept of universality. One of the inherent characteristics of a law is its universality, which has a strong attractive power. Perhaps scholars have been fascinated by the idea that all the systems of cities over the entire world obey this very simple law (Soo 2005; Alperovich 1984, 1988). Words like “mystery” (Krugman 1996), “intellectual challenge” (Popescu 2003) and “puzzling regularity” (Gabaix 1999) were used to express the admiration for this phenomenon. The mystery increases for those who accept that the exponent of the power law is always equal to −1. Not only an integer, but the exact value: one!
In the case of City Size Distribution, this can be seen as an example of a paradigm, defined by Kuhn (1977) as: “…what members of a scientific community, and they alone, share.” A paradigm is usually considered as a group of laws that are shared by a scientific community. However, in this work we use the term paradigm in an unorthodox way when referring to a single law. We allow ourselves to do so as this unique law is found in many disciplines and is associated with similar phenomena (e.g., the existence of order).
This paradigm, accepted by the community of geographers and economists, claims that in all countries in the world, the relationship between the size and the rank is a power law (i.e., the relationship between the log of the size and the log of the rank is a straight line).
However, some researchers claimed that Zipf’s law is not always observed (Par 1973, 1976; Alperovich 1984; Laherrere and Sornette 1998; Li 2002; Benguigui and Blumenfeld-Lieberthal 2006, 2007a, b). In fact, a detailed analysis of real CSDs reveals that no unique law fits all existing cases. Even if the power law is considered a good approximation, other formulas give a much better description. Universality is lost. In some cases, in order to find Zipf’s law again, a cutoff is introduced. This is, however, an ad hoc procedure, as there is not always a convincing justification for it other than, perhaps, the desire to obtain the power law (see below).
In this work, we present a different approach, which is based on the notion that the paradigm should be ignored and that Zipf’s law is not necessarily always valid. Instead we propose to classify the CSD into classes of universality, in analogy with critical phenomena in physics. Following the shape of the rank–size log–log plot, it is possible to define three classes of universality. They are characterized by an exponent α that can be larger than one, equal to one or smaller than one. The case α = 1 recovers Zipf’s law. We analyze the properties of each class and give real examples of these classes.
2 The verification of Zipf’s law
The high volume of work devoted to this subject indicates that there are numerous cases where Zipf’s law is valid. As an example of Zipf’s law, we present the case of India (see the results of the analysis in “Appendix 1”). Nevertheless, besides the clear-cut cases, there are also other situations that are worth discussing. The method of verifying the validity of Zipf’s law is relatively simple. It is based on fitting the log–log rank–size curve of a given CSD to the expression:
\( \ln S = \ln S_{0} - m \ln R \)  (1)
where m is the Zipf exponent. A good fit of Eq. 1 to the CSD is obtained when Zipf’s law is valid. In fact, there are two completely different approaches concerning the significance and the verification of Zipf’s law. We describe the first as the “physicist approach”. This, however, does not mean that all the researchers using this approach are necessarily physicists (see Note 2); we describe the second approach as the “economist approach,” which also does not indicate that all the researchers are necessarily economists.
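As an illustration of this fitting step, the sketch below assumes that Eq. 1 is the straight line Ln S = Ln S0 − m Ln R and estimates m by ordinary least squares, together with the coefficient of determination. The function name `fit_zipf` and the synthetic data are hypothetical; this is not the procedure used to produce the results reported in this work.

```python
import math


def fit_zipf(sizes):
    """Least-squares fit of ln S on ln R; returns (m, ln_s0, r2).

    Assumes the straight-line form ln S = ln S0 - m ln R.
    """
    ordered = sorted(sizes, reverse=True)
    x = [math.log(r) for r in range(1, len(ordered) + 1)]
    y = [math.log(s) for s in ordered]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return -slope, intercept, r2


# Synthetic sizes following an exact power law with m = 0.9 (illustration only).
synthetic = [1_000_000 / r ** 0.9 for r in range(1, 201)]
m, ln_s0, r2 = fit_zipf(synthetic)
print(m, ln_s0, r2)
```

On real data the value of R² alone should be read with caution, for the reasons discussed in the text: a high value does not guarantee that a straight line is the appropriate description.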
In the first approach the verification is made separately for each country by fitting the real data of the CSD to Eq. 1. One can choose statistical criteria (e.g., the coefficient of determination, \( R^{2} \)) to decide whether the law is verified. These criteria, as will be shown next (see further discussion on statistical methods of validating the fit in “Appendix 1”), are not sufficient. At this stage, we would like to make two remarks. First, in order to decide whether Eq. 1 is verified, one has to fit the data to several functions and compare the results using the same criterion. Naturally, it is not realistic to expect that each CSD will be fitted to numerous formulas; thus, we propose to use a visual inspection to help decide which formulas might represent the data correctly. This brings us to the second remark: we trust the human mind and believe that a visual inspection can indeed give essential information; in particular, it helps in deciding whether the studied system is homogeneous. Often, a low value of \( R^{2} \) can indicate an inhomogeneous system of cities. Yet a high value of \( R^{2} \) does not always indicate a homogeneous system of cities. A striking example of this is the case of Venezuela (see Note 3). When fitting the CSD of Venezuela with Eq. 1, the result seems to be an excellent fit: m = 0.91 with \( R^{2} \) = 0.997. However, a simple visual inspection (Fig. 1) shows that the system of cities in Venezuela is not homogeneous: it can be divided into two subsystems. This example emphasizes the need for a visual inspection of the rank–size relation of the real data on log–log scales, which makes it possible to see (in the simple meaning of the word, with the eye) whether the points may be fitted with some mathematical function (not necessarily a straight line). As mentioned above, even if the fit with Eq. 1 is not good, it is largely accepted that Zipf’s law is a good and useful approximation (Rosen and Resnick 1980).
This raises a question (which could be considered as naïve): why settle with an approximation when a better description can be achieved by another function? In “Appendix 1” we present the case of the metropolitan areas in the USA and show that a non-linear function yields a much better fit than a straight line.
Before examining the second approach, we mention the work of Alperovich (1988, 1995), who uses particular methods to check the validity of Zipf’s law. In the second approach, Zipf’s law is considered a statistical law, and the cities’ populations are treated as random variables. Consequently, each country or system of cities is a realization of the statistics. This problem is well known in statistics, and we describe it next. We start with a given law for a random variable and try to determine the coefficient of this law from a sample with a finite number of values of the variable. In the present case, the null hypothesis can be expressed as follows: given that the cities of a particular country obey the Pareto distribution, should the value 1 (for the exponent m) be accepted or rejected (Soo 2005; Urzua 2000; Gabaix et al. 2003; Gabaix and Ioannides 2004; Nishiyama et al. 2008)? Zipf’s exponent is determined for each country, and with the help of a statistical test (like the t-test and the t-distribution), it is decided whether Zipf’s law with m = 1 is acceptable. The conclusion of these works is that in the majority of countries, the hypothesis is accepted despite the fact that the measured exponent m is different from unity. This approach is highly problematic as it strongly contradicts intuition. It seems difficult to accept that the law is not rejected when m is considerably different from one or when the coefficient \( R^{2} \) giving the quality of the fit is small (say, smaller than 0.97). However, the most questionable aspect of these works is the fact that there is no serious discussion of what to do with the cases of rejection: is the exponent m different from one, or is Zipf’s law not obeyed? Finally, sentences such as “the results we obtain depend …whether we believe that Zipf’s law holds” (Soo 2005) are also difficult to accept.
To summarize this section, we mention three important problems that are yet to receive a satisfactory answer. The first is the problem of the cutoff. It was often mentioned, and several solutions were proposed (Par 1985), e.g., to cut off at a fixed number of cities (say, for example, the first 50 cities), or at cities above a given size (say 100,000 or 10,000) or at cities such that their total population is a given fraction of the total population of the countries. These choices, however, are somewhat arbitrary. We propose to begin with the idea (not often formulated but implicit in several works) that Zipf’s law may be obeyed only by a homogeneous group of cities, i.e., cities that grew in the same way due to the social, economic and cultural environment. This can be verified by examining (with the eyes) the rank–size curve. If this curve presents discontinuities or breaks in the derivative, the group of cities cannot be treated as homogeneous. This enables choosing a clear cutoff. Figure 2a presents the case of the agglomerations in India (2001) as an example in which there is a break in the derivative for agglomerations with populations smaller than 200,000. Thus, the group of agglomerations larger than this value forms a homogeneous system.
The second problem is also related to the definition of a homogeneous group of cities. Most of the work on CSD is based on the assumption that cities that belong to the same country constitute a group worth studying. This assumption ignores the fact that, during their history, the frontiers of different countries have occasionally changed. A system of cities in a current country might have developed in different periods under different regimes. Only a historical study makes it possible to discover such phenomena. Recently, we showed (Benguigui and Blumenfeld-Lieberthal 2006) that the case of Romania is an example of this phenomenon. During its history Romania was divided into three regions, each under a different regime: the Austrian Empire, the Russian Empire and the Ottoman Empire. These empires were very different; thus, it is not surprising that the rank–size curve of Romania is a broken line that no continuous function can fit (see Fig. 2b).
The third problem concerns the largest cities in the distribution. In some countries, the largest city cannot be fitted to the same homogenous distribution that the rest of the cities belong to. This phenomenon is known as “primate city” (Jefferson 1939). Yet we often find cases where several of the largest cities (two and more) belong to a particular group. This phenomenon appears even in cases where Zipf’s law is obtained. The fit may be good since the quality criterion is based on the entire distribution and the largest cities are only a small fraction of the total number of cities. Here also a visual inspection is needed. In Fig. 3 we give the rank–size plot of the Netherlands cities to demonstrate this phenomenon; the fit to a straight line is very good, yet the four largest cities are not well aligned with the rest of the cities. This leads us to another question: should the study of CSD be refined, and in some cases, should the system of cities be divided into more than one group?
3 The classification
The general conclusion that one can draw from all the work done on Zipf’s law is that it is not always obeyed. The problem of the CSD should be examined more generally. When considering a homogeneous group of cities, one should look for all the different possibilities to describe the CSD. For all the cases that do not obey Zipf’s law, other laws should be examined. Is the case of Zipf’s law a particular case? We propose a general answer to this question: the CSD (and also other distributions) can be classified into three classes following the shape of the log–log rank–size plot (Benguigui and Blumenfeld-Lieberthal 2006, 2007a, b). At this stage we cannot provide a complete theoretical explanation that links the different classes to different urban processes, and further work is needed in order to clarify these relations. A partial explanation, however, is provided in the next section.
The first class is characterized by a linear relationship between x = Ln Rank and y = Ln Size. As is well known, it is the case of the Pareto distribution, and in the case of city size distributions, it is Zipf’s law.
The second class is characterized by a parabola-like x-y curve with a symmetry axis parallel to the y axis (see example in Fig. 9).
The third class is characterized also by a parabola-like x-y curve, but with a symmetry axis parallel to the x axis (see below the case of Israel for example).
The x-y curve for the first two classes can be written as (x = Ln Rank, y = Ln Size)
with α = 1 (and b = 0) for the first class and α > 1 for the second class.
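The fitted instance reported in the appendix, y = 16.67 − 0.146 (1.02 + x)^1.85, suggests that the expression for the first two classes has the following shape. The symbols d and a are labels chosen here for the two remaining constants and may differ from the original notation:

\[
y = d - a\,(b + x)^{\alpha} .
\]

With α = 1 and b = 0 this reduces to the straight line y = d − a x, so that a plays the role of the Zipf exponent m in the first class.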
In the case of the third class the x-y curve can be written as:
with α < 1.
We call the exponent α, the shape exponent since it determines the shape of the log–log rank–size curve. The first two classes are well known, and several mathematical expressions were already proposed for them. The most popular expression for the function y(x) is a polynomial of the second order or third order (Rosen and Resnick 1980). From a practical point of view, our expression is completely equivalent to a polynomial. However, the introduction of the shape exponent enables the classification.
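One way the shape exponent could be estimated is sketched below. It assumes the form y = d − a (b + x)^α for the first two classes, as suggested by the fitted instance given in the appendix; the symbols d and a, the function names and the grid-search strategy are all assumptions made for this illustration and do not describe the fitting procedure actually used. For fixed α and b the expression is linear in d and a, so a simple grid over (α, b) with a linear least-squares step is enough for a sketch.

```python
import math


def linear_fit(u, y):
    """Least-squares line y = intercept + slope * u; returns (intercept, slope, sse)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    suu = sum((ui - mean_u) ** 2 for ui in u)
    suy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    slope = suy / suu
    intercept = mean_y - slope * mean_u
    sse = sum((yi - intercept - slope * ui) ** 2 for ui, yi in zip(u, y))
    return intercept, slope, sse


def fit_shape_exponent(x, y, alphas, shifts):
    """Grid search over (alpha, b) for the assumed form y = d - a * (b + x)**alpha."""
    best = None
    for alpha in alphas:
        for b in shifts:
            u = [(b + xi) ** alpha for xi in x]
            d, slope, sse = linear_fit(u, y)
            if best is None or sse < best["sse"]:
                best = {"alpha": alpha, "b": b, "d": d, "a": -slope, "sse": sse}
    return best


# Synthetic curve built from the parameter values quoted in the appendix
# (the number of ranks, 300, is hypothetical).
x = [math.log(r) for r in range(1, 301)]
y = [16.67 - 0.146 * (1.02 + xi) ** 1.85 for xi in x]

alphas = [1.0 + 0.05 * k for k in range(21)]   # 1.00, 1.05, ..., 2.00
shifts = [0.02 * k for k in range(76)]         # 0.00, 0.02, ..., 1.50
result = fit_shape_exponent(x, y, alphas, shifts)
print(result)
```

A grid search is used here only because it needs no external library; a non-linear least-squares routine would be the more usual choice in practice.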
4 The meaning of the classification
In this section we want to address the important question: what are the characteristics of the three different classes?
4.1 Class 1 (α = 1)
The relation between S and R is a power law, \( S = S_{0}/R^{m} \), where m is the Zipf exponent. The cumulative distribution and the density distribution are also power laws, indicating that the number of small cities increases as their size decreases, in the form of a pseudo divergence (see the relations between the rank–size distribution, the cumulative distribution and the density distribution, presented earlier).
4.2 Class 2 (α > 1)
The cumulative distribution and the density distribution are analogous to those of the class 1, i.e., both curves indicate a divergence toward the small sizes. However, for the large cities the rank–size curve has a parabolic aspect, and it is not a power law.
4.3 Class 3 (α < 1)
The most important property of this class is the fact that the density distribution has a finite value (or even has a maximum) for the smallest sizes. Two well known distributions belonging to this class are the lognormal distribution and the exponential one. The distribution of the largest cities can be approximated by a power law.
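The behavior of the density at small sizes can be read directly from the slope of the log–log rank–size curve. With x = Ln R and y = Ln S, the chain rule gives the following relation (a short derivation added here for clarity, valid where the curve is smooth and decreasing):

\[
D(S) = -\frac{dR}{dS} = -\frac{R}{S}\,\frac{dx}{dy} = \frac{R}{S\,\left| dy/dx \right|} .
\]

For a straight line of slope −m this reduces to D = R/(m S), which grows as S decreases. If instead the curve steepens sufficiently fast toward the largest ranks, the growth of R/S is compensated by the growth of |dy/dx|, and the density can remain finite or pass through a maximum.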
In a recent work (Benguigui and Blumenfeld-Lieberthal 2007a, b), we proposed a model for the development of a system of cities characterized by two processes. The first is a random increase of the population of each city, and the second is an increase of the number of cities. We showed that the type of the resulting CSD (i.e., the value of the exponent α) depends on the rate of creation of new cities. In particular, a very rapid increase of the number of cities yielded α = 1, whereas small rates induced α to be either greater or smaller than unity. As we explain below, in the case of Israel a linear rate of creation of new cities resulted in a decrease of the value of the exponent α.
5 An example: the temporal change of α: the case of Israel
We present a real example to describe the meaning of the above classification, particularly when the classification of a system changes with time. We chose the case of the Israeli system of cities as it provides a good representation of this phenomenon. Figure 4 presents the rank–size distribution of the Israeli system of cities for cities with populations larger than 1,000 inhabitants (see Note 4) for the years 1961 and 2005. We fitted the rank–size curves with either Eq. 2 or Eq. 3, which yielded an interesting result. The exponent α is larger than one in 1961 (meaning the fit is better with Eq. 2), as seen in Fig. 4, and decreases to a value smaller than one in the subsequent years (the fit is better with Eq. 3), as seen in Figs. 4 and 5. As mentioned earlier, the rate of creation of new cities in the model was linear and the number of cities increased as \( N\left( t \right)\sim \,\,t^{2} \), which eventually resulted in a decrease in the values of α.
Following the above, this means that the density function diverges toward the small sizes in 1961, which is not the case for the later years. We show this result in Figs. 6 and 7. In Fig. 6 we calculated the density distribution for the small cities (with populations between 1,000 and 10,000) using the parameters appearing in Eq. 2 or Eq. 3 deduced from the fit. One sees clearly the different behaviors: the pseudo divergence in 1961 and the absence of divergence in 2005. Figure 7 shows the histogram of Ln S. Here also, it is possible to observe the different behaviors of the small cities. In 1961 the histogram has large values consistent with a divergence, whereas in 2005 the histogram presents smaller values on the side of the larger cities without divergence (see Note 5). To emphasize this evolution, we show in Fig. 8 the proportion of cities smaller than 5,000 and 10,000 inhabitants through time. The fact that this proportion decreases with time indicates a process of urbanization in which the proportion of small cities becomes smaller and smaller. This is our interpretation of the change of the exponent α from a value larger than one to a value smaller than one.
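The proportion of small cities discussed above is simple to compute from a list of populations. The sketch below is only an illustration: the function name, the handling of the 1,000-inhabitant cutoff and the two population lists are hypothetical and are not the Israeli data.

```python
def share_below(populations, threshold, cutoff=1_000):
    """Fraction of cities, among those at or above `cutoff`, smaller than `threshold`."""
    kept = [p for p in populations if p >= cutoff]
    if not kept:
        raise ValueError("no city at or above the cutoff")
    return sum(1 for p in kept if p < threshold) / len(kept)


# Hypothetical populations for two dates (illustration only).
early = [1_500, 2_000, 3_000, 4_500, 8_000, 60_000, 400_000]
late = [3_000, 9_000, 15_000, 30_000, 45_000, 150_000, 700_000]

for label, data in (("early", early), ("late", late)):
    print(label, share_below(data, 5_000), share_below(data, 10_000))
```

Whether the cutoff is applied as "at or above" or "strictly above" 1,000 inhabitants is a choice made here for the example.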
6 Conclusion
As in the case of critical phenomena in physics, we propose to define universality classes in the case of City Size Distribution. Three classes are characterized by a shape exponent α, which can be equal to one (which recovers Zipf’s law), larger than one or smaller than one. The main differences are as follows. On the side of the small cities, the density distribution of class 3 (α < 1) does not diverge; it either goes to a finite value for \( S \rightarrow S_{\min} \) (the smallest city) or exhibits a maximum. For α ≥ 1 (classes 1 and 2) the density distribution diverges. On the side of the large cities, for α ≤ 1 (classes 1 and 3) the distribution is a power law, which is not the case for α > 1. We presented a particular example, the Israeli system of cities, for which the shape exponent takes a value larger than one in 1961 and values smaller than one in the later years. We interpreted this change as the result of a process of urbanization with a relative decrease in the number of small cities in Israel.
We believe that by abandoning the paradigm of Zipf’s law (but not Zipf's law itself!), one can gain new insights into City Size Distribution. Despite the fact that it is not easy to abandon a paradigm, we encourage researchers to look for the advantages of the newly presented classification of CSD. Two examples of these advantages are the possibility to compare different distributions belonging to the same class and to compare the evolution of a distribution through the change in the exponent α.
Notes
1. Sometimes a difference is made between the "rank–size rule" for which the value of the exponent of the rank–size graph is −1 and the other cases where the exponent is different from −1. These last situations are called "Zipf's law". Here we do not make this distinction. We refer to all the cases of power law as "Zipf's law".
2. We use the term because the best example of the method is given in the article by Laherrere and Sornette, published in a physics journal.
3. All the data in this work are taken from: www.citypopulation.de.
4. We chose the cutoff of 1,000 inhabitants because the complete rank–size curve can be divided in two parts separated by a sudden change in the slope. Following the suggestion developed above, we divided the cities into two distinct groups.
5. We think that a meaningful study of a distribution must be done using the three representations of a distribution (see “Appendix 1”).
References
Alperovich G (1984) The size distribution of cities: on the empirical validity of the rank–size rule. J Urban Econ 16(2):232–249
Alperovich G (1988) A new testing procedure of the rank size distribution. J Urban Econ 23(2):251–259
Alperovich G, Deutsch J (1995) The size distribution of urban areas: testing for the appropriateness of the Pareto distribution using a generalized Box-Cox transformation function. J Reg Sci 35(2):267–278
Benguigui L, Blumenfeld-Lieberthal E (2006) From lognormal distribution to power law: a new classification of the size distributions. Int J Mod Phys C 17(10):1429–1436
Benguigui L, Blumenfeld-Lieberthal E (2007a) A new classification of city size distributions. Comput Environ Urban Syst 31(6):648–666
Benguigui L, Blumenfeld-Lieberthal E (2007b) A dynamic model for the city size distribution beyond Zipf’s law. Physica A 384(2):613–627
Gabaix X (1999) Zipf’s law for cities: an explanation. Quart J Econ 114(3):739–767
Gabaix X, Ioannides YM (2004) The evolution of city size distributions. In: Henderson JV, Thisse JF (eds) Handbook of urban and regional economics 4. Elsevier Science, Amsterdam, pp 2341–2378
Gabaix X, Gopikrishnan P, Plerou V, Stanley HE (2003) A theory of power law distributions in financial market fluctuations. Nature 423:267–270
Jefferson M (1939) The law of primate city. Geogr Rev 29:226–232
Krugman P (1996) Self organizing economy. Blackwell Publishers, Oxford
Kuhn TS (1977) The essential tension. Selected studies in scientific tradition and change. University of Chicago Press, Chicago
Laherrere J, Sornette D (1998) Stretched exponential distributions in nature and economy: “fat tails” with characteristic scales. Eur Phys J B 2(4):525–539
Li W (2002) Zipf’s law everywhere. Glottometrics 5:14–21 (despite the title, the author considers situations in which the law is not valid)
Nishiyama Y, Osada S, Sato Y (2008) OLS estimation and the t test revisited in rank–size rule regression. J Reg Sci 48(4):691–715
Par JB (1973) Settlement populations and the lognormal distribution. Urban Stud 10(3):336–352
Par JB (1976) A class of deviations from rank–size regularity: three interpretations. Reg Stud 10(3):285–292
Par JB (1985) A note on the size distribution of cities over time. J Urban Econ 18(2):199–212
Popescu II (2003) On a Zipf’s law extension to impact factors. Glottometrics 6:83–93
Rosen KT, Resnick M (1980) The size distribution of cities: an examination of the Pareto law and primacy. J Urban Econ 8(2):165–186
Soo KT (2005) Zipf’s law for cities: a cross country investigation. Reg Sci Urban Econ 35(3):239–263
Urzua CM (2000) A simple and efficient test for Zipf’s law. Econ Lett 66(3):257–260
Zipf GK (1941) National unity and disunity. The Principia Press, Bloomington
Zipf GK (1949) Human behavior and the principle of least effort. Addison-Wesley, Cambridge
Acknowledgments
The authors thank George Kun of the Central Bureau of Statistics of Israel for providing the data of the cities of Israel and useful discussions.
Appendix
To illustrate the discussion on the verification of Zipf’s law, we present two different cases: the classical case of Zipf’s law, and a case where the CSD is better fitted by another function than by a straight line.
The first case is given in Fig. 2, which presents the straight line of the fit after excluding the agglomerations with populations smaller than 200,000 from the distribution. The Zipf exponent m is equal to 0.918 and \( R^{2} \) = 0.995. The quality of the fit is examined through the statistics of the errors, which are defined as the differences between the real data and the values calculated from the fit function. For a good fit we expect that:
1. The mean of the errors will be small relative to the total range of the errors.
2. The errors will be randomly distributed on both sides of the origin.
3. The error distribution will be close to a normal one with a single and clear maximum.
In the case of India, the mean of the errors is −0.0023, whereas the range is 0.8; the distribution of the errors is not an exact normal distribution but a good approximation of it. Thus, it can be argued that Zipf’s law is verified but with an exponent different from unity.
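The error statistics used in these checks can be computed as in the sketch below. It is only an illustration: the function name is hypothetical, the "range" is taken to be the difference between the largest and the smallest error, and the number of sign changes along the rank axis is an extra indicator added here as a rough check of the second criterion, not a quantity used in this work.

```python
import statistics


def error_summary(observed, fitted):
    """Summary statistics of the errors (observed minus fitted), in rank order."""
    errors = [o - f for o, f in zip(observed, fitted)]
    return {
        "mean": statistics.mean(errors),
        "std": statistics.pstdev(errors),
        # Assumption: "range" means largest error minus smallest error.
        "range": max(errors) - min(errors),
        "share_positive": sum(1 for e in errors if e > 0) / len(errors),
        # Few sign changes suggest a systematic (e.g., parabola-like) pattern.
        "sign_changes": sum(1 for a, b in zip(errors, errors[1:]) if a * b < 0),
    }


# Hypothetical values of Ln S and of the fitted curve (illustration only).
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]
summary = error_summary(observed, fitted)
print(summary)
```

Checking whether the error distribution has a single maximum still calls for a histogram and a visual inspection, in line with the approach advocated in the text.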
The second case is that of the metropolitan areas of the USA. The rank–size plot is presented in Fig. 9. A fit with a straight line gives a large value of \( R^{2} \) = 0.9981 with m = 1.06. The statistics of the errors give the following results: mean = \( 2 \times 10^{-3} \), standard deviation = 0.145 and range = 1.37, and the distribution is far from being a normal one as it has two maxima. The graph of the errors as a function of the log of the rank is very close to a parabola; this means that the errors are not equally placed on both sides of the origin.
A fit of the data to a parabola or to Eq. 2 yielded almost the same results; the parabola is described by \( y = 16.565 - 0.326\,x - 0.919\,x^{2} \), and Eq. 2 by \( y = 16.67 - 0.146\,(1.02 + x)^{1.85} \). The coefficient of determination, \( R^{2} \) = 0.9984, is slightly higher than in the preceding fit, but the difference has no significance. The statistics of the errors, however, give very different results: mean = 0.001, standard deviation = 0.0403 and range = 0.490, and the distribution is close to a normal one with a single maximum near zero. Even more interesting is the graph of the errors as a function of the log of the rank: the errors are equally distributed on both sides of the origin. We conclude that the fit with a parabola or with Eq. 2 is better than a straight line despite the very close values of \( R^{2} \). In Fig. 9 we present the fit of the data with Eq. 2, and in the insert we present the errors. We admit that with a visual inspection only, it was also possible to reject Zipf’s law in this case.
Benguigui, L., Blumenfeld-Lieberthal, E. The end of a paradigm: is Zipf’s law universal?. J Geogr Syst 13, 87–100 (2011). https://doi.org/10.1007/s10109-010-0132-6
DOI: https://doi.org/10.1007/s10109-010-0132-6